Re: "index file format is unknown" error & parsing images

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Oct 06 2003 - 14:04:24 GMT
On Mon, Oct 06, 2003 at 06:25:10AM -0700, Christopher.Bragg@sth.nhs.uk wrote:
> Hi... 2 queries
> I am trying to index an intranet site using version 2.2.3 All was going
> well, but suddenly I have been receiving  "err: the index file format is
> unknown"  errors whenever I try to search the index files from the
> command line.

Is that the actual error message?

	moseley@bumby:~/swish-e/src$ fgrep 'format is unknown' *.h    
	moseley@bumby:~/swish-e/src$ fgrep 'format is unknown' *.c
	moseley@bumby:~/swish-e/src$

Anyway, are you sure you are searching the same file you are indexing?
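
If indexing and searching run from different directories, or only one of them names an index, they can end up pointing at different files, since swish-e defaults to index.swish-e in the current directory. A minimal sketch of pinning the index location in the config, assuming a hypothetical path:

```
# Hypothetical path, not taken from the original poster's setup.
IndexFile /path/to/intranet.index
```

The search would then name the same file explicitly, e.g. swish-e -f /path/to/intranet.index -w someword.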


> Also, is there any way to stop the parsing of .gif and .jpg files - I
> have the NoContents towards the end, but this isn't stopping it - is it
> possible to specifically block them from the parser?

Can you show an example?

	moseley@bumby:~$ swish-e  -i current_view.jpg | grep 'total words'
	1 file indexed.  38177 total bytes.  7409 total words.

	moseley@bumby:~$ cat c
	NoContents .jpg

	moseley@bumby:~$ swish-e  -c c -i current_view.jpg | grep 'total words'
	1 file indexed.  38177 total bytes.  3 total words.
	moseley@bumby:~$ swish-e  -c c -i current_view.jpg -v0 -T indexed_words
	    Adding:[1:swishdefault(1)]   'current'   Pos:110  Stuct:0x41 ( EM FILE )
	    Adding:[1:swishdefault(1)]   'view'   Pos:111  Stuct:0x41 ( EM FILE )
	    Adding:[1:swishdefault(1)]   'jpg'   Pos:112  Stuct:0x41 ( EM FILE )
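
Since the question mentions both .gif and .jpg, the one-line config file c above could presumably list both suffixes. A sketch under that assumption:

```
# As in the run above, only the words in the file name are indexed;
# the file contents are not parsed for these suffixes.
NoContents .gif .jpg
```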

> IndexContents HTML* .htm .html .shtml
> IndexContents TXT .txt .log .text
I'd use TXT* there, too.

> IndexContents XML* .xml
> #DefaultContents HTML
> 
> Metanames swishdocpath swishtitle
> #PropertyNamesMaxLength 1000 swishdescription
You don't need that because you are using 2000 below in StoreDescription.

> #PropertyNameAlias swishdescription body
> #StoreDescription TXT 2000
> #StoreDescription HTML* <body> 2000
> 
> IndexReport 3
> FollowSymLinks yes
> IgnoreTotalWordCountWhenRanking yes

That's the default since version 2.2.



> IgnoreLimit 50 1000

I think there are reasons not to use IgnoreLimit, which I now forget,
but the use of stopwords in general is a debatable issue.


> 
> #IndexComments 0
> # This option allows the user decide if to index the comments in the
> # files
> # default is 1. Set to 0 if comment indexing is not required.

I think the default is 0 as of version 2.2.



-- 
Bill Moseley
moseley@hank.org
Received on Mon Oct 6 14:04:25 2003