On Mon, Oct 06, 2003 at 06:25:10AM -0700, Christopher.Bragg@sth.nhs.uk wrote:
> Hi... 2 queries
> I am trying to index an intranet site using version 2.2.3. All was going
> well, but suddenly I have been receiving "err: the index file format is
> unknown" errors whenever I try to search the index files from the
> command line.
Is that the actual error message? I can't find that text in the source:
moseley@bumby:~/swish-e/src$ fgrep 'format is unknown' *.h
moseley@bumby:~/swish-e/src$ fgrep 'format is unknown' *.c
Anyway, are you sure you are searching the same file you are indexing?
> Also, is there any way to stop the parsing of .gif and .jpg files - I
> have the NoContents towards the end, but this isn't stopping it - is it
> possible to specifically block them from the parser?
Can you show an example?
moseley@bumby:~$ swish-e -i current_view.jpg | grep 'total words'
1 file indexed. 38177 total bytes. 7409 total words.
moseley@bumby:~$ cat c
moseley@bumby:~$ swish-e -c c -i current_view.jpg | grep 'total words'
1 file indexed. 38177 total bytes. 3 total words.
moseley@bumby:~$ swish-e -c c -i current_view.jpg -v0 -T indexed_words
Adding:[1:swishdefault(1)] 'current' Pos:110 Stuct:0x41 ( EM FILE )
Adding:[1:swishdefault(1)] 'view' Pos:111 Stuct:0x41 ( EM FILE )
Adding:[1:swishdefault(1)] 'jpg' Pos:112 Stuct:0x41 ( EM FILE )
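For reference, a config along the following lines should give the result above, where only the words from the file name are indexed. This is only a sketch: the contents of "c" are not shown here, so the extension list is an assumption, and the commented-out FileRules line is an untested alternative that, as far as I know, only applies when indexing through the file system method.

```
# Assumed extension list: index only the file name of image files,
# not their contents.
NoContents .gif .jpg

# Possible alternative (assumption, file system method only):
# skip matching files entirely so they never reach the parser.
#FileRules filename contains \.gif$ \.jpg$
```

With NoContents the file still appears in the index under its name, which is why 'current', 'view' and 'jpg' show up in the debug output.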
> IndexContents HTML* .htm .html .shtml
> IndexContents TXT .txt .log .text
I'd use TXT* there, too.
> IndexContents XML* .xml
> #DefaultContents HTML
> Metanames swishdocpath swishtitle
> #PropertyNamesMaxLength 1000 swishdescription
You don't need that because you are using 2000 below in StoreDescription.
> #PropertyNameAlias swishdescription body
> #StoreDescription TXT 2000
> #StoreDescription HTML* <body> 2000
> IndexReport 3
> FollowSymLinks yes
> IgnoreTotalWordCountWhenRanking yes
That's the default since version 2.2.
> IgnoreLimit 50 1000
I think there are reasons not to use IgnoreLimit, which I now forget,
but the use of stopwords in general is a debatable issue.
> #IndexComments 0
> # This option allows the user decide if to index the comments in the
> # default is 1. Set to 0 if comment indexing is not required.
I think the default is 0 as of version 2.2.
Received on Mon Oct 6 14:04:25 2003