Hi there --
I went through the steps that you described and I still have not made any progress.
Your mention of the document structure got me thinking about the files themselves.
Currently there are PDF files in the directory in question, and most of the
filenames contain spaces. I have installed the xpdf program on the system, but
perhaps there is something else that needs to be installed or configured. Your thoughts?
I am going to set ParserWarnLevel to 9 in the swish.conf file and e-mail you the
results. Thanks again for your help.
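A quick way to see which PDF files have spaces in their names is sketched below. This is only an illustration: list_spaced_pdfs is a made-up helper name (not part of swish-e or xpdf), and it assumes a POSIX shell with find. It matches a lowercase .pdf extension only.

```shell
# List PDF files under a directory whose file names contain a space.
# list_spaced_pdfs is a hypothetical helper, not a swish-e or xpdf command.
# The directory defaults to the current one when no argument is given.
list_spaced_pdfs() {
    find "${1:-.}" -type f -name '* *.pdf'
}
```

Calling list_spaced_pdfs with the document directory as its argument prints one matching path per line, which shows how many files are affected before looking at the filter setup.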
[mailto:email@example.com]On Behalf Of Bill Moseley
Sent: Sunday, June 06, 2004 12:12 PM
To: Multiple recipients of list
Subject: [SWISH-E] Re: Error Message: Index file error: Could not open
On Sun, Jun 06, 2004 at 11:56:53AM -0400, Kaplan, Andrew H. wrote:
> Hi there --
> I'm sorry for sounding stupid, but could you elaborate on making sure
> that "Head" is in the index? Also, aside from the cgi script, what is
> the command syntax I would use to search the index? Thanks.
So, the situation is you index some files and then you search for "head"
and it says "no results" but you are sure it should be found because you
know it's in the file "body_parts.html".
So then you run swish like this:
swish-e -c myconfig -i body_parts.html -T indexed_words | grep head
and you see something like:
Adding:[1:swishdefault(1)] 'head' Pos:24 Stuct:0x9 ( BODY FILE )
which says the word "head" was indexed in file number 1 under metaname
"swishdefault" at word position number 24 and is in the BODY of the document.
Then you know you can do:
swish-e -w head
swish-e -w 'swishdefault=(head)'
and swish-e will find it.
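One caveat with a plain "grep head" is that it also matches longer words such as "header". A minimal sketch of a stricter filter follows, assuming the debug output quotes each word in single quotes exactly as in the line shown above. match_indexed_word is a hypothetical helper name, and the 'header' line is made up for contrast.

```shell
# Keep only debug lines that index exactly the given word.
# match_indexed_word is a hypothetical helper; it reads the
# "-T indexed_words" output on stdin and assumes words appear in single quotes.
match_indexed_word() {
    grep -F -- "'$1'"
}

# The first line is the one quoted in this message; the second is invented.
printf '%s\n' \
  "Adding:[1:swishdefault(1)] 'head' Pos:24 Stuct:0x9 ( BODY FILE )" \
  "Adding:[1:swishdefault(1)] 'header' Pos:25 Stuct:0x9 ( BODY FILE )" |
  match_indexed_word head
```

Only the 'head' line gets through, because the fixed-string match includes the closing quote.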
Now, if you don't see "head" in the output you then look at why it's not
getting indexed. What I'd likely do is run without grep:
swish-e -c myconfig -i body_parts.html -T indexed_words | less
and then look for words that you know are around "head" in the document
and that might give you an idea what to look for.
Maybe you have a format error in body_parts.html? Adding ParserWarnLevel 9 to your swish
config might generate some warnings about the structure of your document.
Maybe "head" is in an HTML comment? Then you need to enable indexing of comments.
Maybe the above all works fine, but when spidering the file is skipped?
If that's the case then you need to figure out why. spider.pl has
debugging features to tell you why a file is skipped.
The answer is divide et impera.
Received on Mon Jun 7 05:51:45 2004