The problem I am now facing appears to involve the substitution of embedded
null characters in the pdf and doc files that I am indexing. I checked the
discussion lists, and it seems this happens because the files are binary.
The first thing I tried was adding IndexOnly, IndexContents and NoContents
lines to the swish.conf file. That did not make a difference. I then tried
the spider.pl approach, but I am unclear as to what the SwishSpiderConfig.pl
file is and where it is located. Could you
[mailto:email@example.com] On Behalf Of Bill Moseley
Sent: Monday, June 14, 2004 12:46 PM
To: Multiple recipients of list
Subject: [SWISH-E] Re: Error Message: Index file error: Could not open
On Mon, Jun 14, 2004 at 11:40:52AM -0400, Kaplan, Andrew H. wrote:
> I've continued work on trying to get Swish-e to be able to index the pdf
So, to be clear, the problem is what?
> I went through the motions
> of setting up the swish.conf file according to the instructions listed on
> the website. Here is what the file
> text looks like:
> IndexDir spider.pl
> SwishProgParameters default http://localhost/www
> Metanames swishtitle swishdocpath
> StoreDescription HTML* <body> 200000
> StoreDescription TXT* <body> 200000
I will note for the archives that those StoreDescription directives will
only work if the -S prog program tells swish-e the document type (as
spider.pl does), otherwise you need DefaultContents or IndexContents to
map a file extension to a type like HTML* or TXT*.
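For the archives, a minimal sketch of what that mapping might look like in
swish.conf when the documents do not arrive with a type already set. The
extensions listed here are only illustrative assumptions; adjust them to the
files you actually have:

```
# Map file extensions to a parser so that StoreDescription's
# HTML* / TXT* types have something to match against.
IndexContents HTML* .htm .html .shtml
IndexContents TXT*  .txt .log

# Fallback parser for files that match no IndexContents line.
DefaultContents HTML*
```

With spider.pl as the -S prog program none of this should be needed, since
the spider reports the document type itself.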
> I ran the command swish-e -S prog -c swish.conf and the result was the
> Indexing Data Source: "External-Program"
> Indexing "spider.pl"
> External Program found: /usr/local/lib/swish-e/spider.pl
> Removing very common words...
> no words removed.
> Writing main index...
> err: No unique words indexed!
> I have had no luck in resolving this issue.
Now, have you read any of my responses?
So there are no words indexed. So why not? Repeating: run the spider by
itself. Is it generating output? Yes or No. If No, figure out why by
turning on debugging as I explained before. If Yes, then figure out why
swish-e isn't indexing.
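The Yes/No check above can be sketched roughly as follows. This assumes the
spider's output has been captured to a file, and it relies on each document
in -S prog output starting with a "Path-Name:" header. The helper name
count_spider_docs and the file name spider.out are made up for illustration;
the spider path and URL are the ones quoted in this thread.

```shell
# Hypothetical helper: rough count of documents in captured "-S prog" output.
# Every document the spider emits begins with a "Path-Name:" header line.
# -a makes grep treat the file as text even if it contains binary data.
count_spider_docs() {
    grep -ac '^Path-Name:' "$1"
}

# Usage (path and URL from this thread; adjust to your install):
#   /usr/local/lib/swish-e/spider.pl default http://localhost/www > spider.out
#   count_spider_docs spider.out
#
# The spider.pl documentation also describes a SPIDER_DEBUG environment
# variable for debugging output; check there for the accepted values.
```

A count of 0 means the spider produced no documents at all, so the problem is
on the spider side. A non-zero count means the spider is producing output and
the thing to look at is why swish-e finds no words in it.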
If the spider isn't indexing because it can't convert the PDF files, use
the swish-filter-test program. See the debugging and testing comments at:
> I am at the point where I am ready to install a pdf to word converter
> program that will change all the pdf files to .doc and/or .rtf files.
> Unless there is something else that I have missed, I have run out of
My vote is you missed something.
Received on Tue Jun 15 15:04:15 2004