I've continued trying to get Swish-e to index the pdf files. I set up the
swish.conf file according to the instructions listed on the website. Here
is what the file looks like:
# Swish-e config to index the /www directory
# Use spider.pl for indexing (location of spider.pl set at installation
# Use spider.pl's default configuration and specify the URL to spider
SwishProgParameters default http://localhost/www
# Allow extra searching by title, path
Metanames swishtitle swishdocpath
# Set StoreDescription for each parser to display context with search
StoreDescription HTML* <body> 200000
StoreDescription TXT* <body> 200000
I ran the command swish-e -S prog -c swish.conf and the result was the
following:
Indexing Data Source: "External-Program"
External Program found: /usr/local/lib/swish-e/spider.pl
Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!
I have had no luck in resolving this issue. I am at the point where I am
ready to install a pdf to word converter program that will change all the
pdf files to .doc and/or .rtf files. Unless there is something else that I
have missed, I have run out of ideas.
From: Bill Moseley [mailto:firstname.lastname@example.org]
Sent: Thursday, June 10, 2004 6:00 PM
To: Kaplan, Andrew H.
Subject: Re: [SWISH-E] Re: Error Message: Index file error: Could not
Please keep your questions to the list in the future.
On Thu, Jun 10, 2004 at 05:48:17PM -0400, Kaplan, Andrew H. wrote:
> Hi there --
> I'm not trying to index MP3 tags. The MP3 package is installed only
> because I followed the installation instructions.
> Will removing the MP3::Tag package and running the script again resolve
> the problem?
Yes, or following the instructions I sent on updating Filter.pm.
> If the indexing method that I mentioned does not automatically index the
> pdf files, what else do I need to configure to get the pdf files to
> appear on the list? When the indexing takes place, it appears that
> swish-e is reading the pdf files and creating an index based on them.
Swish-e doesn't know how to index PDF files without using a helper
program. There are a few ways to use a helper program:
1) use FileFilter -- swish-e will pass the content through the filter
2) use spider.pl in the default config setup and it will attempt to use
a Perl module called SWISH::Filter and automatically filter PDF and MS
Word documents
3) other ways you don't need to worry about right now.
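
As a rough illustration of option 1, a FileFilter setup might look like the
fragment below. This is a sketch, not a tested configuration: it assumes the
pdftotext program (from the xpdf package) is installed and on the PATH, and
that the files to filter end in .pdf. Adjust both to match the actual setup.

```
# Hypothetical swish.conf fragment (assumes pdftotext from xpdf is installed)
# Pass every .pdf file through pdftotext; %p is replaced with the file path,
# and the trailing "-" tells pdftotext to write the extracted text to stdout
FileFilter .pdf pdftotext "'%p' -"
# Parse the filtered output with the plain-text parser
IndexContents TXT* .pdf
```

With option 2 no FileFilter line should be needed, but as far as I know
SWISH::Filter still relies on the same kind of helper program being
installed, so a missing pdftotext would leave the PDF files unindexed
either way.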
I've explained all of this before. The documentation explains this.
I also pointed you to instructions on how to ask questions that will
help get your problem solved. Please review all of that again.
Received on Mon Jun 14 15:45:53 2004