Re: [swish-e] Swish-e not indexing doc or PDF files

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Tue Feb 12 2008 - 03:00:50 GMT
Liam Buchanan wrote on 2/11/08 6:26 PM:
> Hi,
> Hope someone can suggest a solution to this frustrating problem.
> We are running swish-e on our development server that indexes our
> production intranet server. However the problem lies in the inability
> for the indexing to process .doc or PDF files. When the search reaches a
> hyperlink that is linked to a PDF or doc file the process halts and the
> error message is produced below (under output)
>  Before running swish-e, we connect to our production server via a proxy
> connection first (ntlmaps)

It isn't clear to me how you are aggregating your documents. Are you using 
spider.pl, or some other crawler?

The FileFilter directive in your swish-e config can work at odds with the 
SWISH::Filter handling in spider.pl, so that non-text files end up being 
converted twice.
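
As a rough illustration of that conflict, here is a config sketch. It assumes you are running spider.pl with its "default" settings and that SWISH::Filter is installed; the URL and the filter program names (pdftotext, catdoc) are placeholders, not taken from your setup. If spider.pl is already converting PDF and Word files, the FileFilter lines would be commented out so the conversion happens only once:

```
# swish.conf (sketch; run with: swish-e -S prog -c swish.conf)
IndexDir spider.pl
SwishProgParameters default http://intranet.example/

# spider.pl hands swish-e text that SWISH::Filter has already
# converted, so leave these disabled when using the spider:
# FileFilter .pdf pdftotext "'%p' -"
# FileFilter .doc catdoc "-s8859-1 -d8859-1 '%p'"
```

If you are not using spider.pl at all, the reverse applies: FileFilter is then the only conversion step and needs to stay enabled.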

Try indexing a single troublesome document. Break the process down into steps: 
fetching the doc, feeding it to swish-e, and so on. Turn on verbosity (-v) and 
the -T debugging options.
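
Something along these lines might do for the step-by-step test. It is only a sketch: the URL, file names and config name are made up, it assumes curl is available for the fetch, and the particular -T option shown is a guess at what will be useful ("swish-e -T help" lists the ones your build supports). It prints the commands by default; run it with RUN= (empty) to actually execute them.

```shell
#!/bin/sh
# Dry run by default: each step is echoed rather than executed.
# Invoke as `RUN= sh debug-one-doc.sh` to run the commands for real.
RUN=${RUN-echo}

# Hypothetical values; substitute one document that fails for you.
URL=${URL-http://intranet.example/docs/sample.pdf}
DOC=${DOC-sample.pdf}

debug_one_doc() {
    # Step 1: fetch the document by itself, outside of any crawler.
    # (Set http_proxy in the environment if you go through a proxy.)
    $RUN curl -sS -o "$DOC" "$URL"

    # Step 2: list the trace/debug options this swish-e supports.
    $RUN swish-e -T help

    # Step 3: index only that file, with maximum verbosity and tracing.
    # The default file-system method is used, so FileFilter applies here.
    $RUN swish-e -c swish.conf -i "$DOC" -f one-doc.index -v 3 -T indexed_words
}

debug_one_doc
```

If step 1 fails, the problem is in fetching (proxy, authentication). If step 3 fails, it is in the filter or in swish-e itself, and the trace output should show where.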

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Mon Feb 11 22:00:54 2008