
Re: Filter Word Files with -S fs indexing

From: <moseley(at)not-real.hank.org>
Date: Tue Aug 26 2003 - 15:24:11 GMT
On Tue, Aug 26, 2003 at 06:54:03AM -0700, Bucharow Leonard wrote:
> Hi Bill,
> 
> Sorry for the newbie questions; I hope you can help me anyway:
> 
> I'm now trying to index the file system, because spider.pl is not well
> suited to searching an intranet with Java plug-ins, JavaScript and dynamic
> PHP sites.

You can't process JavaScript without a JavaScript interpreter, but for PHP
I'd think you would want to spider instead of using file system
indexing.


> Parsing PDF files works fine except for a few PDF files that give the
> error "Bad annotation destination" or "Bad annotation action". I've read
> that this comes from xpdf (pdfinfo or pdftotext). Unfortunately the xpdf
> documentation is not extensive. Do you know what it means and what is wrong
> with those PDF files?

No, I don't, and Google isn't much help. I just sent mail to the xpdf
author, but you might also try asking on a group like comp.text.pdf.

> The second question:
> How can I filter MS Word files with -S fs indexing? (If you have a solution
> for PowerPoint and Excel too, that would be great.)

Here are some options:

1) use spider.pl with SWISH::Filter

2) if you have a good reason not to spider (e.g. the files are not
available on a web server), use the prog-bin/DirTree.pl example program
and copy in the code from SwishSpiderConfig.pl to use SWISH::Filter

3) try the filters listed at http://www.spocom.com/users/gjohnson/mutt/
and use a FileFilter directive.  I have not tried those filters.
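As a rough illustration of the idea behind option 2, here is a minimal Python sketch (not DirTree.pl itself, and not SWISH::Filter) of a program for swish-e's -S prog method. It walks a directory, converts Word, Excel and PowerPoint files to plain text with external converters, and prints each document preceded by the Path-Name, Content-Length, Last-Mtime and Document-Type headers that -S prog reads. The converter commands (antiword, xls2csv, catppt) and the script name walk_docs.py are assumptions; substitute whatever tools you have installed, and check the header list against the swish-e documentation for your version.

```python
#!/usr/bin/env python3
"""Hypothetical walk_docs.py: feed a directory tree to swish-e -S prog."""
import os
import subprocess
import sys

# Assumed converter commands; each must print plain text to stdout.
CONVERTERS = {
    ".doc": ["antiword"],
    ".xls": ["xls2csv"],
    ".ppt": ["catppt"],
}


def extract_text(path):
    """Return the document's plain text as bytes, or None to skip it."""
    suffix = os.path.splitext(path)[1].lower()
    try:
        if suffix in CONVERTERS:
            result = subprocess.run(
                CONVERTERS[suffix] + [path],
                stdout=subprocess.PIPE,
                stderr=subprocess.DEVNULL,  # drop converter warnings
                check=True,
            )
            return result.stdout
        if suffix == ".txt":
            with open(path, "rb") as fh:
                return fh.read()
    except (OSError, subprocess.CalledProcessError):
        pass  # converter missing, converter failed, or file unreadable
    return None  # unknown type or conversion failure: skip the file


def prog_record(path, content, mtime):
    """Prefix one document with the headers -S prog reads."""
    header = (
        f"Path-Name: {path}\n"
        f"Content-Length: {len(content)}\n"  # length in bytes, not characters
        f"Last-Mtime: {int(mtime)}\n"
        "Document-Type: TXT*\n"  # everything emitted here is plain text
        "\n"
    )
    return header.encode("utf-8") + content


def main(root):
    out = sys.stdout.buffer
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            content = extract_text(path)
            if content is None:
                continue
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # file vanished after it was read
            out.write(prog_record(path, content, mtime))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])  # directory to index, e.g. passed via SwishProgParameters
```

Something like `swish-e -S prog -i ./walk_docs.py -c swish.conf`, with `SwishProgParameters /path/to/docs` in swish.conf, should run it (the script must be executable). Converting to plain text up front keeps swish-e itself out of the Word/Excel/PowerPoint business, which is the same division of labour a FileFilter directive gives you.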

Google might find other solutions.


-- 
Bill Moseley
moseley@hank.org
Received on Tue Aug 26 15:25:51 2003