On Tue, Aug 26, 2003 at 06:54:03AM -0700, Bucharow Leonard wrote:
> Hi Bill,
>
> Sorry for the newbie questions; I hope you can help me anyway:
>
> I'm now trying to index the file system, because spider.pl is not well
> suited to searching an intranet with Java plug-ins, JavaScript and
> dynamic PHP sites.
You can't process JavaScript without a JavaScript interpreter, but for PHP
I'd think you would want to spider instead of using the file system
search.
> Parsing PDF files works fine, except for a few PDF files that give the
> error "Bad annotation destination" or "Bad annotation action". I've read
> that this comes from xpdf (pdfinfo or pdftotext). Unfortunately the xpdf
> help is not extensive. Do you know what it means and what is wrong with
> the PDF files?
No, I don't, and Google isn't much help. I just sent mail to the xpdf
author, but you might also try asking on a group like comp.text.pdf.
> The second question:
> How can I filter MS Word files with -S fs indexing (if you have a solution
> for PowerPoint and Excel, it would be great)?
Here are some options:
1) use spider.pl with SWISH::Filter
2) if you have a good reason not to spider (like files are not
available on a web server) use the prog-bin/DirTree.pl example program
and copy in the code from SwishSpiderConfig.pl to use SWISH::Filter
3) try the filters listed at http://www.spocom.com/users/gjohnson/mutt/
and use a FileFilter directive. I have not tried those filters.
Google might find other solutions.
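
To illustrate the idea behind option 2, here is a minimal sketch of a program that walks a directory tree, converts Word files to text, and prints them in the header format that -S prog reads (Path-Name, Content-Length, Last-Mtime and Document-Type headers, a blank line, then the document). It is written in Python rather than Perl and is not the DirTree.pl code. It assumes the catdoc converter is installed and on the PATH; the function names are made up for the example.

```python
import os
import subprocess
import sys


def format_record(path, content, mtime):
    """Build one document record: headers, a blank line, then the content."""
    header = (
        f"Path-Name: {path}\n"
        f"Content-Length: {len(content)}\n"  # length in bytes
        f"Last-Mtime: {int(mtime)}\n"
        "Document-Type: TXT*\n"
        "\n"
    )
    return header.encode("utf-8") + content


def word_to_text(path):
    """Convert a Word file to text with catdoc (assumed to be installed)."""
    result = subprocess.run(["catdoc", path], capture_output=True, check=True)
    return result.stdout


def main(root):
    """Walk root and write one record per .doc file to standard output."""
    out = sys.stdout.buffer
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".doc"):
                continue
            path = os.path.join(dirpath, name)
            try:
                text = word_to_text(path)
            except (OSError, subprocess.CalledProcessError):
                continue  # skip files the converter cannot read
            out.write(format_record(path, text, os.path.getmtime(path)))


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Such a script would be named as the program to run with -S prog -i, with the directory passed as its parameter. PowerPoint and Excel files would need their own converters added in the same way.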
--
Bill Moseley
moseley@hank.org
Received on Tue Aug 26 15:25:51 2003