Patrick Fitzgerald wrote:
Hello!
> >
> >I've made some enhancements to swish-e 1.1 to index Non-Text or HTML
> >files (e.g. to get PDF-files indexed) [I've sent the code changes to
> > Roy].
>
> Could you describe the code changes?
swish-e now starts a filter program as a child process, and the output of
the filter program is piped to the swish-e process. It was a minor change
to the "index the current file" routine of swish-e. Building in the config
file directives was more work.
> Do you directly index the PDF files?
Yes. I've implemented a FileFilter option, which enables you to include
filters for any file type.
E.g. for PDF, the entry in the config file is:
FileFilter .pdf _pdf_filter.sh
The _pdf_filter.sh program is very simple:
#!/bin/sh
pdftotext "$1" - 2>/dev/null
... using the xpdf utility (pdftotext).
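To illustrate the mechanism, here is a minimal shell sketch of how a filter could be looked up by file suffix and run, with its output taking the place of the file contents. It is not the actual swish-e code (that change lives in the C source); the function names lookup_filter and emit_text are hypothetical, and the sketch assumes the config file holds lines of the form "FileFilter <suffix> <program>" as in the entry above.

```shell
#!/bin/sh
# Hypothetical sketch of filter dispatch; not swish-e's actual code.

# Print the filter program configured for a file name.
# $1 = config file, $2 = file name. Returns 1 if no filter matches.
lookup_filter() {
    while read -r directive suffix prog; do
        [ "$directive" = "FileFilter" ] || continue
        case "$2" in
            *"$suffix") printf '%s\n' "$prog"; return 0 ;;
        esac
    done < "$1"
    return 1
}

# Write the text to be indexed for one file to stdout.
# $1 = config file, $2 = file to index.
emit_text() {
    if prog=$(lookup_filter "$1" "$2"); then
        "$prog" "$2"    # filter output is what would be indexed
    else
        cat "$2"        # no filter configured: use the file as it is
    fi
}
```

Called as emit_text swish.conf myfile.pdf (swish.conf being a placeholder name), this would run the configured filter with the file name as its first argument, which is how _pdf_filter.sh above expects to be called.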
> To index PDF files, I implemented the following workaround:
>
> 1. For every PDF file (for example, "myfile.pdf"), create a file
> "myfile.pdf.html" that contains the plain text to be indexed.
> [...]
That's too complicated for me to handle in practice. ;-)
The filter programs have to convert the contents of a file (pdf, word, xls)
to plain text and print it on STDOUT.
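As a sketch of that contract for another file type, a filter for gzip-compressed text files might look like the following. It is only an illustration and not part of the changes sent to Roy; the function name gz_filter is hypothetical, and it assumes the filter receives the file name as its first argument, like _pdf_filter.sh above.

```shell
#!/bin/sh
# Hypothetical filter for gzip-compressed text files; an illustration only.

# Write the decompressed contents of the file named by $1 to stdout.
gz_filter() {
    gzip -dc "$1" 2>/dev/null
}

# Act as a filter only when a file name was given on the command line.
if [ "$#" -gt 0 ]; then
    gz_filter "$1"
fi
```

Saved under a name such as _gz_filter.sh (also hypothetical), it could be registered with a line like "FileFilter .gz _gz_filter.sh" in the config file.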
cu -- Rainer
Received on Tue Aug 11 01:36:42 1998