On Tue, Oct 05, 2004 at 04:44:12PM -0700, Tim Hartley wrote:
> Hi all,
> I'm using the File System index method to create an index of about
> 450 pdf files. I can successfully do this, however it returns
> 'pdf_file_name.pdf' as the title, and I need it to display the
> actual pdf's title.
You can use pdfinfo to extract the title from the PDF.
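As a rough illustration of the pdfinfo approach, here is a minimal Python sketch. It assumes pdfinfo (from xpdf or poppler) is on your PATH and prints its usual "Title:  ..." line. The function names parse_pdfinfo_title and pdf_title are made up for this example and are not part of swish-e or SWISH::Filter.

```python
import subprocess


def parse_pdfinfo_title(pdfinfo_output):
    """Return the Title value from pdfinfo's text output, or None if missing or empty."""
    for line in pdfinfo_output.splitlines():
        # Split on the first colon only, so titles that contain colons survive.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Title":
            return value.strip() or None
    return None


def pdf_title(path, fallback=None):
    """Run pdfinfo on one file and return its title.

    Assumption: the pdfinfo executable is installed and reachable via PATH.
    Falls back to the given value (for example the file name) when no title is set.
    """
    result = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=True
    )
    return parse_pdfinfo_title(result.stdout) or fallback
```

Many PDFs have no title set at all, so passing the file name as the fallback keeps the result list readable in that case.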
> This is sort of discussed at these archives
> (http://swish-e.org/Discussion/archive/2003-12/6560.html and
> http://swish-e.org/archive/6277.html), but I'm assuming I can't call
> a 'filter_content' using the File System method, and I don't know
> how to tweak the PDF2HTML filter file to do what I need..
Why not use DirTree.pl and let it use SWISH::Filter to handle this?
It will likely be faster than using the file system method and calling
the Perl script for every document.
I suppose you could use swish-filter-test to do the work for you (but
it would be very slow).
FileFilter swish-filter-test "-content -quiet '%p'" .pdf
I'd recommend using DirTree.pl, though.
> ---Begin Config File (pdf_file_test.config)---
> #Name & location of the index file created by this search configuration
> IndexFile c:\swish-e\pdfTestIndex.index
> IndexDir C:\Inetpub\wwwroot\www.cm3.com\planetpdf\pdfs
> IndexOnly .pdf
> #Dont index anything other than the PDF directory
> FileMatch pathname contains pdfs
> IndexReport 3
> FilterDir C:/SWISH-E/lib/swish-e/perl/SWISH/Filters
> FileFilter ./pdf2html "'%p' -" /\\.pdf$/
> IndexContents HTML* .pdf .PDF
> StoreDescription HTML* <description> 200000
> PropertyNameAlias swishdescription title
> #Replace the pathname with a url
> ReplaceRules Replace "C:/Inetpub/wwwroot/www.cm3.com/" "http://cm3.planetpdf.com/"
> #run on cmd line: swish-e -S fs -c pdf_file_test.config
> ---End Config file---
> Any suggestions would be greatly appreciated!
Received on Tue Oct 5 16:54:04 2004