Re: Displaying a filtered PDF's title in <swishtitle>

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Tue Oct 05 2004 - 23:53:52 GMT
On Tue, Oct 05, 2004 at 04:44:12PM -0700, Tim Hartley wrote:
> Hi all,
> 
> I'm using the File System index method to create an index of about
> 450 pdf files. I can successfully do this, however it returns
> 'pdf_file_name.pdf' as the title, and I need it to display the
> actual pdf's title.

You can use pdfinfo to extract the title from the PDF.
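
As a rough illustration of the pdfinfo approach, here is a minimal Python sketch. It assumes pdfinfo (from xpdf or poppler) is on the PATH and prints a line of the form "Title:   ..." when the PDF has a title. The function names parse_title and pdf_title are hypothetical, chosen only for this example.

```python
import subprocess
from typing import Optional


def parse_title(pdfinfo_output: str) -> Optional[str]:
    """Return the value of the 'Title:' line in pdfinfo's text output, if any."""
    for line in pdfinfo_output.splitlines():
        # Split on the first colon only, so titles containing ':' survive.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Title":
            return value.strip() or None
    return None


def pdf_title(path: str, fallback: Optional[str] = None) -> Optional[str]:
    """Run pdfinfo on one file and return its title, or the fallback.

    Assumption: pdfinfo is installed and exits non-zero on failure.
    """
    result = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return fallback
    return parse_title(result.stdout) or fallback
```

Calling pdf_title("report.pdf", fallback="report.pdf") would return the file name when the PDF carries no title, which matches the behaviour described in the question.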


> This is sort of discussed at these archives
> (http://swish-e.org/Discussion/archive/2003-12/6560.html and
> http://swish-e.org/archive/6277.html), but I'm assuming I can't call
> a 'filter_content' using the File System method, and I don't know
> how to tweak the PDF2HTML filter file to do what I need..

Why not use DirTree.pl and let it use SWISH::Filter to handle this?
It will be faster than using the file system method and calling the Perl
script for every document.

I suppose you could use swish-filter-test to do the work for you (but
it would be very slow).

   FileFilter swish-filter-test "-content -quiet '%p'" .pdf

I'd recommend using DirTree.pl, though.
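
To show why the -S prog route can surface a real title, here is a hedged Python sketch of the kind of record an external program hands to swish-e: a few headers, a blank line, then the document. It is not the code of DirTree.pl or SWISH::Filter. The function name swish_prog_record is hypothetical, and the "Document-Type: HTML*" value is an assumption about how the HTML parser is selected. The idea is that the PDF's title goes into an HTML <title> element, so swish-e stores it as the document title.

```python
import html


def swish_prog_record(path_name: str, title: str, body_text: str) -> bytes:
    """Build one document record: headers, a blank line, then HTML content."""
    # Wrap the extracted text in minimal HTML so the title sits in <title>.
    page = (
        "<html><head><title>{}</title></head>\n"
        "<body><pre>{}</pre></body></html>\n"
    ).format(html.escape(title), html.escape(body_text))
    content = page.encode("utf-8")

    # Content-Length must be the byte count of the content that follows.
    headers = (
        "Path-Name: {}\n"
        "Content-Length: {}\n"
        "Document-Type: HTML*\n"
        "\n"
    ).format(path_name, len(content))
    return headers.encode("utf-8") + content
```

A driver script would write each record to standard output as raw bytes (for example with sys.stdout.buffer.write) and swish-e would read them when run with -S prog.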


> 
> ---Begin Config File (pdf_file_test.config)---
> #Name & location of the index file created by this search configuration
> IndexFile c:\swish-e\pdfTestIndex.index
> IndexDir C:\Inetpub\wwwroot\www.cm3.com\planetpdf\pdfs
> IndexOnly .pdf
> #Dont index anything other than the PDF directory
> FileMatch pathname contains pdfs
> IndexReport 3
> FilterDir C:/SWISH-E/lib/swish-e/perl/SWISH/Filters
> FileFilter ./pdf2html "'%p' -" /\\.pdf$/
> IndexContents HTML* .pdf .PDF
> StoreDescription HTML* <description> 200000
> PropertyNameAlias swishdescription title
> #Replace the pathname with a url
> ReplaceRules Replace "C:/Inetpub/wwwroot/www.cm3.com/" "http://cm3.planetpdf.com/"
> #run on cmd line: swish-e -S fs -c pdf_file_test.config
> ---End Config file--- 
> 
> Any suggestions would be greatly appreciated!
> 
> -Tim
> 
> 

-- 
Bill Moseley
moseley@hank.org

Unsubscribe from or help with the swish-e list: 
   http://swish-e.org/Discussion/

Help with Swish-e:
   http://swish-e.org/current/docs
   swish-e@sunsite.berkeley.edu
Received on Tue Oct 5 16:54:04 2004