Re: [swish-e] How swish-e returns PDF's meta description

From: Daqi Li <dli7mar(at)not-real.yahoo.com>
Date: Thu Oct 15 2009 - 13:19:19 GMT
Hi Peter,

Thank you so much for your help.

Sorry about my_pdf2html.pl; it is the same as _pdf2html.pl. The only difference is that I dumped the converted HTML into a file so I could inspect its meta contents.
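
For anyone who wants to repeat that check without reading the dump by eye, here is a minimal sketch in Python (standard library only). It assumes the filter output is ordinary HTML with <meta name="..." content="..."> tags in the head; the sample markup and the names MetaCollector and collect_meta are made up for illustration and are not part of swish-e or _pdf2html.pl.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the <title> text and every <meta name=... content=...> pair."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def collect_meta(html_text):
    """Return (title, {meta name: content}) for one HTML document."""
    parser = MetaCollector()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip(), parser.meta


# Hypothetical sample standing in for the dumped filter output.
sample = """<html><head>
<title>example.pdf</title>
<meta name="description" content="Short summary of the document">
<meta name="keywords" content="appeal, memorandum">
</head><body>text</body></html>"""

title, meta = collect_meta(sample)
print("title ->", title)
for name in ("description", "keywords"):
    print(name, "->", meta.get(name, "(missing)"))
```

If "description" or "keywords" shows up as (missing) for a real dump, the filter is not emitting that meta tag, so there is nothing for swish-e to store or return for it.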

I made the changes to swish.conf as you suggested. After that, the output screen showed one more line of information, but the result is still not what I expected. (Please see the two attached files.)

I would appreciate any further help,
-Daqi



--- On Thu, 10/15/09, Peter Karman <peter@peknet.com> wrote:

> From: Peter Karman <peter@peknet.com>
> Subject: Re: [swish-e] How swish-e returns PDF's meta description
> To: "Swish-e Users Discussion List" <users@lists.swish-e.org>
> Date: Thursday, October 15, 2009, 12:46 AM
> Daqi Li wrote on 10/14/09 2:54 PM:
> > Hi,
> > 
> > I have swish-e-2.4.7 on Linux Fedora Core 8 (see below uname -a).
> > 
> > I have PDF documents that have summaries in their meta description (or keywords). When I do a search and the keyword is found in a PDF's body or title, I need swish-e to return its size, last-modified date, etc., as well as the meta description (or keywords). Here is what I did:
> >  
> > 1. I copied your swish.cgi to /var/www/cgi-bin.
> > 2. Created .swishcgi.conf in /var/www/cgi-bin (as attached).
> > 3. Created swish.conf in /var/www/cgi-bin (as attached).
> > 4. Ran the command to index the files:
> >     swish-e -c swish.conf
> > 5. Then browsed to the URL http://localhost/cgi-bin/swish.cgi.
> > 
> > Here is the search result I got:
> > 1 09-71298_Spina_mem_op-signed.pdf -- rank: 1000 
> > Title: 09-71298_Spina_mem_op-signed.pdf 
> > Last Modified Date: 2009-10-14 12:44:14 EDT 
> > Document Size: 127153 
> > Description: (null) 
> > Keywords: 
> 
> try these changes in your swish.conf:
> 
> # don't know what my_pdf2html.pl looks like, but swish-filter-test
> # does the trick
> FileFilter .pdf /your/path/to/swish-filter-test '-headers -content %p'
> 
> # add the missing * after the parser type
> StoreDescription HTML* <meta> 1000
> StoreDescription TXT* 1000
> 
> -- 
> Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
> _______________________________________________
> Users mailing list
> Users@lists.swish-e.org
> http://lists.swish-e.org/listinfo/users
>
Received on Thu Oct 15 09:19:21 2009