Re: DirTree works in pipe but not config file on PDF

From: Gertjan Hofman <gertjan_hofman(at)not-real.yahoo.com>
Date: Fri Jun 30 2006 - 22:16:05 GMT

Hi Peter,

Yes, you are right. Below is the output. I am finding
the order of the output a little confusing - it would
be good if SWISH-e printed the file name before
it starts processing the file. Anyway, I am open to
suggestions. As far as I can tell, it's just unhappy
with the PDF. So to me it seems the PDF is somehow
handled differently than in the pipe example.

Gertjan


[ghofman@bi35-sensorinfo tmp]$ swish-e -v 5 -c swish_file.conf -S prog
Parsing config file 'swish_file.conf'
Indexing Data Source: "External-Program"
Indexing "/room/swish_index/DirTree.pl"
External Program found: /room/swish_index/DirTree.pl
Error: May not be a PDF file (continuing anyway)
Error (0): PDF file is damaged - attempting to reconstruct xref table...
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table
/home/ghofman/tmp10/swish_text.pdf - Using HTML2 parser -  (no words indexed)

Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!
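
A note on the first error in the output above: as far as I know, xpdf prints "May not be a PDF file" when it cannot find the `%PDF-` header marker near the start of its input. One quick way to narrow things down is to check what the file on disk actually starts with. The following is a minimal Python sketch under that assumption; `looks_like_pdf`, `read_head`, and the 1024-byte search window are illustrative choices, not part of swish-e or xpdf.

```python
def looks_like_pdf(data: bytes, window: int = 1024) -> bool:
    """Return True if the PDF header marker occurs in the first `window` bytes.

    The window size is an assumption for illustration; a strict check
    would require the marker at offset 0.
    """
    return b"%PDF-" in data[:window]


def read_head(path: str, size: int = 1024) -> bytes:
    """Read the first `size` bytes of a file in binary mode."""
    with open(path, "rb") as fh:
        return fh.read(size)


def report(path: str) -> str:
    """Build a one-line summary of whether `path` starts like a PDF."""
    head = read_head(path)
    verdict = "has" if looks_like_pdf(head) else "is missing"
    return f"{path}: {verdict} a %PDF- header (first bytes: {head[:16]!r})"
```

Calling `print(report("/path/to/file.pdf"))` on the file being indexed shows its leading bytes. If the file on disk passes the check, the damage is more likely happening somewhere between the external program and the PDF filter than in the file itself.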

--- Peter Karman <peter@peknet.com> wrote:

> I was suggesting that the -v3 option would tell you
> if swish-e was in fact parsing swish_test.pdf or if
> somehow it was being passed something different. I
> just tried your example here and it worked for me, so
> I was suggesting a way for you to start to debug
> what's going on.
> 
> Gertjan Hofman scribbled on 6/30/06 3:59 PM:
> > 
> > Peter -
> > 
> > Not sure I understand - I am passing only 1 file -
> > swish_test.pdf (as indicated in the config file I
> > enclosed). Of course I started with entire folders,
> > but for the sake of demonstrating the problem I only
> > parse the one file.
> >
> > I note there are older messages in the mailing list
> > with similar-sounding problems - in that case
> > spider.pl failed from a config file but worked in a
> > pipe...
> > 
> > Thanks
> > 
> > Gertjan
> > 
> > 
> > --- Peter Karman <peter@peknet.com> wrote:
> > 
> >>
> >> Gertjan Hofman scribbled on 6/29/06 11:59 PM:
> >>
> >>> TRY 1: USING CONFIG FILE
> >>>
> >>> gertjan-laptop:~/tmp/swish_test> swish-e -S prog -c swish_file.conf
> >>> Indexing Data Source: "External-Program"
> >>> Indexing "./DirTree.pl"
> >>> External Program found: ./DirTree.pl
> >>> Error: May not be a PDF file (continuing anyway)
> >>> Error (0): PDF file is damaged - attempting to reconstruct xref table...
> >>> Error: Couldn't find trailer dictionary
> >>> Error: Couldn't read xref table
> >>> Removing very common words...
> >>> no words removed.
> >>> Writing main index...
> >>> err: No unique words indexed!
> >>>
> >>
> >> Add the -v3 option to get more verbose output. That
> >> should tell you the name of the file being parsed
> >> with SWISH::Filter (xpdf). I'm betting the file
> >> isn't getting passed correctly.
> >>
> >> -- 
> >> Peter Karman  .  http://peknet.com/  . 
> >> peter@peknet.com
> >>
> > 
> > 
> 
> -- 
> Peter Karman  .  http://peknet.com/  . 
> peter@peknet.com
> 


Received on Fri Jun 30 15:16:10 2006
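
Peter's guess earlier in the thread, that the file isn't being passed correctly, can be tested by inspecting what the external program actually emits. With `-S prog`, swish-e reads a stream of documents from the program. Each document is introduced by headers such as `Path-Name:` and `Content-Length:`, then a blank line, then exactly that many bytes of content. The sketch below is one hedged way to walk such a stream in Python and flag a record whose body is shorter than its declared length. `read_prog_records` is a hypothetical helper name, and the parsing is deliberately simplified rather than a copy of what swish-e does internally.

```python
def read_prog_records(stream):
    """Yield (headers, body) pairs from a binary "-S prog" style stream.

    `headers` maps lower-cased header names to their string values;
    `body` is the raw content, read using the Content-Length header.
    """
    while True:
        line = stream.readline()
        # Skip stray blank lines that may separate records.
        while line and not line.strip():
            line = stream.readline()
        if not line:
            return  # clean end of stream

        headers = {}
        while line and line.strip():
            name, sep, value = line.decode("latin-1").partition(":")
            if not sep:
                raise ValueError(f"malformed header line: {line!r}")
            headers[name.strip().lower()] = value.strip()
            line = stream.readline()

        if "content-length" not in headers:
            raise ValueError("record has no Content-Length header")

        length = int(headers["content-length"])
        body = stream.read(length)
        if len(body) != length:
            raise ValueError(
                f"{headers.get('path-name', '?')}: expected {length} bytes, "
                f"got {len(body)}"
            )
        yield headers, body
```

To use it, save the output of DirTree.pl to a file, open that file in binary mode, and loop over `read_prog_records(fh)`, printing each `path-name` and the first few bytes of each body. A body that does not look like the expected document, or a length mismatch, would point at the hand-off rather than at the PDF.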