Yes, you are right. Below is the output. I am finding
the order of the output a little confusing - it would
be good if SWISH-e would output the file name before
it starts processing. Anyway, I am open to
suggestions. As far as I can tell, it's just unhappy
with the PDF. So to me it seems the PDF parsing is
somehow different from the pipe example.
[ghofman@bi35-sensorinfo tmp]$ swish-e -v 5 -c swish_file.conf -S prog
Parsing config file 'swish_file.conf'
Indexing Data Source: "External-Program"
External Program found: /room/swish_index/DirTree.pl
Error: May not be a PDF file (continuing anyway)
Error (0): PDF file is damaged - attempting to reconstruct xref table...
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table
/home/ghofman/tmp10/swish_text.pdf - Using HTML2 parser - (no words indexed)
Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!
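
One way to rule out the file itself, before looking at how DirTree.pl hands it to swish-e, is a rough structural check for the markers the xpdf messages above complain about. The following is a minimal Python sketch under stated assumptions: `pdf_sanity` is a hypothetical helper (not part of swish-e or SWISH::Filter), and the marker checks are loose heuristics that only roughly correspond to the header, xref, and trailer errors, not xpdf's actual parsing logic.

```python
def pdf_sanity(path):
    """Return rough structural checks for the file at `path`.

    Hypothetical helper for debugging only: it looks for byte markers
    that a well-formed PDF normally contains. It does not parse the PDF.
    """
    with open(path, "rb") as fh:
        data = fh.read()

    head = data[:1024]    # the "%PDF-" header is expected near the start
    tail = data[-2048:]   # "startxref" and "%%EOF" are expected near the end

    return {
        "size": len(data),
        "has_header": b"%PDF-" in head,
        "has_startxref": b"startxref" in tail,
        "has_eof": b"%%EOF" in tail,
    }
```

Calling `pdf_sanity("/home/ghofman/tmp10/swish_text.pdf")` on the path from the log would show whether the file on disk has these markers. If all three are present, the file is probably intact and the problem more likely lies in how its bytes reach the filter.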
--- Peter Karman <email@example.com> wrote:
> I was suggesting that the -v3 option would tell you
> if swish-e was in
> fact parsing swish_test.pdf or if somehow it was
> being passed something
> different. I just tried your example here and it
> worked for me, so I was
> suggesting a way for you to start to debug what's
> going on.
> Gertjan Hofman scribbled on 6/30/06 3:59 PM:
> > Peter -
> > Not sure I understand - I am passing only 1 file -
> > swish_test.pdf (as indicated in the config file I
> > enclosed). Of course I started with entire
> > but for sake of demonstration of the problem only
> > parse the one file
> > I note there are older messages in the mailing list
> > with similar sounding problems - in that case
> > spider.pl failed from a config file but worked in a
> > pipe...
> > Thanks
> > Gertjan
> > --- Peter Karman <firstname.lastname@example.org> wrote:
> >> Gertjan Hofman scribbled on 6/29/06 11:59 PM:
> >>> TRY 1: USING CONFIG FILE
> >>> gertjan-laptop:~/tmp/swish_test> swish-e -S prog -c swish_file.conf
> >>> Indexing Data Source: "External-Program"
> >>> Indexing "./DirTree.pl"
> >>> External Program found: ./DirTree.pl
> >>> Error: May not be a PDF file (continuing anyway)
> >>> Error (0): PDF file is damaged - attempting to reconstruct xref table...
> >>> Error: Couldn't find trailer dictionary
> >>> Error: Couldn't read xref table
> >>> Removing very common words...
> >>> no words removed.
> >>> Writing main index...
> >>> err: No unique words indexed!
> >> add the -v3 option to get more verbose. That will
> >> tell you the name of
> >> the file being parsed with SWISH::Filter (xpdf).
> >> I'm betting the file
> >> isn't getting passed correctly.
> >> --
> >> Peter Karman . http://peknet.com/ .
> >> email@example.com
> Peter Karman . http://peknet.com/ .
Received on Fri Jun 30 15:16:10 2006