Hi Peter,
Yes, you are right. Below is the output. I find the
order of the output a little confusing - it would be
good if SWISH-e printed the file name before it
starts processing. Anyway, I am open to suggestions.
As far as I can tell, it is just unhappy with the
PDF. So it seems to me that the PDF is handled
differently here than in the pipe example.
Gertjan
[ghofman@bi35-sensorinfo tmp]$ swish-e -v 5 -c swish_file.conf -S prog
Parsing config file 'swish_file.conf'
Indexing Data Source: "External-Program"
Indexing "/room/swish_index/DirTree.pl"
External Program found: /room/swish_index/DirTree.pl
Error: May not be a PDF file (continuing anyway)
Error (0): PDF file is damaged - attempting to reconstruct xref table...
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table
/home/ghofman/tmp10/swish_text.pdf - Using HTML2 parser - (no words indexed)
Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!
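
One way to see what is actually being handed to the PDF filter is to capture the output of the external program and inspect it by hand. Below is a minimal Python sketch, not part of SWISH-E. It assumes the external program writes the usual -S prog format ("Name: value" header lines including Content-Length, a blank line, then exactly that many bytes of content). The function names and the 1024-byte search windows are my own assumptions for illustration.

```python
def check_pdf_bytes(data: bytes):
    """Return a list of problems; an empty list means the bytes look like a PDF."""
    problems = []
    # A PDF is expected to carry a "%PDF-" marker near the start.
    # The 1024-byte window is an assumption, not a SWISH-E setting.
    if b"%PDF-" not in data[:1024]:
        problems.append("no %PDF- header in the first 1024 bytes")
    # A complete PDF ends with "startxref", an offset, and "%%EOF".
    tail = data[-1024:]
    if b"startxref" not in tail:
        problems.append("no startxref near the end")
    if b"%%EOF" not in tail:
        problems.append("no %%EOF marker near the end")
    return problems


def split_prog_records(stream: bytes):
    """Yield (headers, body) pairs from captured -S prog style output.

    Assumed layout per document: header lines, one blank line, then
    exactly Content-Length bytes of content.
    """
    pos = 0
    while pos < len(stream):
        end = stream.find(b"\n\n", pos)
        if end == -1:
            break  # no complete header block left
        headers = {}
        for line in stream[pos:end].split(b"\n"):
            name, _, value = line.partition(b":")
            key = name.strip().decode("latin-1").lower()
            headers[key] = value.strip().decode("latin-1")
        length = int(headers["content-length"])
        body_start = end + 2
        yield headers, stream[body_start:body_start + length]
        pos = body_start + length
```

To use it, redirect the output of the external program to a file, read that file in binary mode, and run check_pdf_bytes on the body of each record whose path ends in .pdf. A truncated body or a wrong Content-Length would show up as a missing %%EOF or a missing %PDF- header.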
--- Peter Karman <peter@peknet.com> wrote:
> I was suggesting that the -v3 option would tell you if swish-e was in
> fact parsing swish_test.pdf or if somehow it was being passed something
> different. I just tried your example here and it worked for me, so I was
> suggesting a way for you to start to debug what's going on.
>
> Gertjan Hofman scribbled on 6/30/06 3:59 PM:
> >
> > Peter -
> >
> > Not sure I understand - I am passing only 1 file -
> > swish_test.pdf (as indicated in the config file I
> > enclosed). Of course I started with entire folders,
> > but for the sake of demonstrating the problem I
> > only parse the one file.
> >
> > I note there are older messages in the mailing list
> > with similar-sounding problems - in that case
> > spider.pl failed from a config file but worked in a
> > pipe...
> >
> > Thanks
> >
> > Gertjan
> >
> >
> > --- Peter Karman <peter@peknet.com> wrote:
> >
> >>
> >> Gertjan Hofman scribbled on 6/29/06 11:59 PM:
> >>
> >>> TRY 1: USING CONFIG FILE
> >>>
> >>> gertjan-laptop:~/tmp/swish_test> swish-e -S prog -c swish_file.conf
> >>> Indexing Data Source: "External-Program"
> >>> Indexing "./DirTree.pl"
> >>> External Program found: ./DirTree.pl
> >>> Error: May not be a PDF file (continuing anyway)
> >>> Error (0): PDF file is damaged - attempting to reconstruct xref table...
> >>> Error: Couldn't find trailer dictionary
> >>> Error: Couldn't read xref table
> >>> Removing very common words...
> >>> no words removed.
> >>> Writing main index...
> >>> err: No unique words indexed!
> >>>
> >>
> >> add the -v3 option to get more verbose. That should
> >> tell you the name of the file being parsed with
> >> SWISH::Filter (xpdf). I'm betting the file isn't
> >> getting passed correctly.
> >>
> >> --
> >> Peter Karman . http://peknet.com/ . peter@peknet.com
> >>
> >
>
> --
> Peter Karman . http://peknet.com/ . peter@peknet.com
>
Received on Fri Jun 30 15:16:10 2006