DirTree works in pipe but not config file on PDF

From: Gertjan Hofman <gertjan_hofman(at)not-real.yahoo.com>
Date: Fri Jun 30 2006 - 05:00:48 GMT
Dear Swish users,

I am new at using this indexing tool - still trying to
set it up. I ran into a snag that may be my
incompetence or a bug.
I want to use DirTree.pl but noticed it was not
parsing PDFs. I have xpdf etc installed.  However, in
piping mode, it works fine.

I made a sample PDF file using soffice. pdftotext
parses it fine.  Using the pipe command:
 ./DirTree.pl swish_test.pdf | swish-e -i stdin -S prog

works like a charm (see complete output below). But
with my simple 3-line config file, it fails:

# use spider for the web pages
# 
IndexDir ./DirTree.pl

SwishProgParameters ./swish_test.pdf

# end of the config file


See the complete output below: no keywords are found and it
complains about a damaged PDF. What is going on? I am
happy to provide more files, examples, etc.

Much appreciated

Gertjan
 Using swish 2.4.3 on Kubuntu 6.06


TRY 2: USING PIPE
gertjan-laptop:~/tmp/swish_test> ./DirTree.pl swish_test.pdf | swish-e -i stdin -S prog

Indexing Data Source: "External-Program"
Indexing "stdin"
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 33 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: Complete
  Writing word hash: Complete
  Writing word data: Complete
33 unique words indexed.
4 properties sorted.
1 file indexed.  564 total bytes.  38 total words.
Elapsed time: 00:00:01 CPU time: 00:00:00
Indexing done!
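
To see what DirTree.pl actually hands to swish-e in the pipe case, its output can be saved to a file and split into documents. The sketch below assumes the usual -S prog stream layout: a block of header lines (Path-Name, Content-Length, optionally others such as Document-Type), a blank line, then exactly Content-Length bytes of content. The function name parse_prog_stream and the file name out.txt are made up for illustration.

```python
import sys


def parse_prog_stream(data: bytes):
    """Split a '-S prog' style stream into (headers, body) pairs.

    Assumes each document is a header block, a blank line, and then
    exactly Content-Length bytes of body.
    """
    docs = []
    pos = 0
    while pos < len(data):
        end = data.find(b"\n\n", pos)  # blank line ends the header block
        if end == -1:
            break
        headers = {}
        for line in data[pos:end].split(b"\n"):
            name, _, value = line.partition(b":")
            key = name.strip().decode("latin-1").lower()
            headers[key] = value.strip().decode("latin-1")
        length = int(headers["content-length"])
        body_start = end + 2
        docs.append((headers, data[body_start:body_start + length]))
        pos = body_start + length
    return docs


if __name__ == "__main__":
    # Usage (hypothetical file name):
    #   ./DirTree.pl swish_test.pdf > out.txt
    #   python parse_prog_stream.py out.txt
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            for headers, body in parse_prog_stream(fh.read()):
                print(headers.get("path-name"), len(body), body[:40])
```

If the body printed here is already plain text, the PDF was filtered before it reached swish-e; if it still starts with %PDF-, the raw file was passed through.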


TRY 1: USING CONFIG FILE

gertjan-laptop:~/tmp/swish_test> swish-e -S prog -c swish_file.conf
Indexing Data Source: "External-Program"
Indexing "./DirTree.pl"
External Program found: ./DirTree.pl
Error: May not be a PDF file (continuing anyway)
Error (0): PDF file is damaged - attempting to reconstruct xref table...
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table
Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!
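
The "May not be a PDF file" message suggests that whatever reached pdftotext in this run did not begin with a PDF header. A minimal sketch for checking a file by hand follows. It assumes xpdf looks for the %PDF- marker near the start of the file (the 1024-byte window is my assumption about xpdf's behaviour, not something shown in the output above), and the function name looks_like_pdf is made up for illustration.

```python
import sys


def looks_like_pdf(data: bytes, search_window: int = 1024) -> bool:
    """Return True if the %PDF- marker appears near the start of the data.

    The 1024-byte window is an assumption about how far xpdf searches.
    """
    return b"%PDF-" in data[:search_window]


if __name__ == "__main__":
    # Usage: python looks_like_pdf.py swish_test.pdf
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            head = fh.read(1024)
        status = "has a PDF header" if looks_like_pdf(head) else "no PDF header found"
        print(f"{path}: {status}")
```

Running this on swish_test.pdf, and on any temporary file the filter writes, would show whether the original file or an intermediate copy is the one missing its header.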




Received on Thu Jun 29 22:00:59 2006