Re: SWISH-E and Windows XP and a definitive config file

From: J. David Boyd <david(at)not-real.adboyd.com>
Date: Fri Oct 14 2005 - 13:24:02 GMT

David L Norris wrote:
> On Thu, 2005-10-13 at 13:53 -0700, J. David Boyd wrote:
> 
>>Running
>>'swish-e -c xxx -T index_words_only'
>>or
>>'swish-e -c xxx -T parse_words'
>>or anything shows that the words it is finding are garbage.  It looks as
>>if the pdftotext code is not running.
> 
> 
> That's certainly a possibility.  Hard to say without an example of how
> you're running the index process, though.
> 

Okay, here's one example:
------------------------------------
swish.cfg:

IndexDir .
IndexContents TXT2 .pdf
FileFilter .pdf pdftotext "'%p' -"
------------------------------------
and this outputs (running as 'swish-e -c swish.cfg' from the directory
where the PDF file is):
------------------------------------
Indexing Data Source: "File-System"
Indexing "."
Error: Couldn't open file ''.\test.pdf''
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 44 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: ...
  Writing word text: Complete
  Writing word hash: ...
  Writing word hash:  10%
  Writing word hash:  20%
  Writing word hash:  30%
  Writing word hash:  40%
  Writing word hash:  50%
  Writing word hash:  60%
  Writing word hash:  70%
  Writing word hash:  80%
  Writing word hash:  90%
  Writing word hash: 100%
  Writing word hash: Complete
  Writing word data: ...
  Writing word data: Complete
44 unique words indexed.
Sorting property: swishdocpath
Sorting property: swishtitle
Sorting property: swishdocsize
Sorting property: swishlastmodified
4 properties sorted.
7 files indexed.  416,453 total bytes.  69 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
------------------------------------
And here is the output of 'swish-e -T index_words_only':
------------------------------------
02
09
10
14
2
2005
29
3
3index
4
a
agmz
cfg0co
co
data
daylight
e
eastern
f
file
filefilter
format
frx
index
indexcontents
indexdir
indexing
p
pdf
pdf6c
pdftotext
prop
s
source
swish
system
tempco
test
time
txt2
x
z


------------------------------------
None of these words are in the PDF file I am trying to index.
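
Two things stand out in the output above. The doubled quotes in the
error line (''.\test.pdf'') suggest that pdftotext received the single
quotes from the FileFilter options as literal characters, which is what
one would expect if the Windows command processor does not treat single
quotes as quoting. Also, words such as indexdir, indexcontents,
filefilter and pdftotext appear in swish.cfg itself, which would be
consistent with the index being built from the other files in the
directory rather than from the PDF. The variant below is an untested
sketch under those assumptions: it puts double quotes around %p, and the
IndexOnly line is an extra assumption, meant to keep non-PDF files out
of the index.
------------------------------------
swish.cfg (hypothetical variant, untested):

# Index only the PDF files in the current directory (assumption:
# swish.cfg and the other files here should be skipped)
IndexDir .
IndexOnly .pdf
IndexContents TXT2 .pdf
# Double quotes around %p instead of single quotes (assumption:
# on Windows the single quotes are passed to pdftotext literally)
FileFilter .pdf pdftotext '"%p" -'
------------------------------------
If the filter then runs, the "Couldn't open file" error should go away
and the word list should start to reflect the text of the PDF.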

Dave
Received on Fri Oct 14 06:24:15 2005