Re: SWISH-E and Windows XP and a definitive config file

From: J. David Boyd <david(at)not-real.adboyd.com>
Date: Fri Oct 14 2005 - 13:24:42 GMT

David L Norris wrote:
> On Thu, 2005-10-13 at 13:53 -0700, J. David Boyd wrote:
> 
>>Running
>>'swish-e -c xxx -T index_words_only'
>>or
>>'swish-e -c xxx -T parse_words'
>>or anything shows that the words it is finding are garbage.  It looks as
>>if the pdftotext code is not running.
> 
> 
> That's certainly a possibility.  Hard to say without an example of how
> you're running the index process, though.
> 

Here's another one, based on something I got off the archives:

------------------------------------
swish.cfg:
IndexDir c:/xfer/swish
FileFilter .pdf ./lib/swish-e/swish_filter.pl '"%p" "%P"'
------------------------------------
Running 'swish-e -c swish.cfg' from the directory where I installed
SWISH-E, this outputs:
------------------------------------
Indexing Data Source: "File-System"
Indexing "c:/xfer/swish"
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 135 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: ...
  Writing word text: Complete
  Writing word hash: ...
  Writing word hash:  10%
  Writing word hash:  20%
  Writing word hash:  30%
  Writing word hash:  40%
  Writing word hash:  50%
  Writing word hash:  60%
  Writing word hash:  70%
  Writing word hash:  80%
  Writing word hash:  90%
  Writing word hash: 100%
  Writing word hash: Complete
  Writing word data: ...
  Writing word data: Complete
135 unique words indexed.
Sorting property: swishdocpath
Sorting property: swishtitle
Sorting property: swishdocsize
Sorting property: swishlastmodified
4 properties sorted.
5 files indexed.  416,599 total bytes.  282 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
------------------------------------

However, the only PDF file in the directory being indexed contains four
words, each on its own line: abercrombie, fitch, sears, roebuck.
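
To compare runs while changing the config, the totals in the summary line of the output above can be pulled out mechanically. This is only a rough sketch in Python, assuming the summary line always has the form shown in the run above; parse_summary is a made-up helper name, not anything that ships with SWISH-E.

```python
import re

# Matches a summary line such as:
#   5 files indexed.  416,599 total bytes.  282 total words.
# Assumption: the format is exactly as in the run shown above.
SUMMARY = re.compile(
    r"(?P<files>[\d,]+) files? indexed\.\s+"
    r"(?P<bytes>[\d,]+) total bytes\.\s+"
    r"(?P<words>[\d,]+) total words\."
)


def parse_summary(output):
    """Return (files, bytes, words) as ints, or None if no summary line is found."""
    match = SUMMARY.search(output)
    if match is None:
        return None
    # Strip the thousands separators before converting to int.
    return tuple(
        int(match.group(name).replace(",", ""))
        for name in ("files", "bytes", "words")
    )


if __name__ == "__main__":
    line = "5 files indexed.  416,599 total bytes.  282 total words."
    print(parse_summary(line))
```

Feeding it the captured output of each indexing run makes it easy to see whether a config change altered the file or word totals.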

I must be doing something wrong.

I can certainly index html files with no problem whatsoever, so I know
the basic program functionality is there.
Received on Fri Oct 14 06:24:46 2005