Win 2000, swish-e Filters

From: Sharon Beall <beall2(at)not-real.llnl.gov>
Date: Tue Sep 30 2003 - 21:01:46 GMT
Hello,

I have had swish-e running on a Unix box for years.  I now have to 
implement it on a Win2000 machine :(.  Install was easy; I put it in 
C:\tools.  I can index and search fine, except that I need to use 
FileFilter for PDFs and other formats.  I'm starting with just PDFs, and failing.

C:\tools\SWISH-E>swish-e -V
SWISH-E 2.2.3


This is my config file, mostly copied from the Online Docs and other 
people's postings:
# Filter Directory
  FilterDir C:/tools/SWISH-E/filter-bin

# include all the available filters and mappings for files that we index
# I copied these from some news postings I read...
# eventually will need all of these to work for me as well,
# copied temporarily, but will need the files they reference from somewheres.
# Use the file filter to index pdf files
#FileFilter .pdf c:/tools/SWISH-E/filter-bin/_pdf2html.pl '"%p" -'
#FileFilter .pdf c:/tools/SWISH-E/filter-bin/pdftotext.exe '"%p" -'
#FileFilter .pdf /tools/swish-e/filter-bin/_pdf2html.pl
#FileFilter .PDF /tools/swish_e/filter-bin/_pdf2html.pl
#Filefilter .ppt /tools/xlhtml-0.5.1/bin/ppthtml "'%p'"
#FileFilter .doc /tools/catdoc-0.91.5/bin/catdoc "-a -s8859-1 -d8859-1 '%p'"
#FileFilter .xls /tools/xlhtml-0.5.1/bin/xlhtml "-nc '%p'"

# IndexContents .pdf .PDF
#IndexContents HTML2 .pdf .ppt .PDF .xls
#IndexContents TXT2 .doc .xls .exe .zip .ZIP .tar.Z .tar.gz .tgz .tar
#IndexContents TXT2 .gz .z .Z .ps .rtf

  # Define *what* to index
  # IndexDir can point to a directories and/or a files
  IndexDir .

  # only index x files
  IndexOnly .pdf .asp

  # Show basic info while indexing
  IndexReport 1



Here are some runs below:
---------------------------------------
# This is a run with the FileFilter uncommented for pdf as well as IndexContents
C:\Inetpub\wwwroot>"C:\tools\SWISH-E\swish-e" -c swish-e.config
err: IndexContents: Unknown document type ".pdf"
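
(Note on the error above: the message suggests that swish-e read ".pdf" as the document type, which is what happens when IndexContents is given only extensions, as in the "IndexContents .pdf .PDF" line of the config. The directive expects a parser name first, then the extensions. A minimal sketch, assuming pdftotext.exe really is present at the filter-bin path taken from the commented-out line in the config, and assuming its plain-text output should go to the TXT parser; neither assumption is confirmed in this post:)

```
# Assumption: pdftotext.exe exists at this path (copied from the config above).
# The trailing "-" asks pdftotext to write the extracted text to stdout.
FileFilter .pdf c:/tools/SWISH-E/filter-bin/pdftotext.exe '"%p" -'

# First argument is the parser (document type), then the extensions.
# TXT is assumed here because pdftotext emits plain text.
IndexContents TXT .pdf .PDF

IndexOnly .pdf .asp
```

Without a working filter, the PDF would presumably be indexed as raw file bytes, which may be why the later searches for a word inside it return nothing.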


# this is a run commenting out all FileFilters/IndexContents, and saying IndexOnly .pdf
C:\Inetpub\wwwroot>"C:\tools\SWISH-E\swish-e" -c swish-e.config
Indexing Data Source: "File-System"
Indexing "."
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 235 words alphabetically
Writing header ...
Writing index entries ...
   Writing word text: Complete
   Writing word hash: Complete
   Writing word data: Complete
235 unique words indexed.
4 properties sorted.
1 file indexed.  712159 total bytes.  660 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!

# cool, it indexed the one pdf in the directory
# so I run a search on that pdf, but nothing
C:\Inetpub\wwwroot>"C:\tools\SWISH-E\swish-e" -w Bakajin
# SWISH format: 2.2.3
# Search words: Bakajin
err: no results
.



# this is a run commenting out all FileFilters, and saying IndexOnly .pdf .asp
C:\Inetpub\wwwroot>"C:\tools\SWISH-E\swish-e" -c swish-e.config
Indexing Data Source: "File-System"
Indexing "."
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 1606 words alphabetically
Writing header ...
Writing index entries ...
   Writing word text: Complete
   Writing word hash: Complete
   Writing word data: Complete
1606 unique words indexed.
4 properties sorted.
62 files indexed.  855309 total bytes.  8617 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!


# still no results on the pdf file
C:\Inetpub\wwwroot>"C:\tools\SWISH-E\swish-e" -w Bakajin
# SWISH format: 2.2.3
# Search words: Bakajin
err: no results
.

# search on something in an asp file, no problem.
C:\Inetpub\wwwroot>"C:\tools\SWISH-E\swish-e" -w DirList
# SWISH format: 2.2.3
# Search words: DirList
# Number of hits: 1
# Search time: 0.000 seconds
# Run time: 0.031 seconds
1000 ./DirList.asp "Directory Listing of " & strPath & "" 6702
.

What am I missing?  I'm floundering a bit, reading all the readmes and such...
Sharon
Received on Tue Sep 30 21:01:53 2003