Fwd: help for posting a question

From: Roy Tennant <roy.tennant(at)not-real.ucop.edu>
Date: Mon Feb 10 2003 - 01:48:23 GMT
Forwarded by request. Please do not reply to me, but to
jchen%HDC@hdc.org.nz, who is having difficulty subscribing to the list.
Roy

Begin forwarded message:

> From: jchen%HDC@hdc.org.nz
> Date: Sun Feb 9, 2003  1:03:09 PM US/Pacific
> To: roy.tennant@ucop.edu
> Subject: help for posting a question
>
>
> Hi, Roy,
>
> I have to send this mail to you because I really don't know how to post my
> question. I have tried many times to subscribe and then post, but it doesn't
> work.
> Here is my question:
>
>
> I am trying to install SWISH-E on a Win2000 system. I am using Apache
> 1.3.27, Perl 5.8.0 and Swish 2.2. I can run the config and index all
> HTML files, and I get correct search results by running the CGI script, but
> I can't index any PDF files.
>
>
> I have 3 HTML files and 1 PDF file in the test folder; report.pdf and
> report.html have exactly the same content. When I run the swish.bat file,
> report.pdf gets only 3 words. I don't know whether this PDF file was
> indexed (or only its name was), even though it says 4 files indexed. Did
> it work? I can't get any result from the search page or even from a
> .bat file.
>
>
> ####---- by running  swish.bat file -------
>
>
> Indexing Data Source: "File-System"
> Indexing "C:/wwwroot/html/"
>
>
> Checking dir "C:/wwwroot/html"....
>
>
>   contct.html - using TXT parser - (385 words)
>   aboutus.html - using TXT parser - (124 words)
>   report.pdf - using TXT parser - (3 words)
>   report.html - using TXT parser - (3569 words)
>
>
> Removing very common words...
> no words removed.
>
>
> Writing main index...
> Sorting words ...
> Sorting 1060 words alphabetically
> Writing header ...
> Writing index entries ...
>
>    writing word text: Complete
>    Writing word hash: Complete
>    Writing word data: Complete
>
> 1068 unique words indexed.
> 5 properties sorted.
> 4 files indexed. 66781 total bytes. 4081 total words.
> Elapsed time: 00:00:04 CPU time: 00:00:04
> Indexing done!
>
> ## ------------- end! ----------
>
> When I put "c:/wwwroot/cgi-bin/xpdf/pdftotext.exe report.pdf" in a .bat
> file, it produces a report.txt file, which means pdftotext.exe
> works fine.
>
>  
>
> Here is my config file:
>
> ##--------------begin of config file ------------------
>
> # DIRECTIVES COMMON to  HTTP and FILESYSTEM METHODS
> ###################################################
>
> IndexDir  C:/wwwroot/html/
>
>
> IndexFile C:/wwwroot/indexing/swish.index
> # This is what the generated index file will be.
>
> IndexName "Site Index"
> #IndexDescription "Default Index"
>
> IndexPointer "http://127.0.0.1/"
> #IndexAdmin "Webmaster (webmaster@yoursite.com)"
> # Extra information you can include in the index file.
>
>
> IndexReport 3
> # This is how detailed you want reporting. You can specify numbers
> # 0 to 3 - 0 is totally silent, 3 is the most verbose.
>
> FollowSymLinks no
> # Put "yes" to follow symbolic links in indexing, else "no".
>
>
>
> IgnoreTotalWordCountWhenRanking yes
> # Put yes to ignore the total number of words in the file
> # when calculating ranking. Often better with merges and
> # small files. Default is no.
>
>
> IndexComments 0
> # This option lets the user decide whether to index the comments in the
> # files.
> # default is 1. Set to 0 if comment indexing is not required.
>
>
> ##################################
> # DIRECTIVES for FILESYSTEMS ONLY
> # Comment out if using HTTP
> ###################################
>
>
> IndexOnly .html .pdf .htm
>
>
> #FilterDir .pdf c:/wwwroot/cgi-bin/xpdf/pdftotext.exe
>
>
> FileFilter .pdf c:/wwwroot/cgi-bin/xpdf/pdftotext.exe '"%p" -'
>
> #FileFilter .pdf c:/wwwroot/cgi-bin/filter-bin/_pdf2html.pl "%p -"
> # I tried to use this FileFilter, which is from example/example 8 -
> # Filtering PDF files.
> # It brings up the whole _pdf2html.pl file without running it. I don't
> # know why.
>
>
> IndexContents TXT .html .pdf .htm
>
>
> NoContents .gif .xbm .au .mov .mpg .pdf .ps .jpg .png
> # Files with these suffixes will not have their contents indexed -
> # only their file names will be indexed.
>
>
> # ---------------- end of config file ----------------
>
>
>
> The Swish.bat file is:
>
> #-------------- begin of swish.bat file ---------------------
>
> @ECHO OFF
>
>  swish-e -S fs -c swish.conf
>
> pause
>
>
> swish-e -w "RECOMMENDATIONS" -f C:/wwwroot/indexing/swish.index -v 1
>
> # "recommendations" is a word contained in both report.pdf and report.html
>
> #-------------- end of swish.bat file ---------------------
>
>
>
> So, is the report.pdf file indexed in this situation? Why can't I get any
> search results? Please advise me.
>
>
> Thank you, Roy!
>
> Oh, by the way, the "About this archive" link
> (http://swish-e.org/archive.html) is broken.
>
> Jim
Received on Mon Feb 10 01:48:53 2003
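
One thing in the quoted config that may be worth checking: .pdf appears both on the FileFilter line and on the NoContents line, and the config's own comment says that NoContents suffixes have only their file names indexed. That would be consistent with report.pdf showing 3 words. Whether NoContents really takes precedence over FileFilter in Swish 2.2 is an assumption here, not something the message confirms. The sketch below is a small, hypothetical Python helper (the function names are made up for illustration and are not part of Swish-e) that lists suffixes appearing under both directives in a config text.

```python
def suffixes_for(config_text, directive):
    """Collect the file suffixes listed on every active line of `directive`.

    Lines starting with '#' are treated as comments and skipped.
    Directive names are compared case-insensitively as a convenience;
    that is an assumption, not documented Swish-e behaviour.
    """
    found = set()
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if parts[0].lower() != directive.lower():
            continue
        # Only tokens that look like suffixes (".pdf") are kept, so the
        # filter program path and its options are ignored.
        found.update(p.lower() for p in parts[1:] if p.startswith("."))
    return found


def filtered_but_not_indexed(config_text):
    """Return suffixes that have a FileFilter but are also in NoContents."""
    both = suffixes_for(config_text, "FileFilter") & suffixes_for(
        config_text, "NoContents"
    )
    return sorted(both)


# The relevant active and commented-out lines, copied from the config above.
SAMPLE = """\
IndexOnly .html .pdf .htm
#FilterDir .pdf c:/wwwroot/cgi-bin/xpdf/pdftotext.exe
FileFilter .pdf c:/wwwroot/cgi-bin/xpdf/pdftotext.exe '"%p" -'
IndexContents TXT .html .pdf .htm
NoContents .gif .xbm .au .mov .mpg .pdf .ps .jpg .png
"""

if __name__ == "__main__":
    print(filtered_but_not_indexed(SAMPLE))
```

If the overlap turns out to be the cause, removing .pdf from the NoContents line and re-running swish.bat would be the obvious thing to try; the word count reported for report.pdf should then show whether the filter output is being indexed.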