Re: what words swish indexes

From: <00prneubauer(at)>
Date: Fri Mar 06 1998 - 13:26:54 GMT
John Kelly wrote:  
>I am wondering whether there is a way to control how many words swish will index
>from each TEXT document? These text documents contain a few lines in the
>following format and contain no HTML. These properties are passed on
>through the CGI script to create the HTML document. For example one file
>might have:
>TITLE=>Sports Page
>DESCRIPTION=>All kind of sports links.
>Ideally for my site I would like it to index every word in each file. I
>could also live with being able to tell swish to index all words on the
>third line of each file. This is the line that the CGI script uses to
>create a keyword meta tag in each corresponding HTML document.

SWISH may well be the right tool for indexing whole files, but if you
only want a keyword index (or a keyword index in addition to the full
one), it would probably be simpler to use other tools.  For example, to
get a list of all the files and their keywords, the most
straightforward thing to do would be:

	grep KEYWORDS * >keywords.list

which will give you a file containing lines like:

cars.txt:KEYWORDS=>ford,chevrolet,alfa romeo,toyota

and so forth.  You should have no trouble reformatting that file (or
just piping the output of grep through whatever reformatting you need
before writing it to a file).  You will probably need to adjust the
grep command a little, depending on whether the text files are all in
one directory or spread over several, but I'm sure you get the
picture.  In brief, I would guess that for something as specialized as
a keyword index, it would be much less work to write a one- or two-line
shell script than to figure out a way to get swish to do something
unusual.
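
As a rough sketch of the kind of short script I mean, something like
the following might do.  It assumes the files all end in .txt and may
sit in subdirectories, and that each has a line starting with
"KEYWORDS=>"; the function name build_keyword_index is just made up for
illustration.  It prints one line per file, with the file name and the
keyword list separated by a tab.

```shell
#!/bin/sh
# Sketch only: build a "filename<TAB>keywords" list from text files that
# contain a line such as "KEYWORDS=>ford,chevrolet,alfa romeo,toyota".
# Assumptions: files end in .txt; build_keyword_index is a made-up name.

build_keyword_index() {
    # $1 = directory to search (defaults to the current directory).
    # /dev/null is passed as an extra file so grep always prints the
    # file name, even when find hands it only a single file.
    find "${1:-.}" -type f -name '*.txt' \
        -exec grep '^KEYWORDS=>' /dev/null {} + |
        awk -F ':KEYWORDS=>' '{ print $1 "\t" $2 }' |
        sort
}
```

You would then run something like "build_keyword_index . >keywords.list"
from the top directory of the text files.  Since keywords.list does not
end in .txt, it will not be picked up on a later run.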

Paul Neubauer  
For PGP Public Key send mail with subject="Send PGP Public Key" 
1024 bits -- Key ID: 3FEB993D
Key Fingerprint: 85 AA A5 91 00 49 7A 7B  23 26 F7 B8 DB 72 C9 48
Received on Fri Mar 6 05:35:33 1998