RE: how to get a description

From: <Jeffrey.Grunstein(at)>
Date: Tue Nov 19 2002 - 20:54:06 GMT
I said in a previous post that I thought that HTML* was not valid in the
StoreDescription or IndexContents directives.

I retract that statement - according to Bill Mosely, it is valid, and it will
use HTML2 if available, and otherwise will use HTML.
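
Going by that, a config along these lines ought to be accepted by a version
that supports these directives. This is only a sketch: the extensions and the
size limit are copied from the examples elsewhere in this thread, not tested
values.

```text
# HTML* = use the HTML2 parser if it is available, else fall back to HTML
IndexContents HTML* .html .htm
StoreDescription HTML* <body> 100000
```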

                    "Wolf, Dena"                                                                                                                  
                    <dena.wolf@orcinc.com>
                    To: Multiple recipients of list <>
                    Subject: [SWISH-E] RE: how to get a description
                    11/19/2002 03:04

I am doing this on the web, so I need my indexing to store the
Users will just be searching for words on the website, and I want a
summary or excerpt to appear below the links to the documents that contain
the words they are looking for. Does this make sense?  They will not be
entering any switches when they search on the web.  I put in HTML2
and still get a bad directive error for those 3 lines.  I have been going at
this for 4 days now :(

Thanks a bunch for your email!

-----Original Message-----
From: []
Sent: Tuesday, November 19, 2002 2:17 PM
Subject: Re: [SWISH-E] how to get a description

Try this in your config file:

IndexContents HTML2 .html
IndexContents HTML2 .htm
StoreDescription HTML2 <BODY> 100000

# To index PDF files as well, try something like this...
FilterDir /opt/sfw/bin
FileFilter .pdf pdftotext "'%p' -"
IndexContents TXT .pdf
StoreDescription TXT 250000

This will store the BODY tag text of all files that end in .htm and .html,
using the HTML2 parser.
If you're running a slower machine and performance is an issue, lower the
100,000 number to something smaller.  If you have mostly smaller HTML files,
this number can be lower and you won't lose any content when the
descriptions are stored.
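
As far as I know, the StoreDescription number limits stored characters, not
words (worth confirming against the docs for your version). So an excerpt of
"about 40 words" would mean storing more text and trimming it when you display
results. A rough sketch of that trimming in shell follows; first_words is a
made-up helper name, not part of swish-e, and it assumes the description
arrives as a single line of text.

```shell
# first_words N TEXT
# Print at most the first N whitespace-separated words of TEXT (one line),
# followed by " ..." when something was cut off.
# Hypothetical helper for illustration; not a swish-e feature.
first_words() {
    printf '%s\n' "$2" | awk -v n="$1" '{
        out = ""
        for (i = 1; i <= NF && i <= n; i++)
            out = out (i > 1 ? " " : "") $i
        if (NF > n)
            out = out " ..."
        print out
    }'
}
```

You would call it as first_words 40 "$description" from whatever script
builds the results page.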

The command you listed looks like something you'd use to create the index.
As long as your config file is right, you don't need to do anything else to
store your descriptions.  You just need the right switches when doing your
search.

Try doing a search like this once you've created the new index file:
cgi-bin/swish-e -w <your search string> -f index.swish \
    -x '%t - %p\n%d\nlast updated %D\trank %r\tsize %l bytes\n\n'

This will actually return a lot more info than just the description.  The
%d part shows the description.
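
Since web users won't be typing switches, the switches have to live in
whatever script the search form calls. A minimal sketch of that idea in shell
follows. SWISH_BIN, SWISH_INDEX and search_site are illustrative names, the
paths are the ones used in this thread, and turning the form input into a
plain query string is assumed to happen before this is called.

```shell
# Fixed settings for every web search. The variable names are illustrative;
# point them at your own binary and index file.
SWISH_BIN="${SWISH_BIN:-cgi-bin/swish-e}"
SWISH_INDEX="${SWISH_INDEX:-index.swish}"

# search_site QUERY
# Run swish-e with the same switches every time, so the user only supplies
# the search words. The query is passed as one quoted argument, so the shell
# does not re-split or re-interpret it.
search_site() {
    # %t = title, %p = path, %d = stored description
    "$SWISH_BIN" -w "$1" -f "$SWISH_INDEX" -x '%t - %p\n%d\n\n'
}
```

The -x string here is a cut-down version of the one in the command above;
add %D, %r or %l back in if you want them on the results page.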

Take a look at


and scroll down to the
section titled "-x formatstring (extended output format)".

                    "Wolf, Dena"

                    <dena.wolf@orcinc.com>
                    To: Multiple recipients of list <>
                    Subject: [SWISH-E] how to get a description
                    11/19/2002 01:33


Two questions; I've been reading the past archives that deal with this and
understanding a little, but I don't know if I am doing this at all right.
My indexing is working and I am getting results now.  Now what I am trying
to do is to get a chunk of the body of the document into the results page,
say 40 words of the document body, including the search word.
In my config file:
IndexFile index.swish
#MetaNames keywords description
IndexReport 3
FollowSymLinks no
IgnoreTotalWordCountWhenRanking yes
ReplaceRules replace "/export/home/orcsolar/html/" ""
ReplaceRules remove "html/"
IgnoreLimit 50 1000
FileRules pathname contains members
IndexComments 0
IndexOnly .html .doc .xls .htm .ppt .txt .pdf
IndexContents HTML* .html .htm
StoreDescription HTML <body> 40
NoContents .gif .xbm .au .mov .mpg .ps

I added the IndexContents line & the StoreDescription line.  I get a bad
directive error for both of those 2 new lines.  Why? I checked that there is
no space.

Also, in my index command line, how do I add something to make the
description run (assuming I get the indexing to work)?
Right now my line says: cgi-bin/swish-e -c cgi-bin/orcsolar/config -i html
-v -f index.swish
Can I put -p swishdescription somewhere in that line?  If so where?

I'm sorry I am having so much trouble trying to get all this to work.
Thanks for your help.

Dena Wolf
Web Developer
Organization Resources Counselors, Inc.

Received on Tue Nov 19 20:54:15 2002