I said in a previous post that I thought that HTML* was not valid in the
StoreDescription or IndexContents directives.
I retract that statement: according to Bill Mosely, it is valid, and it will
use HTML2 if available and HTML otherwise.
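For what it's worth, here is a minimal sketch of what that would look like in a config file. It assumes the HTML* form is accepted by both directives as described above; the extensions and the 100000 size are just carried over from the example further down this thread, not recommendations.

```
# HTML* = use the HTML2 parser if it is available, otherwise HTML
IndexContents HTML* .html .htm
StoreDescription HTML* <body> 100000
```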
From: <dena.wolf@orcinc.
To: Multiple recipients of list <email@example.com>
Subject: [SWISH-E] RE: how to get a description
I am doing this on the web, so I need my indexing to store the
Users will just be searching for words on the website, and I want a
summary or excerpt to appear below the links to the documents that contain
the words they are looking for. Does this make sense? They will not be
entering any switches when they search on the web. I put in HTML2
and still get a bad directive error for those 3 lines. I have been going at
this for 4 days now :(
Thanks a bunch for your email!
From: Jeffrey.Grunstein@ny.frb.org [mailto:Jeffrey.Grunstein@ny.frb.org]
Sent: Tuesday, November 19, 2002 2:17 PM
To: firstname.lastname@example.org; email@example.com
Subject: Re: [SWISH-E] how to get a description
Try this in your config file:
IndexContents HTML2 .html
IndexContents HTML2 .htm
StoreDescription HTML2 <BODY> 100000
# To index PDF files as well, try something like this...
FileFilter .pdf pdftotext "'%p' -"
IndexContents TXT .pdf
StoreDescription TXT 250000
This will store the text inside the BODY tag of all files that end in .htm
and .html, using the HTML2 parser.
If you're running a slower machine and performance is an issue, lower the
100,000 number to something smaller. If you have mostly smaller HTML files,
this number can be lower and you won't lose any content when the
descriptions are stored.
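One rough way to pick that number is to look at how big your largest HTML file is, since the body text can never be longer than the file itself. The sketch below is only an illustration: largest_html_bytes is a made-up helper name, it measures bytes on disk (markup included), so it gives an upper bound rather than the exact body length.

```shell
# Hypothetical helper: print the size in bytes of the largest
# .html/.htm file found under the directory given as $1.
largest_html_bytes() {
    dir=$1
    # find prints one path per line; paths containing newlines are not handled.
    find "$dir" -type f \( -name '*.html' -o -name '*.htm' \) -print |
    {
        max=0
        while IFS= read -r file; do
            # wc may pad its output with spaces; arithmetic expansion strips them.
            size=$(wc -c < "$file")
            size=$((size))
            if [ "$size" -gt "$max" ]; then
                max=$size
            fi
        done
        echo "$max"
    }
}
```

Running it against the directory you index (for example, largest_html_bytes html) prints a single number; any StoreDescription size at or above that should not cut off any document.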
The command you listed looks like something you'd use to create the index.
As long as your config file is right, you don't need to do anything else to
store your descriptions. You just need the right switches when doing your
search.
Try doing a search like this once you've created the new index file:
cgi-bin/swish-e -w <your search string> -f index.swish -x '%t - %p\n%d\nlast updated %D\trank %r\tsize %l bytes\n\n'
This will actually return a lot more info than just the description. The
%d part shows the description.
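If you store a large description but only want a short excerpt on the results page, you can trim it after the search. The sketch below is just one way to do that: first_words is a made-up helper name, and it simply keeps the first N whitespace-separated words of every line it reads. As far as I know the stored description is taken from the start of the body, so the excerpt will not necessarily contain the search word.

```shell
# Hypothetical helper: keep at most the first N words of each input line.
# Usage: some_command | first_words 40
first_words() {
    awk -v n="$1" '{
        out = ""
        # Never ask for more fields than the line actually has.
        limit = (NF < n) ? NF : n
        for (i = 1; i <= limit; i++)
            out = out (i > 1 ? " " : "") $i
        print out
    }'
}
```

With the format string above, the description sits on its own line, so piping the search output through first_words 40 would shorten long descriptions and leave the shorter title and date lines as they are, apart from runs of spaces or tabs being collapsed to single spaces.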
Take a look at
and scroll down to the
section titled "-x formatstring (extended output format)".
From: <dena.wolf@orcinc.
To: Multiple recipients of
Subject: [SWISH-E] how to get a description
Two questions. I've been reading the past archives that deal with this and
understand a little, but I don't know if I am doing this right at all.
My indexing is working and I am getting results now. What I am trying to do
now is get a chunk of the document body onto the results page, say 40 words
of the body that include the search word.
In my config file:
#MetaNames keywords description
ReplaceRules replace "/export/home/orcsolar/html/" "http://www.orcinc.com/"
ReplaceRules remove "html/"
IgnoreLimit 50 1000
FileRules pathname contains members
IndexOnly .html .doc .xls .htm .ppt .txt .pdf
IndexContents HTML* .html .htm
StoreDescription HTML <body> 40
NoContents .gif .xbm .au .mov .mpg .ps
I added the IndexContents line & the StoreDescription line. I get a bad
directive error for both of those 2 new lines. Why? I checked that there
Also, in my index command line, how do I add something to make the
description run (assuming I get the indexing to work)?
Right now my line says: cgi-bin/swish-e -c cgi-bin/orcsolar/config -i html
-v -f index.swish
Can I put -p swishdescription somewhere in that line? If so where?
I'm sorry I am having so much trouble trying to get all this to work.
Thanks for your help.
Organization Resources Counselors, Inc.
Received on Tue Nov 19 20:54:15 2002