Would this apply similarly to using xpdf to parse PDF docs?
IndexContents HTML* .htm .html .shtml .php
IndexContents TXT* .txt .log .text .pdf
IndexContents XML* .xml
StoreDescription TXT* 10000
StoreDescription HTML* <body>
From: firstname.lastname@example.org [mailto:email@example.com]
On Behalf Of Bill Moseley
Sent: Thursday, October 28, 2004 10:06 AM
To: Multiple recipients of list
Subject: [SWISH-E] Re: <swishdecription> returning blank?
On Thu, Oct 28, 2004 at 12:47:04AM -0700, Tim Hartley wrote:
> Ok I just tried swapping StoreDescription HTML2 <tbody> 2000, with the
> extra line PropertyNames tbody added also, and I'm still getting the
> blanks where swishdescription should be. My config file now looks
Look at the StoreDescription line. It says to store a description for all
HTML2-type files (HTML2 is one of the available parsers). But you are not
telling swish that .asp is an HTML2 type of file.
This is a confusing issue because swish will use the HTML2 parser for
parsing by default, but that doesn't mean the document is classified as an
HTML2 type of document.
You need to use either DefaultContents or IndexContents.
Go back and look at the examples of StoreDescription in the docs. Let me
know if there's a place you were looking that doesn't use DefaultContents
or IndexContents so that can be fixed.
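As a rough sketch of what that might look like for the config quoted below
(untested, and assuming the .asp files contain ordinary HTML; the <body> tag
and the 2000-character limit are just the values mentioned elsewhere in this
thread, not a recommendation):

```
# Classify .asp files as HTML2 documents so that the
# StoreDescription HTML2 line actually applies to them.
IndexOnly .asp
IndexContents HTML2 .asp

# Alternative (assumption: every indexed file should be treated as HTML2):
# DefaultContents HTML2

# Take the description from the <body> tag of the source document.
StoreDescription HTML2 <body> 2000
```

The document type named on the StoreDescription line has to match the type
assigned by IndexContents or DefaultContents, otherwise no description is
stored.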
> ---Config file---
> IndexFile c:\swish-e\ForumVirtualIndex.index
> IndexDir C:/Inetpub/VirtualRoot/planetpdfforumarchive
> IndexOnly .asp
> StoreDescription HTML2 <tbody> 2000
> FileRules filename is forum6\.asp
> FileRules filename is forum52\.asp
> FileRules filename is forum2\.asp
> FileRules filename is forum3\.asp
> FileRules filename is forum5\.asp
> FileRules filename is forum34\.asp
> FileRules filename is forum9\.asp
> FileRules filename is forum68\.asp
> FileRules filename is forum18\.asp
> FileRules filename is forum73\.asp
> FileRules filename is forum4\.asp
> FileRules filename is forum7\.asp
> FileRules filename is forum12\.asp
> FileRules filename is attachlist\.asp
> IndexReport 3
> PropertyNames tbody
> ReplaceRules Replace "C:/Inetpub/VirtualRoot/planetpdfforumarchive"
> ---end file---
> --excerpt of results----
> (of the format
> # SWISH format: 2.4.2 # Search words: eat # Removed stopwords: #
> Number of hits: 30 # Search time: 0.110 seconds # Run time: 0.125
> 1000|Planet PDF Forum Archive - Is that all there is to
> 1000|13:23:52 AUS Eastern Standard Time|
> 633|Planet PDF Forum Archive -
> 633|13:06:36 AUS Eastern Standard Time|
> Note the blanks after the <swishlastmodified>| section. :(
> -----Original Message-----
> From: Peter Karman [mailto:firstname.lastname@example.org]
> Sent: Thursday, 28 October 2004 5:01 PM
> To: Tim Hartley
> Cc: Multiple recipients of list
> Subject: Re: [SWISH-E] <swishdecription> returning blank?
> Tim Hartley wrote on 10/27/04 9:07 PM:
> > Hi Bill, all.
> > I'm using the File Access method to index a folder of .asp files (as
> > it can do it waaaaaaaay quicker than spidering them) Anyway, I'm having
> > problems in that it doesn't seem to be getting any values in the
> > <swishdescription>, so my results are coming back with the
> > <swishrank><swishtitle><swishdocpath><swishlastmodified>, but NOT
> > <swishdescription>. All my other indexes return it. Mind you they use
> > either the dirtree.pl or swishspider.pl to create the indexes. Anyway,
> > details
> > StoreDescription HTML2 <swishdescription> 2000
> I believe that the StoreDescription <tag> syntax is for the tag in the
> source you want to include from, not for the swish property name. Is
> there a tag called 'swishdescription' in your source files? otherwise,
> you likely want <body> instead.
Received on Thu Oct 28 07:14:02 2004