Re: Re: Config files and spider.pl

From: <adivey1(at)not-real.cox.net>
Date: Thu Jun 03 2004 - 16:25:33 GMT
Documentation not too helpful :(

Let's steer the question in this direction...

In my SWISH-E configuration file, I have:

StoreDescription HTML* <body> 20000

Using spider.pl with -v 3, I can see that it's using the HTML2 parser. So shouldn't that StoreDescription line then apply? When I run swish.cgi through my browser, I don't get any text to highlight (summary, whatever you wanna call it) for anything other than HTML files. I haven't tried TXT or PPT, but PDF and DOC results show nothing. They don't show null either, though.

Anyway, what do I have to change, and where do I change it, to get the snapshot (summary, highlighted text, etc.) to display on the search page?
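One way to check the reasoning above is a small Python sketch (not Swish-e code) that reads StoreDescription lines and reports which parser names they cover. It assumes each StoreDescription line applies only to documents handled by a parser whose name matches its first field, with HTML* matching both HTML and HTML2. The function names are hypothetical, and taking the first matching line is a simplification made by this sketch, not documented Swish-e behavior. If the verbose output showed PDF or DOC content going to a parser the pattern does not match (TXT, say), that would be consistent with the empty summaries.

```python
import fnmatch

# The StoreDescription line from the config above; add any other
# StoreDescription lines from your own config to this string.
CONFIG = """
StoreDescription HTML* <body> 20000
"""


def description_rules(config_text):
    """Return (parser_pattern, tag, size) for each StoreDescription line."""
    rules = []
    for line in config_text.splitlines():
        fields = line.split()
        # Skip blank lines and any directive other than StoreDescription
        # (compared case-insensitively, to be lenient).
        if len(fields) < 2 or fields[0].lower() != "storedescription":
            continue
        pattern = fields[1]
        tag = next((f for f in fields[2:] if f.startswith("<")), None)
        size = next((int(f) for f in fields[2:] if f.isdigit()), None)
        rules.append((pattern, tag, size))
    return rules


def rule_for_parser(parser_name, rules):
    """Return the first rule whose pattern matches parser_name, or None.

    Assumption: a rule covers a document only when the parser name matches
    the rule's glob pattern (so HTML* covers HTML and HTML2, but not TXT).
    """
    for pattern, tag, size in rules:
        if fnmatch.fnmatchcase(parser_name, pattern):
            return (pattern, tag, size)
    return None


if __name__ == "__main__":
    rules = description_rules(CONFIG)
    for parser in ("HTML", "HTML2", "TXT", "TXT2"):
        print(parser, "->", rule_for_parser(parser, rules))
```

With only the HTML* line present, HTML and HTML2 map to that rule while TXT and TXT2 map to None.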

I feel like that didn't make any sense... Thanks in advance!

-Alan

> 
> From: Bill Moseley <moseley@hank.org>
> Date: 2004/06/02 Wed PM 04:04:28 EDT
> To: adivey1@cox.net
> CC: Multiple recipients of list <swish-e@sunsite.berkeley.edu>
> Subject: Re: Config files and spider.pl
> 
> On Wed, Jun 02, 2004 at 12:11:13PM -0700, adivey1@cox.net wrote:
> > Where can I find documentation that'll tell me which configuration
> > file directives are overridden by using spider.pl? Obviously, IndexDir is
> > specified in the file, but for entries like StoreDescription,
> > IndexContents, and MetaNames, how do I know which ones are being read
> > and which are ignored?
> 
> http://www.swish-e.org/current/docs/SWISH-CONFIG.html
> 
>     *  Swish-e CONFIGURATION FILE
>           o Alphabetical Listing of Directives
>           o Directives that Control Swish
>           o Administrative Headers Directives
>           o Document Source Directives
>           o Document Contents Directives
>           o Directives for the File Access method only
>           o Directives for the HTTP Access Method Only
>           o Directives for the prog Access Method Only
>           o Document Filter Directives
>                 + Filtering with SWISH::Filter
>                 + Filtering with the FileFilter feature 
>     * Document Info 
> 
> So, they are supposed to be broken up into options that control the
> indexing vs. options that control what files are passed to swish for
> indexing.
> 
> So, setting swish-e to follow symbolic links probably isn't going to
> affect spidering.  But MetaNames controls how the data is indexed,
> regardless of where it comes from.
> 
> Technical writers always welcome.
> 
> -- 
> Bill Moseley
> moseley@hank.org
> 
> 
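Bill's point, that directives tied to one access method don't carry over to spidering while content directives such as MetaNames apply whatever the source, can be sketched as a simple lookup. The section assignments below are an assumption based on the SWISH-CONFIG outline quoted above and cover only directives named in this thread; verify them against the documentation. The sketch also assumes spider.pl is run through the prog access method (swish-e -S prog), and the function name is hypothetical.

```python
# Assumed documentation section for each directive mentioned in this thread.
SECTION = {
    "IndexDir": "Document Source Directives",
    "FollowSymLinks": "Directives for the File Access method only",
    "MetaNames": "Document Contents Directives",
    "IndexContents": "Document Contents Directives",
    "StoreDescription": "Document Contents Directives",
}

# Sections limited to an access method other than prog; directives listed
# there should not influence documents that spider.pl feeds to swish-e.
OTHER_METHOD_ONLY = {
    "Directives for the File Access method only",
    "Directives for the HTTP Access Method Only",
}


def affects_prog_indexing(directive):
    """True/False when the directive's section is known here, else None."""
    section = SECTION.get(directive)
    if section is None:
        return None  # Unknown to this sketch: look it up in the docs.
    return section not in OTHER_METHOD_ONLY


if __name__ == "__main__":
    for name in ("MetaNames", "StoreDescription", "FollowSymLinks", "Buzzwords"):
        print(name, "->", affects_prog_indexing(name))
```

Returning None for unlisted directives keeps the sketch from guessing about anything the thread did not discuss.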
Received on Thu Jun 3 09:25:36 2004