RE: FileFilter with http

From: <DUDGEON(at)not-real.britbio.co.uk>
Date: Mon Sep 11 2000 - 19:15:59 GMT
Hi,

Does spidering work with 2.0.1?

I've tried setting up 2.0.1. Indexing a filesystem works fine, but I'm
having problems with spidering. First, the SpiderDirectory directive seems to
be ignored; only the compiled-in default seems to work. Similarly, the TmpDir
directive is ignored, and I can't figure out where the spider is putting its
output. It doesn't appear to be the compiled-in default, /var/tmp.

When indexing using the spider, I do appear to get the files, but their
contents don't appear to be indexed (only 0, 2 or 3 words reported), and
previously encountered files are not ignored.

Should this be working, or have I missed something?

Many thanks

Tim



> -----Original Message-----
> From: Rainer.Scherg@rexroth.de [mailto:Rainer.Scherg@rexroth.de]
> Sent: 07 September 2000 2:28 PM
> To: DUDGEON@britbio.co.uk; swish-e@sunsite.berkeley.edu
> Subject: RE: [SWISH-E] FileFilter with http
> 
> 
> Hi!
> 
> The filter feature in 1.3.2f was only tested on the filesystem (because I
> didn't use http spidering - see the readme file). But the filter should
> also work for http indexing, because it's the same mechanism.
> 
> To track down the problem:
> 
>   - please upgrade to swish-e 2.0.1: http://www.boe.es/swish-e/
> 
> If the problem still exists, use a simple filter shell script to test
> the files being filtered: send the file contents to stderr via
> "cat $1 | strings 1>&2" and print the arguments passed to the filter
> script.
> 
> cu Rainer
> 
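The debugging filter suggested above could look roughly like the sketch
below. It is only an illustration: the function name debug_filter is made up,
and it assumes (as the "cat $1" in the suggestion implies) that the file to
filter arrives as the first argument. Echoing the text to stdout as well as
stderr, and falling back to cat when strings is not installed, are additions
of this sketch, not something the message confirms.

```shell
#!/bin/sh
# Hypothetical debugging filter; the name and layout are illustrative only
# and are not part of swish-e.

debug_filter() {
    # Report how the filter was invoked. This goes to stderr so it stays
    # out of whatever the indexer reads from stdout.
    echo "debug_filter: $# argument(s)" >&2
    i=1
    for arg in "$@"; do
        echo "debug_filter: arg $i = $arg" >&2
        i=$((i + 1))
    done

    # Extract printable text from the first argument (assumed to be the
    # path of the file to filter). Fall back to cat if strings is missing.
    if command -v strings >/dev/null 2>&1; then
        text=$(strings "$1")
    else
        text=$(cat "$1")
    fi

    # Show the extracted text on stderr for inspection, and also emit it
    # on stdout (an assumption about where the indexer expects it).
    printf '%s\n' "$text" >&2
    printf '%s\n' "$text"
}

# Run only when the script is called with arguments.
if [ "$#" -gt 0 ]; then
    debug_filter "$@"
fi
```

Pointing the FileFilter directive at such a script and watching stderr during
an http run should show whether the filter is invoked at all and what it is
handed.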
> 
> -----Original Message-----
> From: DUDGEON@britbio.co.uk [mailto:DUDGEON@britbio.co.uk]
> Sent: Wednesday, September 06, 2000 6:48 PM
> To: Multiple recipients of list
> Subject: [SWISH-E] FileFilter with http
> 
> 
> I'm trying to use file filtering to index PDFs etc.
> This works fine while I access the files through the file system, but
> doesn't work when they are accessed by http. The PDFs are retrieved, but
> don't appear to be filtered (the 2 words indexed symptom).
> 
> Have I missed something, or isn't this expected to work?
> 
> I'm using swish-e_1_3_2_f
> 
> 
> Thanks
> 
> Tim
> 
> 
> 
Received on Mon Sep 11 19:17:09 2000