Re: [swish-e] swish.conf problems - was ignorewords wildcard?

From: <Rene.Kloos(at)not-real.esa.int>
Date: Thu May 24 2007 - 12:25:12 GMT
As you are using the spider, why not use a spider configuration file with a
test_url subroutine? See the online documentation for spider.pl. That way
you can at least skip the files whose names start with "dsc_". I don't know
how to skip directories containing .htaccess.

In your swish-e config, replace the existing SwishProgParameters line with:

SwishProgParameters spider.conf

In your spider.conf:

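my %mySite = (
    use_default_config => 1,
    base_url           => 'http://nottherealsitename.com/',

    # Called for each URL before it is fetched; return false to skip it.
    test_url           => sub {
        my $uri = shift;
        return 0 if $uri->path =~ m{/dsc_};   # skip anything starting with "dsc_"
        return 1;
    },
);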

@servers = ( \%mySite );
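
If you also want to limit the spider to .htm and .html documents, the same
test_url callback could probably do that too. The following is only an
untested sketch: the rule that lets through URLs ending in a slash (so the
spider can still reach pages below a directory) and the extension check are
my own assumptions, not something taken from the spider.pl documentation.
Pages served without an extension would be skipped by this version.

```perl
# spider.conf (sketch, not tested against a live site)
my %mySite = (
    use_default_config => 1,
    base_url           => 'http://nottherealsitename.com/',
    test_url           => sub {
        my $uri  = shift;
        my $path = $uri->path;

        # Skip any file whose name starts with "dsc_"
        return 0 if $path =~ m{/dsc_[^/]*$};

        # Assumption: let through the site root and directory URLs
        # (trailing slash) so that links below them can be followed
        return 1 if $path eq '' || $path =~ m{/$};

        # Assumption: everything else must end in .htm or .html
        return $path =~ /\.html?$/i ? 1 : 0;
    },
);

@servers = ( \%mySite );
```

This would replace the spider.conf above rather than sit alongside it.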

Good luck!


users-bounces@lists.swish-e.org wrote on 24/05/2007 13:33:10:

> OK, let's start over. . .
>
> I want to index the site.
> Only .htm and .html
> I don't want to index directories containing .htaccess
> I don't want to index documents beginning with "dsc_"
>
> --
> Swish-e version:  2.4.5
> OS:  RH9
> Current run string:  swish-e -S prog -c swish.conf
>
> Current swish.conf:
>
> # Swish-e config
> #
> IndexDir spider.pl
> IndexFile index.swish-e
>
> SwishProgParameters default http://nottherealsitename.com/
>
> IndexReport 3
>
> Metanames swishtitle swishdocpath
>
> IndexOnly .htm .html
>
> IgnoreWords File: /usr/local/swish-e-2.4.5/conf/stopwords/english.txt
>
> StoreDescription TXT* 10000
> StoreDescription HTML* <body> 10000
>
>
> Need some help.
>
>
> Bill Moseley wrote:
> > On Wed, May 23, 2007 at 10:35:47PM -0400, Frank Hunt wrote:
> >> this fails:
> >>
> >> IndexDir spider.pl
> >> SwishProgParameters default http://website.com/
> >> FileRules directory contains ^\.htaccess
> >>
> >> run string:  swish-e -S prog -c swish.conf2
> >
> > -S prog means you are not reading from the file system -- FileRules is
> > only for reading from the file system.
> >
> >
> >
> >
>
> --
> frank hunt
> PLUG member-in-absentia
> confused linux admin
> part time windows(r) washer
> rochester hills, mi
> _______________________________________________
> Users mailing list
> Users@lists.swish-e.org
> http://lists.swish-e.org/listinfo/users
Received on Thu May 24 08:25:29 2007