Re: [swish-e] swish.conf problems - was ignorewords wildcard?

From: Frank Hunt <linux(at)not-real.frankhunt.com>
Date: Thu May 24 2007 - 12:51:59 GMT
Yeah, the spider gets blocked at those directories and times out.  Skipping 
the .htaccess-protected directories would speed things up.  Not a really big 
deal, merely a tweak.  Now, of course, it has become a challenge.

Rene.Kloos@esa.int wrote:
> BTW, if using the spider, won't that simply get blocked when coming across a
> directory with .htaccess? After all I suppose that's what the .htaccess is
> for, to set up some form of access control. You can provide the spider with
> the appropriate credentials to get in, but if that's not what you want, then
> things should be fine. Or is that too simplistic :-)
> 
> Bye,
> René
> 
> users-bounces@lists.swish-e.org wrote on 24/05/2007 13:33:10:
> 
>> OK, let's start over. . .
>>
>> I want to index the site.
>> Only .htm and .html
>> I don't want to index directories containing .htaccess
>> I don't want to index documents beginning with "dsc_"
>>
>> --
>> Swish-e version:  2.4.5
>> OS:  RH9
>> Current run string:  swish-e -S prog -c swish.conf
>>
>> Current swish.conf:
>>
>> # Swish-e config
>> #
>> IndexDir spider.pl
>> IndexFile index.swish-e
>>
>> SwishProgParameters default http://nottherealsitename.com/
>>
>> IndexReport 3
>>
>> Metanames swishtitle swishdocpath
>>
>> IndexOnly .htm .html
>>
>> IgnoreWords File: /usr/local/swish-e-2.4.5/conf/stopwords/english.txt
>>
>> StoreDescription TXT* 10000
>> StoreDescription HTML* <body> 10000
>>
>>
>> Need some help.
>>
>>
>> Bill Moseley wrote:
>>> On Wed, May 23, 2007 at 10:35:47PM -0400, Frank Hunt wrote:
>>>> this fails:
>>>>
>>>> IndexDir spider.pl
>>>> SwishProgParameters default http://website.com/
>>>> FileRules directory contains ^\.htaccess
>>>>
>>>> run string:  swish-e -S prog -c swish.conf2
>>> -S prog means you are not reading from the file system -- FileRules is
>>> only for reading from the file system.
>>>
>>>
>>>
>>>
>> --
>> frank hunt
>> PLUG member-in-absentia
>> confused linux admin
>> part time windows(r) washer
>> rochester hills, mi
>> _______________________________________________
>> Users mailing list
>> Users@lists.swish-e.org
>> http://lists.swish-e.org/listinfo/users

-- 
frank hunt
PLUG member-in-absentia
confused linux admin
part time windows(r) washer
rochester hills, mi
Received on Thu May 24 08:52:03 2007
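
As Bill notes in the quoted reply, FileRules only applies when reading from the file system, so with -S prog any filtering has to be decided from the URL the spider is looking at. A minimal sketch of the two filename rules from the thread (only .htm/.html, nothing beginning with "dsc_") is below. It is plain Python for illustration only, not swish-e or spider.pl configuration: the function name should_index is hypothetical, and matching case-insensitively is an assumption the thread does not state. The .htaccess rule is left out on purpose, since a web server normally does not serve .htaccess files, so a protected directory can only be recognised over HTTP by the response it returns, not by its URL.

```python
from posixpath import basename
from urllib.parse import urlsplit

# The two filename rules stated in the thread.
ALLOWED_SUFFIXES = (".htm", ".html")
SKIPPED_PREFIX = "dsc_"


def should_index(url: str) -> bool:
    """Return True if the URL's file name passes both rules.

    Hypothetical helper: it only decides whether a document should be
    indexed, not whether the spider should follow links through it.
    """
    # Take the last path segment, ignoring any query string or fragment.
    # Lower-casing is an assumption (so DSC_0001.HTML is also skipped).
    name = basename(urlsplit(url).path).lower()
    if not name.endswith(ALLOWED_SUFFIXES):
        return False
    return not name.startswith(SKIPPED_PREFIX)
```

A directory URL such as http://nottherealsitename.com/ has an empty file name and is therefore rejected by this check, which is why the same test should not be used to decide which links the spider follows.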