Yeah, the spider gets blocked at those directories and times out.
Skipping the .htaccess-protected directories would speed things up.
It's not really a big deal, merely a tweak. Now, of course, it has
become a challenge.
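
For reference, here is a rough sketch of the filtering logic I'm after,
written as a standalone Python function rather than as real spider.pl
configuration. The assumptions are mine: should_index and
PROTECTED_PREFIXES are made-up names, and the protected paths are
placeholders. Since a web server normally won't serve .htaccess itself,
the spider can't detect it over HTTP, so the protected directories have
to be listed by hand. As far as I can tell, the spider.pl docs describe
a test_url callback in the spider config file where the same test could
live in Perl, but check that against the docs before relying on it.

```python
from posixpath import basename
from urllib.parse import urlsplit

# Hypothetical placeholders: path prefixes known to be .htaccess-protected.
# These are listed by hand because .htaccess is not visible over HTTP.
PROTECTED_PREFIXES = ("/private/", "/members/")


def should_index(url: str) -> bool:
    """Return True if the URL passes the three rules from this thread."""
    path = urlsplit(url).path
    name = basename(path)

    # Skip documents whose file name begins with "dsc_".
    if name.startswith("dsc_"):
        return False

    # Skip anything under a directory listed as protected.
    if path.startswith(PROTECTED_PREFIXES):
        return False

    # Index only .htm and .html documents (extension compared case-insensitively).
    return name.lower().endswith((".htm", ".html"))
```

Note this only decides what gets indexed. A spider still has to fetch
directory URLs such as "/" to find links, so a "follow" test would need
to be looser than this one.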
Rene.Kloos@esa.int wrote:
> BTW, if using the spider, won't that simply get blocked when coming across a
> directory with .htaccess? After all I suppose that's what the .htaccess is
> for, to set up some form of access control. You can provide the spider with
> the appropriate credentials to get in, but if that's not what you want, then
> things should be fine. Or is that too simplistic :-)
>
> Bye,
> René
>
> users-bounces@lists.swish-e.org wrote on 24/05/2007 13:33:10:
>
>> OK, let's start over...
>>
>> I want to index the site.
>> Only .htm and .html
>> I don't want to index directories containing .htaccess
>> I don't want to index documents beginning with "dsc_"
>>
>> --
>> Swish-e version: 2.4.5
>> OS: RH9
>> Current run string: swish-e -S prog -c swish.conf
>>
>> Current swish.conf:
>>
>> # Swish-e config
>> #
>> IndexDir spider.pl
>> IndexFile index.swish-e
>>
>> SwishProgParameters default http://nottherealsitename.com/
>>
>> IndexReport 3
>>
>> Metanames swishtitle swishdocpath
>>
>> IndexOnly .htm .html
>>
>> IgnoreWords File: /usr/local/swish-e-2.4.5/conf/stopwords/english.txt
>>
>> StoreDescription TXT* 10000
>> StoreDescription HTML* <body> 10000
>>
>>
>> Need some help.
>>
>>
>> Bill Moseley wrote:
>>> On Wed, May 23, 2007 at 10:35:47PM -0400, Frank Hunt wrote:
>>>> this fails:
>>>>
>>>> IndexDir spider.pl
>>>> SwishProgParameters default http://website.com/
>>>> FileRules directory contains ^\.htaccess
>>>>
>>>> run string: swish-e -S prog -c swish.conf2
>>> -S prog means you are not reading from the file system -- FileRules is
>>> only for reading from the file system.
>>>
>>>
>>>
>>>
>> --
>> frank hunt
>> PLUG member-in-absentia
>> confused linux admin
>> part time windows(r) washer
>> rochester hills, mi
>> _______________________________________________
>> Users mailing list
>> Users@lists.swish-e.org
>> http://lists.swish-e.org/listinfo/users
>
--
frank hunt
PLUG member-in-absentia
confused linux admin
part time windows(r) washer
rochester hills, mi
Received on Thu May 24 08:52:03 2007