
Re: [swish-e] swish-e looping over the same files again and again

From: Jo Rhett <jrhett(at)not-real.netconsonance.com>
Date: Fri Jul 11 2008 - 16:40:05 GMT
On Jul 11, 2008, at 6:43 AM, Bill Moseley wrote:
> On Fri, Jul 11, 2008 at 01:11:56AM -0700, Jo Rhett wrote:
>> (query string?)
>>
>> So while debugging a different problem I looked at my httpd logs and
>> realized something I'd apparently missed before.  The swish-e spider
>> is looping over the same files dozens and dozens of times, each time
>> with different query arguments.  Because all of the links on the site
>> contain a query_string containing the page they came from and a unique
>> id for the visitor (and a dynamic toolbar has links to every page),
>> this means that each page is indexed N-1 times, where N is the number
>> of pages on the site.
>
> Why don't you use cookies for session management?  Your setup kind of
> makes it hard for browsers to do any caching.

It does.  If the browser submits a cookie, the site uses it.  If the
browser doesn't submit a cookie, the site adds query strings to track
the browser.  Since the spider ignores cookies, it gets the query
strings added.

>> Is there an option to tell the swish spider to ignore the query string
>> when considering URLs?   I realize that this would be inappropriate
>> for many sites, but it is essential for this site, so an option would
>> be very useful.
>
> Quick search of the archives turns up this:
>
> http://swish-e.org/archive/2004-08/8106.html


I missed that, as it contains none of the terms I was searching for.
The problem is that it isn't clear what he's talking about.  Is this a
modification to spider.pl?  This is on a shared host, and only one
customer has this problem.
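What I'm after is for the spider to treat URLs that differ only in their
query string as the same page.  A minimal sketch of that idea in Python
follows.  It is not spider.pl and not the fix from the archived message;
it only illustrates the normalize-then-dedupe logic.  The example.com
URLs and the `from`/`sid` parameter names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Drop the query string and fragment so per-visitor tracking
    parameters don't make the same page look like a new URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def dedupe(urls):
    """Return normalized URLs in first-seen order, skipping any whose
    normalized form was already seen."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


if __name__ == "__main__":
    # Hypothetical links as the spider would see them without cookies.
    crawled = [
        "http://example.com/about.html?from=index&sid=123",
        "http://example.com/about.html?from=news&sid=456",
        "http://example.com/news.html?from=about&sid=123",
    ]
    print(dedupe(crawled))
```

With this, each page is fetched and indexed once instead of N-1 times.
The tradeoff is the one mentioned above: on sites where the query
string selects different content, dropping it would lose pages.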

-- 
Jo Rhett
Net Consonance : consonant endings by net philanthropy, open source and other randomness


_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Fri Jul 11 12:40:13 2008