Re: [swish-e] swish-e looping over the same files again and again

From: Jo Rhett <jrhett(at)>
Date: Fri Jul 11 2008 - 16:40:05 GMT
On Jul 11, 2008, at 6:43 AM, Bill Moseley wrote:
> On Fri, Jul 11, 2008 at 01:11:56AM -0700, Jo Rhett wrote:
>> (query string?)
>> So while debugging a different problem I looked at my httpd logs and
>> realized something I'd apparently missed before.  The swish-e spider
>> is looping over the same files dozens and dozens of times, each time
>> with different query arguments.  Because all of the links on the site
>> contain a query_string containing the page they came from and a unique
>> id for the visitor (and a dynamic toolbar has links to every page),
>> this means that each page is indexed N-1 times, where N is the number
>> of pages on the site.
> Why don't you use cookies for session management?  Your setup kind of
> makes it hard for browsers to do any caching.

It does.  If the browser submits a cookie then it uses that.  If the
browser doesn't submit a cookie then it adds query strings to track
the browser.  Since the spider ignores cookies, it gets the query
strings added.
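
To illustrate the effect, here is a minimal Python sketch. It is not swish-e code and does not use the spider's API; the example.com URLs and the function names strip_query and unique_pages are made up for illustration. It only shows how URLs that differ solely in their query string collapse to a single page once the query string is dropped before comparing:

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def unique_pages(urls):
    """Return the distinct pages in first-seen order, ignoring query strings."""
    seen = set()
    ordered = []
    for url in urls:
        key = strip_query(url)
        if key not in seen:
            seen.add(key)
            ordered.append(key)
    return ordered


# Hypothetical URLs: the same page reached from different referring pages.
crawled = [
    "http://example.com/about.html?from=index&visitor=abc123",
    "http://example.com/about.html?from=contact&visitor=abc123",
    "http://example.com/contact.html?from=index&visitor=abc123",
]

print(unique_pages(crawled))
```

As noted in the original question, dropping the query string like this would be wrong for sites where it selects the content.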

>> Is there an option to tell the swish spider to ignore the query string
>> when considering URLs?   I realize that this would be inappropriate
>> for many sites, but it is essential for this site, so an option would
>> be very useful.
> Quick search of the archives turns up this:

I missed that, as it contains nothing I was searching for.  Problem is,
it isn't clear what he's talking about.  Is this to modify
This is on a shared host, and only one customer has this

Jo Rhett
Net Consonance : consonant endings by net philanthropy, open source  
and other randomness

Received on Fri Jul 11 12:40:13 2008