On Fri, Jul 11, 2008 at 01:11:56AM -0700, Jo Rhett wrote:
> (query string?)
>
> So while debugging a different problem I looked at my httpd logs and
> realized something I'd apparently missed before. The swish-e spider
> is looping over the same files dozens and dozens of times, each time
> with different query arguments. Because all of the links on the site
> contain a query_string containing the page they came from and a unique
> id for the visitor (and a dynamic toolbar has links to every page),
> this means that each page is indexed N-1 times, where N is the number
> of pages on the site.

Why don't you use cookies for session management? Putting a unique
visitor id in every URL makes it hard for browsers to cache anything.

> Is there an option to tell the swish spider to ignore the query string
> when considering URLs? I realize that this would be inappropriate
> for many sites, but it is essential for this site, so an option would
> be very useful.

A quick search of the archives turns up this:

http://swish-e.org/archive/2004-08/8106.html
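
To make the idea concrete, here is a minimal Python sketch of "ignore the
query string when deciding whether a URL has been seen". It is only an
illustration of the technique, not swish-e's spider code or one of its
options; the function names and the example.com URLs are made up.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    # Keep scheme, host and path; blank out query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def unique_pages(urls):
    """Yield each page once, keyed by its query-less URL (hypothetical helper)."""
    seen = set()
    for url in urls:
        key = strip_query(url)
        if key not in seen:
            seen.add(key)
            yield key


if __name__ == "__main__":
    # Hypothetical links as a spider might find them on such a site.
    links = [
        "http://example.com/a.html?from=b.html&id=123",
        "http://example.com/a.html?from=c.html&id=123",
        "http://example.com/b.html?from=a.html&id=123",
    ]
    for page in unique_pages(links):
        print(page)
```

As noted in the question, dropping the query string is only safe on a site
where it never selects different content.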
--
Bill Moseley
moseley@hank.org
Received on Fri Jul 11 09:43:55 2008