Re: Finding Session Var While Spidering?

From: Bill Moseley <moseley(at)>
Date: Thu Mar 11 2004 - 06:22:14 GMT
On Wed, Mar 10, 2004 at 05:03:58PM -0800, Justin Tang wrote:
> Hi:
>   Is there any way, when using, to avoid spidering duplicate pages
> with different session vars?
> For ex:
> These are the same page, but it is getting spidered twice.  Thanks.

This is all without testing, but...

One way would be to keep a hash of URLs already seen in a test_url
callback and reject duplicates.  Or, in the test_url() function, you
could remove the sesID from the URL itself.  IIRC, the check for whether
a page has already been seen comes after test_url() is called, so it
would then see the URL without the session ID.

Bill Moseley
Received on Wed Mar 10 22:22:22 2004