On Wed, Mar 10, 2004 at 05:03:58PM -0800, Justin Tang wrote:
> Is there any way, when using spider.pl, to avoid spidering duplicate pages
> that differ only in their session vars?
> For ex:
> These are the same page, but it is getting spidered twice. Thanks.
This is all without testing, but...
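One way would be to keep a hash of URLs in a test_url callback and reject
any URL that is already in it. For that to catch these pages, the hash key
has to be the URL with the session ID removed, since the raw URLs differ.
Or, in the test_url() callback you could remove the sesID from the URL
itself. The check for whether a page has already been seen comes after
test_url() is called, IIRC, so the spider would then skip the duplicates
on its own.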
Received on Wed Mar 10 22:22:22 2004