On Thu, May 19, 2005 at 11:57:04AM -0700, Bill Moseley wrote:
> One thing I'm seeing is the Referer: header is the new $uri set in the
> filter_content() callback. Your server isn't looking at the Referer
> header, is it?
Here's how to fix the Referer header, just in case:
Since you are modifying the $uri object for output, you are making
a global change to that object.
In the spider() sub there are these lines:
my $new_links = process_link( $server, $uri, $parent, $depth );
push @link_array, map { [ $_, $uri, $depth+1 ] } @$new_links if $new_links;
process_link() fetches $uri, calls test_response() at the start of the
response, then calls filter_content() after fetching the content.
Finally, links are extracted from the page and for each link
test_url() is called. That list of extracted links is what is
returned from process_link() (i.e. $new_links). Then those links are
added to the @link_array for later processing. Each entry in
@link_array is an array of the link, the link's parent, and the depth
of the link.
Notice that $uri in the second line (in that three element array)?
That's the "parent". So since you modify $uri during process_link()
the "parent" gets changed and that is used as the Referer: header in
later requests.
I think the easy solution is changing that first line to clone the
$uri:
my $new_links = process_link( $server, $uri->clone, $parent, $depth );
Then the Referer: header will be ok.
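
To see why the clone matters, here is a minimal standalone sketch. It does not use the real spider code: Sketch::Uri is a hypothetical stand-in for the URI object, and process_link_sketch() is a hypothetical stand-in for process_link() that simply modifies the object it is handed, the way a filter_content() callback might. Only the aliasing behaviour is the point.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a URI object (not the real URI class):
# just enough behaviour to show the aliasing problem.
package Sketch::Uri;

sub new {
    my ( $class, $str ) = @_;
    return bless { str => $str }, $class;
}

sub clone {
    my ($self) = @_;
    my %copy = %$self;                 # shallow copy of the object's data
    return bless \%copy, ref $self;
}

sub set       { my ( $self, $str ) = @_; $self->{str} = $str; return $self }
sub as_string { my ($self) = @_; return $self->{str} }

package main;

# Hypothetical stand-in for process_link(): it changes the object it
# receives and returns a list of extracted links.
sub process_link_sketch {
    my ($uri) = @_;
    $uri->set('/rewritten');
    return ['/child'];
}

# Without clone: the "parent" stored in the queue is the same object
# that was modified, so it now carries the rewritten value.
my $uri = Sketch::Uri->new('/original');
my @link_array;
my $new_links = process_link_sketch($uri);
push @link_array, map { [ $_, $uri, 1 ] } @$new_links if $new_links;
my $referer_shared = $link_array[0][1]->as_string;

# With clone: only the copy is modified, so the stored parent is intact.
my $uri2 = Sketch::Uri->new('/original');
my @link_array2;
my $new_links2 = process_link_sketch( $uri2->clone );
push @link_array2, map { [ $_, $uri2, 1 ] } @$new_links2 if $new_links2;
my $referer_cloned = $link_array2[0][1]->as_string;

print "shared: $referer_shared\n";    # shared: /rewritten
print "cloned: $referer_cloned\n";    # cloned: /original
```

In the first case the parent (and so the Referer: value built from it) shows the rewritten $uri; in the second it keeps the original.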
--
Bill Moseley
moseley@hank.org
Received on Thu May 19 12:21:24 2005