I'm spidering a small site:
my %serverA = (
    base_url   => 'http://www.generac-portables.com',
    same_hosts => [ qw/generac-portables.com/ ],
    email      => 'jon@starkmedia.com',
    keep_alive => 0,
    use_md5    => 1,
);
my %serverB = (
    base_url => [qw!
        https://secure-sm02.starkmedia.com/generac/ordering/index.cfm?action1=Prod&Product=Generators
        https://secure-sm02.starkmedia.com/generac/ordering/index.cfm?action1=Prod&Product=PressureWashers
    !],
    email      => 'jon@starkmedia.com',
    keep_alive => 0,
    test_url   => sub {
        my $uri = shift;
        return 0 if $uri->path =~ /action2/;
        return 1;
    },
    use_md5    => 1,
);
@servers = ( \%serverA, \%serverB, );
I'm getting this output as swish-e indexes:
http://www.generac-portables.com/pressure_washers/pressure_washer.cfm?order=2&id=214&use=&price=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=Ï^=Ï^=Ï^=
Using HTML2 parser - (221 words)
http://www.generac-portables.com/pressure_washers/pressure_washer.cfm?order=4&id=138&use=&price=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=Ï^=Ï^=Ï^= -
Using HTML2 parser - (298 words)
When I run swish-e on our production server it doesn't do this, just on our live box. They're both Linux boxes.
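One possible stopgap, which I have not tried yet, would be to filter the runaway URLs out before they are indexed. This is only a rough sketch: the helper name keep_url, the repeat threshold of 3, and the example.com URL are made up for illustration. It works on a plain string, whereas the test_url callback in spider.pl is handed a URI object, so a real callback would check $uri->query instead of splitting the string by hand.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 to keep a URL, 0 to skip it.
sub keep_url {
    my ($url) = @_;

    # Split into the part before the first '?' and the query string after it.
    my ($path, $query) = split /\?/, $url, 2;
    $query = '' unless defined $query;

    # Skip anything that mentions action2 in either part.
    return 0 if $path =~ /action2/ || $query =~ /action2/;

    # Count how often the "%cf^=" sequence appears in the query;
    # the threshold of 3 is an arbitrary assumption.
    my $repeats = () = $query =~ /%cf\^=/gi;
    return 0 if $repeats >= 3;

    return 1;
}

# Sample check with a made-up URL that shows the repeated sequence.
print keep_url('http://example.com/page.cfm?id=1&price=%cf^=%cf^=%cf^='), "\n";
```

This would only hide the symptom. It does not explain why the links come out differently on one box.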
Any help or suggestions are appreciated.
Received on Fri Oct 15 14:57:29 2004