weird output while indexing

From: Jon Sorensen <jon(at)not-real.starkmedia.com>
Date: Fri Oct 15 2004 - 21:57:15 GMT
 I'm spidering a small site:

 my %serverA = (
         base_url    => 'http://www.generac-portables.com',
         same_hosts  => [ qw/generac-portables.com/],
         email       => 'jon@starkmedia.com',
         keep_alive  => 0,
         use_md5  => 1,
 );

 my %serverB = (
         base_url    => [qw!
             https://secure-sm02.starkmedia.com/generac/ordering/index.cfm?action1=Prod&Product=Generators
             https://secure-sm02.starkmedia.com/generac/ordering/index.cfm?action1=Prod&Product=PressureWashers
         !],
         email       => 'jon@starkmedia.com',
         keep_alive  => 0,
         test_url    => sub {
             my $uri = shift;
             return 0 if $uri->path =~ /action2/;
             return 1;
         },
         use_md5  => 1,
 );

 @servers = ( \%serverA, \%serverB, );


 I'm getting this output as swish-e indexes:


http://www.generac-portables.com/pressure_washers/pressure_washer.cfm?order=2&id=214&use=&price=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=^=^=^=
 Using HTML2 parser -  (221 words)

http://www.generac-portables.com/pressure_washers/pressure_washer.cfm?order=4&id=138&use=&price=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=%cf^=^=^=^= -
 Using HTML2 parser -  (298 words)

 When I run swish-e on our production server it doesn't do this,
 just on our live box. They're both Linux boxes.
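
 One possible stopgap, sketched here rather than tested against the spider: reject URLs whose query string contains the same short token repeated many times, using the test_url callback already used for serverB. The helper name has_runaway_query and the thresholds (a token of 1 to 10 characters, repeated at least six times in a row) are assumptions made for illustration; they are not part of spider.pl. This would only hide the symptom and does not explain why the two boxes behave differently.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of spider.pl: true when the query string
# contains a short token (1-10 characters) repeated at least six times in
# a row, as "%cf^=" is in the URLs above.
sub has_runaway_query {
    my ($query) = @_;
    return 0 unless defined $query;
    return $query =~ /(.{1,10}?)\1{5,}/ ? 1 : 0;
}

my %serverA = (
    base_url   => 'http://www.generac-portables.com',
    same_hosts => [ qw/generac-portables.com/ ],
    email      => 'jon@starkmedia.com',
    keep_alive => 0,
    use_md5    => 1,
    # test_url receives a URI object; returning 0 skips the URL.
    test_url   => sub {
        my $uri = shift;
        return 0 if has_runaway_query( $uri->query );
        return 1;
    },
);

# Quick check of the helper on plain strings (no spider needed).
for my $query ( 'order=2&id=214&use=&price=' . ( '%cf^=' x 20 ),
                'order=2&id=214&use=&price=' ) {
    print has_runaway_query($query) ? "skip\n" : "fetch\n";
}
```

 Run on its own, the check at the end prints "skip" for the repeated query and "fetch" for the normal one. The thresholds may need loosening or tightening if legitimate URLs on the site repeat short tokens.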

 Any help or suggestions are appreciated.
Received on Fri Oct 15 14:57:29 2004