Re: Query String Being Converted to HTML Entity

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Tue Nov 23 2004 - 18:22:55 GMT
On Tue, Nov 23, 2004 at 09:52:38AM -0800, Jon Sorensen wrote:
> I have been trying to spider a site like so:
> 
> my %serverA = (
>     base_url   => 'http://www.generac-portables.com/pressure_washers/pressure_washer.cfm?id=183&use=&price=&psi=&order=1',
>     keep_alive => 0,
>     test_url   => sub {
>         my $uri = shift;
>         if ($uri->path =~ /pressure_washer\.cfm/) {
>             return 1;
>         }
>         else { return 0; }
>     },
>     use_md5    => 1,
>     max_files  => 30,
> );
> 
> 
> @servers = ( \%serverA, );
> 
> #######################################
> 
> In the output, swish was getting hung up on "&psi=" in the query string.
> It was converting it to the character entity for the Greek letter "Psi" (&psi;)
> and getting caught in an infinite loop:
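The report doesn't show where the conversion happened. As a purely hypothetical illustration of how "&psi=" can end up as the psi entity, the sketch below uses an invented, deliberately naive decoder (`naive_decode` and its two-entry table are assumptions, not swish-e or spider.pl code) that accepts "&name" without a closing semicolon. It also shows that writing the separators as "&amp;", the usual way to put "&" inside an HTML href, keeps the query string intact.

```perl
use strict;
use warnings;

# Hypothetical, deliberately naive decoder used only to illustrate the
# failure mode: it treats "&name" as an entity even when the trailing
# ";" is missing. This is NOT swish-e's or spider.pl's actual code.
my %entity = (
    psi => "\x{3C8}",    # U+03C8, Greek small letter psi
    amp => '&',
);

sub naive_decode {
    my ($s) = @_;
    # Replace known entity names; put unknown ones (e.g. "&use") back unchanged.
    $s =~ s/&(\w+)(;?)/exists $entity{$1} ? $entity{$1} : "&$1$2"/ge;
    return $s;
}

# Raw "&" separators: "&psi" is mistaken for an entity.
my $raw     = 'id=183&use=&price=&psi=&order=1';
# Separators written as "&amp;", as they would appear inside an href.
my $escaped = 'id=183&amp;use=&amp;price=&amp;psi=&amp;order=1';

binmode STDOUT, ':encoding(UTF-8)';
print naive_decode($raw), "\n";        # id=183&use=&price=ψ=&order=1
print naive_decode($escaped), "\n";    # id=183&use=&price=&psi=&order=1
```

Because s///g does not rescan its own replacement text, decoding "&amp;psi=" yields a plain "&psi=" rather than a second substitution.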

Interesting. It's hard to see what's going on; it seems the list
server doesn't handle quoted-printable mail.

I also tried spidering and I didn't see any problems with psi.

my %serverA = (
    base_url   => 'http://www.generac-portables.com/pressure_washers/pressure_washer.cfm?id=183&use=&psi=&order=1',
    keep_alive => 0,

    email      => 'moseley@hank.org',
    test_url   => sub {
        my $uri = shift;
        if ($uri->path =~ /pressure_washer\.cfm/) {
            return 1;
        }
        else { return 1; }
    },
    use_md5    => 1,
    delay_sec  => 0,
    max_files  => 30,
);


@servers = ( \%serverA, );
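This `test_url` returns 1 from both branches, so every link the spider finds is accepted, which would explain why the site's index pages appear in the output below. For comparison, here is a stand-alone sketch of Jon's stricter check, assuming the `URI` module (whose objects provide the `path` method both configs call). It also prints how `URI`'s `query_form` splits the query string: `psi` comes back as an ordinary parameter with an empty value.

```perl
use strict;
use warnings;
use URI;

# Jon's test_url logic as a stand-alone sub: accept only pressure_washer.cfm.
my $test_url = sub {
    my $uri = shift;
    return $uri->path =~ /pressure_washer\.cfm/ ? 1 : 0;
};

my $uri = URI->new(
    'http://www.generac-portables.com/pressure_washers/pressure_washer.cfm'
  . '?id=183&use=&psi=&order=1'
);

print "test_url: ", $test_url->($uri), "\n";    # test_url: 1

# query_form splits on "&" (or ";") and "=", so "psi" is just a key.
my @pairs = $uri->query_form;
while ( my ( $k, $v ) = splice @pairs, 0, 2 ) {
    print "$k=[$v]\n";
}
```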

moseley@bumby:~$ /usr/local/lib/swish-e/spider.pl spider.conf  >xx
/usr/local/lib/swish-e/spider.pl: Reading parameters from 'spider.conf'

moseley@bumby:~$ fgrep Path-Name xx
Path-Name: http://www.generac-portables.com/pressure_washers/pressure_washer.cfm?id=183&use=&psi=&order=1
Path-Name: http://www.generac-portables.com/generators/index.cfm
Path-Name: http://www.generac-portables.com/pressure_washers/index.cfm
Path-Name: http://www.generac-portables.com/where_to_buy/index.cfm
Path-Name: http://www.generac-portables.com/service_support/faq/index.cfm
Path-Name: http://www.generac-portables.com/index.cfm
Path-Name: http://www.generac-portables.com/pressure_washers/pw_basics.cfm
Path-Name: http://www.generac-portables.com/pressure_washers/pw_project_tips.cfm
Path-Name: http://www.generac-portables.com/pressure_washers/glossary.cfm
Path-Name: http://www.generac-portables.com/pressure_washers/pressure_washer.cfm?order=2&id=214&use=&price=&ppsi=
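The `fgrep` above is the quickest check. A slightly more targeted one flags any `Path-Name:` URL that contains the "&psi;" entity form from the original report. This is a minimal sketch; `mangled_paths` is a hypothetical helper, not part of swish-e, and the sample lines are two of the Path-Name lines captured above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of swish-e or spider.pl): given lines of
# spider.pl output, return the Path-Name URLs that contain the "&psi;"
# entity form instead of the literal "&psi=" parameter.
sub mangled_paths {
    my @bad;
    for my $line (@_) {
        next unless $line =~ /^Path-Name:\s*(\S+)/;
        my $url = $1;
        push @bad, $url if index( $url, '&psi;' ) >= 0;
    }
    return @bad;
}

# Two of the Path-Name lines from the capture above.
my @sample = (
    "Path-Name: http://www.generac-portables.com/pressure_washers/pressure_washer.cfm?id=183&use=&psi=&order=1\n",
    "Path-Name: http://www.generac-portables.com/pressure_washers/pressure_washer.cfm?order=2&id=214&use=&price=&ppsi=\n",
);

my @bad = mangled_paths(@sample);
printf "%d of %d Path-Name lines look entity-mangled\n", scalar @bad, scalar @sample;
```

To check a real capture, pass the lines of the `xx` file instead of `@sample`.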


moseley@bumby:~$ swish-e -i stdin -S prog -v0 < xx    
moseley@bumby:~$ swish-e -w not dkdkdkd -x '%p\n'
# SWISH format: 2.5.2
# Search words: not dkdkdkd
# Removed stopwords: 
# Number of hits: 10
# Search time: 0.017 seconds
# Run time: 0.036 seconds
http://www.generac-portables.com/pressure_washers/pressure_washer.cfm?id=183&use=&psi=&order=1
http://www.generac-portables.com/pressure_washers/glossary.cfm
http://www.generac-portables.com/pressure_washers/pw_project_tips.cfm
http://www.generac-portables.com/pressure_washers/pw_basics.cfm
http://www.generac-portables.com/index.cfm
http://www.generac-portables.com/service_support/faq/index.cfm
http://www.generac-portables.com/where_to_buy/index.cfm
http://www.generac-portables.com/pressure_washers/index.cfm
http://www.generac-portables.com/generators/index.cfm
http://www.generac-portables.com/pressure_washers/pressure_washer.cfm?order=2&id=214&use=&price=&ppsi=
.

-- 
Bill Moseley
moseley@hank.org

Unsubscribe from or help with the swish-e list: 
   http://swish-e.org/Discussion/

Help with Swish-e:
   http://swish-e.org/current/docs
   swish-e@sunsite.berkeley.edu
Received on Tue Nov 23 10:22:55 2004