

RE: HTTP Crawler

From: Hsiao Ketung Contr 61 CS/SCBN <KETUNG.HSIAO(at)not-real.LOSANGELES.AF.MIL>
Date: Thu May 02 2002 - 16:51:07 GMT

..response is 500. The files look like this:
-rw-r--r--   1 root     other       5321 May  1 16:21 ..contents
-rw-r--r--   1 root     other        638 May  1 16:21 ..links
-rw-r--r--   1 root     other          4 May  1 16:32 ..response

% more ..response
500
Judging by the time stamps, ..response has changed but the other two
files have not.
I'll have to find out what ..response = 500 means.
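
To see where the 500 comes from, one option is to fetch the page by hand
with LWP (presumably what swishspider does under the hood). A rough
sketch, using the intranet URL from the earlier message as a placeholder:

  #!/usr/bin/perl -w
  use strict;
  use LWP::UserAgent;

  # Placeholder URL -- substitute the real intranet page.
  my $url = 'http://my-intranet-server-name/tmp.html';

  my $ua = LWP::UserAgent->new;
  my $response = $ua->get($url);

  # A 500 can come from the web server itself (Internal Server Error),
  # or be generated by LWP when the request never gets through (e.g.
  # "500 Can't connect to host"), so the full status line is worth reading.
  print $response->status_line, "\n";
  print $response->content, "\n";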

-----Original Message-----
From: Bill Moseley []
Sent: Wednesday, May 01, 2002 4:45 PM
To: Hsiao Ketung Contr 61 CS/SCBN; Multiple recipients of list
Subject: RE: [SWISH-E] HTTP Crawler

At 04:36 PM 05/01/02 -0700, Hsiao Ketung Contr 61 CS/SCBN wrote:
>But if I run the following (from src directory)
>./swishspider . http://my-intranet-server-name/tmp.html.
>The content in ..links is unchanged.

How about the response code and the content?

>So, the run for the intranet URL is not working.
>How do I get swishspider to run on the intranet also?

Find out what's blocking the request.

>Can anyone please shed some light on this one?
>>$url =~ s/http\:\/\/www\.losangeles\.af\.mil\///;
>>	into  the while loop in
>>	sub search_parse.
>Yes, the above is Perl code.  The above code is to blank out
> http://www.losangeles.af.mil/ from the $url variable.

Yes, I know what it does.  I just don't know what that applies to.  Some
CGI script you are running?

If all the slashes make you dizzy then you might try:

  $url =~ s[\Qhttp://www.losangeles.af.mil/][];

\Q is probably not needed.
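
For example, a throwaway test (not part of swishspider; the path is made
up for illustration) shows the bracket-delimited form doing the same
stripping:

  #!/usr/bin/perl -w
  use strict;

  # Hypothetical URL, just for illustration.
  my $url = 'http://www.losangeles.af.mil/some/page.html';

  # Same substitution with bracket delimiters; \Q makes the dots in the
  # hostname match literally instead of acting as regex wildcards.
  $url =~ s[\Qhttp://www.losangeles.af.mil/\E][];

  print "$url\n";    # prints: some/page.html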

Bill Moseley
Received on Thu May 2 16:51:10 2002