RE: Patched Spider

From: PropheZine Webmaster <bob(at)not-real.prophezine.com>
Date: Sat Jun 24 2000 - 10:31:24 GMT
Hi:

I tested your spider against my personal site and it worked fine.  I then
tested it against http://www.prophecyinthenews.com/index.asp and it does not
work there; only one file gets indexed:

Indexing Data Source: "HTTP-Crawler"
retrieving http://www.prophecyinthenews.com/ (0)...
 (18 words)

Removing very common words... no words removed.
Writing main index... 14 unique words indexed.
Writing file index... 1 file indexed.
Running time: 4 seconds.
Indexing done!
[usr147@unix2502 search]$

Bob

-----Original Message-----
From: swish-e@sunsite.berkeley.edu
[mailto:swish-e@sunsite.berkeley.edu]On Behalf Of David Norris
Sent: Friday, June 23, 2000 10:55 PM
To: Multiple recipients of list
Subject: [SWISH-E] Patched Spider


PropheZine Webmaster wrote:
> 2.  applied the spider and spider2 patches

The spider patch is already applied to http.c in 1.3.2.

I have a patched and tested swishspider at:
http://www.webaugur.com/wares/files/swishspider

I also changed the #! line to /usr/bin/perl, since that is a standard
location for the Perl binary.  The path in the distribution is
SPARC-specific.

The spider works perfectly with "perl, version 5.005_03 built for
i386-linux".  I can't test on BSD, since the version of Perl on that system
is ancient (as is the system).

> if( substr($response->header("content-type"), 0, length("text/html")) eq
> "text/html" ) {

Looks correct to me.
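
For anyone who wants to try that check outside the spider, here is a minimal
sketch of the same prefix comparison on a plain string.  The helper name
is_html is hypothetical and is not part of swishspider, and the lowercasing
step is an addition of this sketch, not something in the quoted line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of swishspider): returns 1 when a
# Content-Type header value begins with "text/html", so values that carry
# parameters, such as "text/html; charset=iso-8859-1", are accepted too.
sub is_html {
    my ($content_type) = @_;

    # A missing header is treated as "not HTML".
    return 0 unless defined $content_type;

    my $want = "text/html";

    # Same idea as the quoted check: compare only the leading characters.
    # Lowercasing first is an assumption added here for mixed-case headers.
    return substr(lc($content_type), 0, length($want)) eq $want ? 1 : 0;
}

# Show the decision for a few sample header values.
for my $type ("text/html", "text/html; charset=iso-8859-1", "image/gif") {
    printf "%-32s %s\n", $type, is_html($type) ? "index" : "skip";
}
```

Comparing only the prefix matters because servers often append a charset
parameter to the header, and an exact string comparison would reject those
pages.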

--
,David Norris
  Dave's Web - http://www.webaugur.com/dave/
  Dave's Weather - http://www.webaugur.com/dave/wx
  ICQ Universal Internet Number - 412039
  E-Mail - dave@webaugur.com
Received on Sat Jun 24 06:46:52 2000