Incorrect behavior for swishspider script

From: Andrew Ho <andrew(at)not-real.tellme.com>
Date: Sat Apr 08 2000 - 01:42:46 GMT
Hello,

The swishspider script that comes with SWISH-E has a slight error: it will
not look for or report links in any document whose content-type is not
exactly "text/html". Unfortunately, this means that a page with
this perfectly valid HTTP 1.1 header:

  Content-type: text/html; charset=ISO-8859-1

does not get indexed. A quick fix to the script is to change line 50 of
the script swishspider from:

  if( $response->header("content-type") eq "text/html" ) {

to:

  if( $response->header("content-type") =~ m(text/html) ) {
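Note that the pattern match above is unanchored and case-sensitive, so it would also accept any value that merely contains "text/html" and would miss "Text/HTML". A stricter check could compare only the media type that precedes any ";" parameters. The following is a minimal sketch of that idea, not part of swishspider; the helper name is_html is hypothetical. (libwww-perl's HTTP::Headers also has a content_type method that returns the lower-cased type without parameters, which may be preferable if your installed version provides it.)

```perl
use strict;
use warnings;

# Hypothetical helper (not part of swishspider): returns 1 if a
# Content-Type header value names the text/html media type, else 0.
sub is_html {
    my ($content_type) = @_;
    return 0 unless defined $content_type;

    # Keep only the part before any ";" parameters such as charset.
    my ($media_type) = split /;/, $content_type, 2;
    return 0 unless defined $media_type;

    # Trim surrounding whitespace; media types are case-insensitive.
    $media_type =~ s/^\s+|\s+$//g;
    return lc($media_type) eq 'text/html' ? 1 : 0;
}
```

With such a helper in place, the test on line 50 could read: if( is_html( $response->header("content-type") ) ) {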

On another note, perhaps there should be a configuration option to set the
full path AND FILENAME of the spidering program, such that the spidering
program does not need to be explicitly called "swishspider" (if, for
example, I wanted to write an intelligent spider of my own that knows the
structure of my site).

Or, at the very least, there should be some documentation about the
interaction between the spider program and the SWISH-E indexing program.

Humbly,

Andrew

----------------------------------------------------------------------
Andrew Ho               http://www.tellme.com/       andrew(at)not-real.tellme.com
Engineer                   info@tellme.com          Voice 650-930-9062
Tellme Networks, Inc.                                 Fax 650-930-9101
----------------------------------------------------------------------
Received on Fri Apr 7 21:45:27 2000