I need to spider a web site whose main index.html contains a meta refresh tag
that redirects the browser to index.php.
Example:
www.intellivence.com/index.html contains:
<meta http-equiv="refresh" content="0; URL=http://www.intellivence.com/index.php">
This works fine via browsers (albeit slowly).
I know there are better ways to do this (e.g., having the web server send a
proper HTTP redirect), but we can't control foreign sites. Shouldn't the
spider behave like a browser here and follow the refresh?
Is ignoring the refresh the current behavior, or did I miss something?
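To follow the tag the way a browser does, a spider has to find the
refresh tag in the fetched page and queue its target URL. Below is a minimal
sketch of that detection step, written in Python purely for illustration. It
is not how spider.pl works, and the names `MetaRefreshFinder` and
`meta_refresh_target` are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Matches the "URL=..." part of a refresh content value such as "0; URL=index.php".
_URL_RE = re.compile(r"url\s*=\s*['\"]?\s*([^'\"\s]+)", re.IGNORECASE)


class MetaRefreshFinder(HTMLParser):
    """Records the content attribute of the first <meta http-equiv="refresh">."""

    def __init__(self):
        super().__init__()
        self.refresh_content = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta" or self.refresh_content is not None:
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("http-equiv", "").lower() == "refresh":
            self.refresh_content = values.get("content", "")


def meta_refresh_target(html, base_url):
    """Return the absolute refresh target URL, or None if there is none."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    finder.close()
    if finder.refresh_content is None:
        return None
    match = _URL_RE.search(finder.refresh_content)
    if match is None:
        return None  # a refresh without a URL just reloads the same page
    # Resolve relative targets (e.g. "index.php") against the page's URL.
    return urljoin(base_url, match.group(1))
```

A spider could then fetch the returned URL as if the page had linked to it.
Whether to treat a refresh with a long delay as a redirect is a policy choice
left to the spider.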
Regards,
Robert Keith
==============================================
The command I run is:
/usr/bin/perl /fs/area/search/prog-bin/spider.pl /fs/area/search/conf/prof2.pl | swish-e -S prog -c /fs/area/search/conf/prof2 -i stdin -v3
The output is:
Parsing config file '/fs/area/search/conf/prof2'
Parsing config file '/fs/area/search/conf/common.config'
Indexing Data Source: "External-Program"
Indexing "stdin"
/fs/area/search/prog-bin/spider.pl: Reading parameters from '/fs/area/search/conf/prof2.pl'
-- Starting to spider: http://www.intellivence.com/ --
>> +Fetched 0 Cnt: 1 http://www.intellivence.com/ 200 OK text/html 130 parent:
Summary for: http://www.intellivence.com/
Total Bytes: 130 (130.0/sec)
Total Docs: 1 (1.0/sec)
Unique URLs: 1 (1.0/sec)
http://www.intellivence.com/ - Using HTML parser - (no words indexed)
Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!
Received on Mon Jul 28 08:43:56 2003