Swish Spider with some support for FRAMESET HTMLS

From: Chris Humphries <ChrisJMH(at)>
Date: Mon Mar 06 2000 - 15:18:55 GMT
*** I've been having some trouble with file attachments. Just in case this 
one didn't get through intact, here it is again ***

Here is a modified version of the Swish Spider that can handle FRAMESET 
HTML documents.

Before trying to use it, please read the notes below:

* It works by reading through frame source links and creating a single HTML 
file which is passed on to be indexed.

* Although Swish-E does not index documents in different domains, the 
spidering operation that reads through the frame source links *does*. This 
is because frame source HTML documents are sometimes in a different 
domain.

* Any href links found in any frame source files are passed on as if they 
were links off that single HTML file. Because these may be in different 
domains, they may not be indexed.

* If you start your HTTP spidering with a file which you know is part of a 
frame set, but which is not the root frame set file, THIS NEW SPIDER CANNOT 
RECOGNISE THAT. It will *only* spider those frame set files BELOW the file 
that you start with.
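
To make the notes above a little more concrete, here is a rough sketch of the general idea: follow the frame source links, join everything into one block of HTML, and pass the href links on as if they came from that single file. It is written in Python purely for illustration and is NOT the code of the modified spider. The names FrameAndLinkParser, flatten_frameset and fetch are made up for this example, and fetch is assumed to be something you supply that returns the HTML text for a URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameAndLinkParser(HTMLParser):
    """Collect <frame> src values and <a> href values from one page."""

    def __init__(self):
        super().__init__()
        self.frame_srcs = []
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "frame" and attrs.get("src"):
            self.frame_srcs.append(attrs["src"])
        elif tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def flatten_frameset(url, fetch, seen=None):
    """Return (combined_html, absolute_links) for url and the frames below it.

    fetch is a hypothetical callable that maps a URL to its HTML text.
    Frames are followed even into other domains; only frames BELOW the
    starting URL are ever visited.
    """
    seen = set() if seen is None else seen
    if url in seen:  # guard against frame sets that refer to each other
        return "", []
    seen.add(url)

    html = fetch(url)
    parser = FrameAndLinkParser()
    parser.feed(html)

    parts = [html]
    # Resolve relative hrefs against the page they were found on.
    links = [urljoin(url, href) for href in parser.hrefs]

    for src in parser.frame_srcs:
        child_html, child_links = flatten_frameset(urljoin(url, src), fetch, seen)
        parts.append(child_html)
        links.extend(child_links)

    return "\n".join(parts), links
```

In this sketch the returned links are absolute URLs, so whatever does the indexing can still decide to skip the ones that are in a different domain.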

This version of the Swish Spider has been modified by Chris Humphries.
It comes with no guarantees.
It has been tested to a limited degree on real data.
It has not been tested exhaustively on all possible cases.

* I hope that you find this useful. If you have any problems with this new 
version of the spider, please tell me, but I must warn you that I am fairly 
busy most of the time and may not be able to reply to you straight away.

Chris Humphries

Received on Mon Mar 6 10:22:56 2000