Re: Swish-E and HTML documents with frames

From: Chris Humphries <ChrisJMH(at)not-real.vermilion99.freeserve.co.uk>
Date: Fri Feb 25 2000 - 10:35:53 GMT

Ron,

Yes, I am concerned that some documents built by this procedure might not be indexed properly.
I will test the new code on a variety of framed documents before I consider uploading it.

I have tested Swish-E to see if it can index two concatenated HTML files and it seemed quite happy.
I have also tried indexing a site with 3 or more levels of frames. It concatenated 6 files in all and appeared to index them correctly.

I would be grateful if you could send me a sample of the case you described below, so that I can see whether it makes my code or Swish-E fail.

Chris Humphries


-----Original Message-----
From:	Ron Samuel Klatchko [SMTP:rsk@corpmail.brightmail.com]
Sent:	Thursday, February 24, 2000 8:59 PM
To:	Multiple recipients of list
Subject:	[SWISH-E] Re: Swish-E and HTML documents with frames

Chris Humphries wrote:
> If the spider program detects that the document is a framed HTML, it
> recursively builds content by reading through the <frame src> pointers, and
> builds up a list of all the <a href> links that it finds. It then passes
> *this* back to the C program, which indexes the document as if it were one
> big HTML. The spidering will work as if all the <a href> links found in
> the frameset HTML files were at level 1.
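
The procedure quoted above can be sketched roughly as follows. This is only an illustration in Python, not the spider code under discussion: the names FrameLinkParser, flatten, fetch and PAGES, and the example.test URLs, are all made up for the example. It also assumes that the frameset page itself is kept in the combined text and that pages are supplied by a caller-provided fetch function.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameLinkParser(HTMLParser):
    """Collects <frame src> targets and <a href> links from one HTML page."""

    def __init__(self):
        super().__init__()
        self.frames = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF> matches too.
        attrs = dict(attrs)
        if tag == "frame" and attrs.get("src"):
            self.frames.append(attrs["src"])
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])


def flatten(url, fetch, seen=None):
    """Return (combined_html, links) for url, following frames recursively.

    fetch is a hypothetical callable mapping a URL to its HTML text.
    """
    if seen is None:
        seen = set()
    if url in seen:  # guard against framesets that include each other
        return "", []
    seen.add(url)

    html = fetch(url)
    parser = FrameLinkParser()
    parser.feed(html)

    parts = [html]  # assumption: the frameset page itself is kept
    links = [urljoin(url, href) for href in parser.links]
    for src in parser.frames:
        sub_html, sub_links = flatten(urljoin(url, src), fetch, seen)
        parts.append(sub_html)
        links.extend(sub_links)  # every link is reported at the same level
    return "\n".join(parts), links


# Hypothetical in-memory site standing in for real HTTP fetches.
PAGES = {
    "http://example.test/index.html":
        '<frameset><frame src="top.html"><frame src="bottom1.html"></frameset>',
    "http://example.test/top.html": "<html><body>Top</body></html>",
    "http://example.test/bottom1.html":
        '<html><body><A HREF="bottom2.html">next</A></body></html>',
}

combined, links = flatten("http://example.test/index.html", PAGES.__getitem__)
print(links)
```

In this sketch the links gathered from the framed pages are handed back in one flat list together with the concatenated text. Whether the pages they point to are then fetched and indexed is left to the caller.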

Can you explain how this works a little more?  I'm thinking specifically
about something like this:

index.html: frameset
             top.html
             bottom1.html

bottom1.html: <A HREF="bottom2.html">
bottom2.html: <A HREF="bottom3.html">
etc.

It sounds like bottom1.html will be properly indexed as part of the
index.html frameset, but what about bottom2, bottom3, etc.?  Also, does
the presence of a TARGET attribute affect anything?

moo
------------------------------------------------------------
           Ron Samuel Klatchko - Software Jester
            Brightmail Inc - rsk@brightmail.com
Received on Fri Feb 25 05:39:31 2000