Swish-E Spider and Frameset HTML

From: Chris Humphries <ChrisJMH(at)not-real.vermilion99.freeserve.co.uk>
Date: Tue Feb 22 2000 - 12:18:49 GMT
This is a message for Ron Samuel Klatchko:

Ron, I have tried the following change in your spider program. It seems to let the spider follow the links in frameset HTML documents, by also picking up the "src" attribute of each "frame" tag.

# This is the whole of "sub linkcb"
# The small bit that I added runs from "Start of addition" to "End of addition"

sub linkcb {
    my($tag, %links) = @_;

    if (($tag eq "a") && ($links{"href"})) {
        my $link = $links{"href"};

        #
        # Remove fragments
        #
        $link =~ s/(.*)#.*/$1/;
        
        #
        # Remove ../  This is important because the abs() function
        # can leave these in and cause never ending loops.
        #
        $link =~ s/\.\.\///g;
        
        print LINKS "$link\n";
    }


    # Start of addition
    # Extract frameset links

    if (($tag eq "frame") && ($links{"src"})) {
        my $link = $links{"src"};

        #
        # Remove fragments
        #
        $link =~ s/(.*)#.*/$1/;
        
        #
        # Remove ../  This is important because the abs() function
        # can leave these in and cause never ending loops.
        #
        $link =~ s/\.\.\///g;
        
        print LINKS "$link\n";
    }

    # End of addition


}
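
The two branches above differ only in the tag name and the attribute that holds the URL, so the same idea could also be written with a small lookup table. What follows is only a sketch, not the spider's actual code: %LINK_ATTR, clean_link and extract_link are hypothetical names, $out stands in for the LINKS filehandle that the real script opens elsewhere, and it assumes the callback is still invoked as ($tag, %links) with lower-case tag and attribute names, as in the original.

```perl
use strict;
use warnings;

# Hypothetical table: tag name => attribute that carries the URL.
my %LINK_ATTR = (
    a     => 'href',
    frame => 'src',
);

# Stand-in for the spider's LINKS filehandle (assumption: the real
# script opens LINKS itself before the callback runs).
my $out = \*STDOUT;

# Hypothetical helper: the same clean-up steps as in linkcb above.
sub clean_link {
    my ($link) = @_;
    $link =~ s/(.*)#.*/$1/;    # remove fragment
    $link =~ s/\.\.\///g;      # remove ../ so abs() cannot loop on it
    return $link;
}

# Hypothetical helper: returns the cleaned URL for a tag we care
# about, or undef for any other tag or a missing/empty attribute.
sub extract_link {
    my ($tag, %links) = @_;
    my $attr = $LINK_ATTR{$tag} or return;
    my $link = $links{$attr}    or return;
    return clean_link($link);
}

# Same calling convention as the original callback.
sub linkcb {
    my ($tag, %links) = @_;
    my $link = extract_link($tag, %links);
    print {$out} "$link\n" if defined $link;
}
```

With this layout, calling linkcb('frame', src => 'menu.html#top') would write the cleaned URL to $out, and supporting another tag would only need one more entry in the table.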

Chris Humphries
Received on Tue Feb 22 07:22:32 2000