RE: RE: swish-e spider does not go beyond index.html

From: Christian Stalberg <Christian.Stalberg.stalberg(at)not-real.nt.com>
Date: Tue Oct 27 1998 - 15:32:56 GMT
YES!

Christian
Christian Stalberg
Web Consultant, Dept. 3187
NORTEL, Signaling Solutions Group
* Phone: (919) 905-4975 ESN 355-4975
*      Fax: (919) 905-8313 ESN 395-8313
*     Email: Christian.Stalberg.stalberg@nt.com

	----------
	From:  Ron Klatchko [SMTP:ron@ckm.ucsf.edu]
	Sent:  Friday, October 16, 1998 3:14 PM
	To:  Multiple recipients of list
	Subject:  [SWISH-E] RE: swish-e spider does not go beyond index.html

	Christian Stalberg wrote:
	> Oops, someone reminded me that starting with a frames webpage will
	> not work. I have changed the IndexDir to a TOC page for the frames
	> and it appears to be working. Is there any special wisdom anyone
	> can share re. using swish-e to index frames webpages?

	Frames are a tricky situation when it comes to searching.  It would
	be simple to make the spider follow frame links as well, but what
	happens on retrieval?  SWISH would return the URL of the individual
	frame that contained what the user was searching for, and the user
	would see only that frame instead of the frameset you so nicely
	constructed.
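
To make "following frame links" concrete, here is a minimal Python sketch of
pulling the SRC of each FRAME tag out of a frameset page. It is not
swishspider's code; the names FrameLinkParser and frame_links are
hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameLinkParser(HTMLParser):
    """Collects absolute URLs from the src attribute of <frame> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <FRAME SRC=...>
        # arrives here as "frame" with a "src" attribute.
        if tag == "frame":
            src = dict(attrs).get("src")
            if src:
                self.links.append(urljoin(self.base_url, src))


def frame_links(html, base_url):
    """Return the frame URLs found in html, resolved against base_url."""
    parser = FrameLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

A spider could queue these URLs the same way it queues ordinary links. The
retrieval problem described above remains, since each result would still
point at a single frame.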

	There might be a solution to that in some clever file layout and use
	of ReplaceRules.  One idea is below.

	Another possibility would be to have a no-frames version with
	identical content.  SWISH can spider that currently.  This also has
	the nice benefit of opening your site to non-frames-aware browsers.
	Unfortunately, even frames-aware browsers would end up with the
	non-frames version when they search.

	So, going back to the clever layout/rewrite idea.  Let's assume that
	swishspider can now follow frame links.  Also assume you have a
	basic frame set with the left side as a table of contents and the
	right side with your various data pages.

	In order to do this, you'll need a main directory and one
	subdirectory for each page.

	The main directory contains index.html, which defines your frameset,
	and toc.html, which is your table of contents.  You have a series of
	directories called page1, page2, etc. inside of which you have
	page1.html, page2.html, etc.  Also, each of these directories
	contains index.html.  The difference between this and the main
	index.html is the starting page for the right side; the main
	index.html points to page 1, whereas the ones in the subdirectories
	point to their own page (page2/index.html has page2.html as the
	right-hand side).  For an example of this structure you can check
	out http://samiam.ckm.ucsf.edu/frame/
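
The layout just described can be sketched as a small Python script that
writes the files out. Everything specific here is an assumption made for
illustration: the FRAMESET markup, the 25%/75% column split, the placeholder
page bodies, and the build_layout name. The example site may be built
differently.

```python
from pathlib import Path

# Hypothetical two-column frameset: table of contents left, content right.
FRAMESET = """<html>
<frameset cols="25%,75%">
  <frame src="{toc}" name="toc">
  <frame src="{page}" name="content">
</frameset>
</html>
"""


def build_layout(root, page_count):
    """Create the main directory plus one subdirectory per page."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    (root / "toc.html").write_text("<html><body>Table of contents</body></html>\n")
    # The main index.html starts with page 1 on the right-hand side.
    (root / "index.html").write_text(
        FRAMESET.format(toc="toc.html", page="page1/page1.html")
    )
    for n in range(1, page_count + 1):
        page_dir = root / f"page{n}"
        page_dir.mkdir(exist_ok=True)
        (page_dir / f"page{n}.html").write_text(
            f"<html><body>Page {n}</body></html>\n"
        )
        # Each subdirectory's index.html shows its own page on the right.
        (page_dir / "index.html").write_text(
            FRAMESET.format(toc="../toc.html", page=f"page{n}.html")
        )
```

Calling build_layout("frame", 3) would produce frame/index.html,
frame/toc.html, and frame/page1/ through frame/page3/, each holding its own
index.html and pageN.html.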

	If you then introduce the rule:
	  ReplaceRules remove "page[0-9]+.html"

	a search that gets ../pageN/pageN.html gets rewritten to ../pageN/,
	which preserves the frame set.
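
The effect of that rule can be checked with a few lines of Python, assuming
the remove rule behaves like deleting every regular-expression match from the
indexed path. The rewrite function is a hypothetical stand-in for
illustration, not SWISH-E's implementation.

```python
import re

# Same pattern as in the rule above. The "." is unescaped there, so it
# matches any character; r"page[0-9]+\.html" would be the stricter form.
PAGE_FILE = re.compile(r"page[0-9]+.html")


def rewrite(url):
    """Delete the pageN.html file name, leaving the pageN/ directory URL."""
    return PAGE_FILE.sub("", url)
```

For example, rewrite("../page2/page2.html") gives "../page2/", so the web
server answers with page2/index.html and its frameset. URLs without a
pageN.html component pass through unchanged.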

	More complicated use of frames would require even more thought, but
	it is a possibility.

	Are people interested in doing such a thing?  Should I modify
	swishspider to be able to follow frame links?

	moo
	
	----------------------------------------------------------------------
	          Ron Klatchko - Manager, Advanced Technology Group
	           UCSF Library and Center for Knowledge Management
	                           ron@ckm.ucsf.edu
Received on Tue Oct 27 09:39:43 1998