Skip to main content.
home | support | download

Back to List Archive

Re: [SWISH-E:175] Re: Fw: Re: More?

From: Paul J. Lucas <pjl(at)not-real.ptolemy.arc.nasa.gov>
Date: Thu Mar 05 1998 - 17:06:58 GMT
On Thu, 5 Mar 1998 00prneubauer@bsuvc.bsu.edu wrote:

> Paul J. Lukas wrote:  
> >
> >	The breadth-first strategy of SWISH++ rather than the depth-
> >	first one of SWISH-E is certainly better suited to web
> >	indexing.
> 
> This is an interesting proposition, but I'm afraid it's not obvious to
> me why this should be so.

	The reason is simple: depth first would take an extremely long
	time to complete and blow your stack moste likely since, every
	time you come to a link, you'd follow it to another page and
	possibly to another site.  You'd never get around to finishing
	the site you started at until months later.  Breadth-first
	would completes an entire page (and therefore an entire sitre)
	at a time before moving on.

> At first glance, a breadth-first strategy looks like it would spend
> all its time listing sites and never get around to individual
> pages/documents.

	You don't understand.

> I think it is clear that this strategy would never work for something like
> Altavista, so I'm sure that this was not what Paul Lukas was referring to.
> :-)

	The name is "Lucas" with a 'C', the far more common spelling.

> Still, even for a limited number of sites (e.g., one) it's not obvious
> to me why breadth-first is going to be faster, more efficient or
> otherwise better suited than depth-first.

	Well, what can I say.  Consult an algorithms text.

> Depth-first strikes me as more obvious/natural,

	Bubble sort strikes people as natural, but it sucks as a
	sorting algorithm.

> but I have certainly not ever thought about it in any detail.

	So let me understand: you've never though about it in any
	detail, yet you're coming to conclusions?

> Is the efficiency dependent on how broad the breadth is and/or how deep the
> depth is?

	I never said the efficiency issue was related to whether it does
	breath- or depth-first, so the answer is no.

> If not, what makes one better for web indexing?

	See above.

> Paul L. says that swish++ is "an order of magnitude faster than SWISH-E."  I
> am not going to dispute the truth of this statement, but if the factor that
> makes it true is a breadth-first algorithm rather than a depth-first
> algorithm, then there is probably a simple explanation for why this is so.

	I never cited that as a reason precicely because it isn't a
	reason.  Read the README for the real reasons why it is faster.

	- Paul J. Lucas
	  NASA Ames Research Center		Caelum Research Corporation
	  Moffett Field, California		San Jose, California
	  <pjl AT ptolemy DOT arc DOT nasa DOT gov>
Received on Thu Mar 5 09:15:31 1998