On Sat, Apr 09, 2005 at 07:29:20AM -0700, fh oregon wrote:
> As for your second point, I'm not any kind of company - just a guy with
> a fairly large personal web site who (as a hobby) hosts some email lists
> and web sites for a car club and a food club.
But http://mysite.com looks like an email/hosting service. That's not
you? Carclub.com looks like a company, too. So I'm confused. Or are
those just names you borrowed to use on this list?
> Since the root of
> "carclub.com" ("/CARS") is contained within the "mysite.com" tree, I
> would expect that it would be indexed on the same pass. It would be
> interesting to understand just how swish-e traverses the website tree -
> in looking at the log file, it appears to be jumping around and not
> following the directory structure as I would expect. Kinda makes me go
> "hmmmm".
Think about it. All a web spider can do is follow links. It has no
idea about your directory structure at all. Many web sites don't even
have any directory structure -- they are all dynamically created from
a database.
There is no way for the spider to know that http://carclub.com is
contained in http://mysite.com's web or file space. A spider that
wasn't limited to specific hosts would follow links off your site and
end up trying to index the entire Web. If you want a host indexed, you
have to tell the spider to index that host.
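
To make that concrete, here is a rough sketch in Python of what any
link-following spider does. It is not swish-e's spider, just an
illustration: the LINKS table is a made-up stand-in for fetching pages
over HTTP, and crawl() and allowed_hosts are hypothetical names.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical link graph standing in for real HTTP fetches:
# each page URL maps to the links found on that page.
LINKS = {
    "http://mysite.com/": ["http://mysite.com/about.html", "http://carclub.com/"],
    "http://mysite.com/about.html": ["http://mysite.com/"],
    "http://carclub.com/": ["http://carclub.com/events.html", "http://example.org/"],
    "http://carclub.com/events.html": [],
    "http://example.org/": [],
}


def crawl(start_urls, allowed_hosts):
    """Follow links breadth-first, skipping any URL whose host is not allowed."""
    seen = set()
    order = []
    queue = deque(start_urls)
    while queue:
        url = queue.popleft()
        # The spider only sees URLs. It decides by host name, not by
        # where the files happen to live on disk.
        if url in seen or urlparse(url).hostname not in allowed_hosts:
            continue
        seen.add(url)
        order.append(url)
        queue.extend(LINKS.get(url, []))
    return order


if __name__ == "__main__":
    print(crawl(["http://mysite.com/"], {"mysite.com"}))
    print(crawl(["http://mysite.com/"], {"mysite.com", "carclub.com"}))
```

With only mysite.com allowed, the link to carclub.com is seen and
dropped. Add carclub.com to the allowed hosts and its pages get
visited, in whatever order the links turn up, which is why a spider's
log can look like it is jumping around rather than walking directories.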
Post more details about what you want to do and we can help you get it
done.
--
Bill Moseley
moseley@hank.org
Received on Sat Apr 9 09:48:31 2005