A couple of posts ago I was wondering how swish could get
a high/top ranking for http://www.swish-e.org/docs/ when
searching for "documentation". (That kind of enraged Bill
- sorry ;-)
Anyway, this is what I've been thinking about (at a fairly
abstract level):
1) The spider indexes document DDDD.
2) It locates a URL to UUUU and stores it (so that it can
index it later on).
3) Instead of just remembering the URLs, let's also store the
words WWWWW that are located inside <A HREF="UUUU"> </A>
tags.
4) When the URL UUUU is indexed, add a meta name, say,
"swishlinkedas" and assign value WWWWW to it.
5) Include the meta ("swishlinkedas") when searching.
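
To make steps 2 and 3 concrete, here is a rough sketch in Python (not the spider's own code). It only uses the standard library; the names AnchorTextCollector, collect_link_words and link_words are made up for illustration, and a real spider would need to hook this into its own URL queue.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorTextCollector(HTMLParser):
    """Remembers the words found inside <A HREF="..."> ... </A> tags."""

    def __init__(self, base_url, link_words):
        super().__init__()
        self.base_url = base_url
        self.link_words = link_words  # target URL -> list of anchor texts
        self.current_href = None      # set while we are inside an <a> tag
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page being indexed.
                self.current_href = urljoin(self.base_url, href)
                self.buffer = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            # Collapse runs of whitespace in the anchor text.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.link_words[self.current_href].append(text)
            self.current_href = None


def collect_link_words(base_url, page_html, link_words=None):
    """Parse one document and add its anchor texts to link_words."""
    if link_words is None:
        link_words = defaultdict(list)
    parser = AnchorTextCollector(base_url, link_words)
    parser.feed(page_html)
    parser.close()
    return link_words
```

Passing the same link_words dictionary for every document DDDD accumulates all the WWWWW values per target URL UUUU over the whole crawl.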
So, when indexing http://www.swish-e.org, I see a link to
http://www.swish-e.org/docs. Inside the link to
http://www.swish-e.org/docs there is the word "documentation".
I store this word in a hash keyed by the URL.
When the spider eventually gets http://www.swish-e.org/docs
it adds a meta "swishlinkedas" and assigns value "documentation"
to it.
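
Step 4 could look roughly like the following Python sketch. It assumes the words end up in the index by injecting a <meta> tag into the fetched HTML before it is handed to the indexer, and that "swishlinkedas" has been configured as a meta name on the swish side. That is just one possible way to do it; the helper name add_linked_as_meta is hypothetical.

```python
import html
import re


def add_linked_as_meta(page_html, anchor_texts, meta_name="swishlinkedas"):
    """Return page_html with a <meta> tag holding the collected link words."""
    if not anchor_texts:
        return page_html
    # Drop duplicate anchor texts but keep their first-seen order.
    words = " ".join(dict.fromkeys(anchor_texts))
    tag = '<meta name="%s" content="%s">' % (
        meta_name,
        html.escape(words, quote=True),
    )
    # Insert right after the opening <head> tag if there is one
    # (the pattern avoids matching <header>).
    match = re.search(r"<head(\s[^>]*)?>", page_html, flags=re.IGNORECASE)
    if match:
        return page_html[:match.end()] + tag + page_html[match.end():]
    # No <head>: fall back to prepending the tag to the document.
    return tag + page_html
```

For the example above, the page at http://www.swish-e.org/docs would get a "swishlinkedas" meta with the content "documentation" before it is indexed.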
One problem is that the spider knows all values for the
"swishlinkedas" meta only after it has spidered all documents.
But let's not worry about that now. Do you think the algorithm
makes sense? I'm willing to do some work, err, some time during the
holidays (unless the idea is totally brain-damaged).
A.
Received on Thu Jun 9 13:48:59 2005