Thanks for your ideas, Nathan.
Actually, I'm looking for a general solution to the problem, one that is
independent of the target website, mainly because I won't have control over
the content or how it is generated.
One idea I've had since my last post is to do a diff on the *index
files*. I haven't had a chance to look at the structure of the index files,
so I'm not sure whether this is practical, but if I can create an index
file that contains only the NEW index entries, that would solve the problem.
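
As a point of comparison, the flat-file diff suggested in the quoted message
below could be sketched roughly like this in Python. This is only an
illustration under my own assumptions: the function name added_lines and the
sample text are made up, and it works on plain text snapshots, not on the
swish index format, which I haven't examined.

```python
import difflib

def added_lines(old_text, new_text):
    """Return the lines present in new_text that a line diff marks as new."""
    old = old_text.splitlines()
    new = new_text.splitlines()
    # autojunk=False keeps frequently repeated lines from being ignored
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both contribute lines found only in new
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])
    return added

# Hypothetical snapshots of the same page on two different days
yesterday = "Intro\nOld article"
today = "Intro\nFresh article\nOld article"
print(added_lines(yesterday, today))
```

Only the lines returned by added_lines would then need to be handed to the
indexer, whatever form that hand-off takes.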
-- John
On Fri, 27 Jun 2003 11:54:33 -0700 (PDT), Nathan Vonnahme
<nathan.vonnahme@bannerhealth.com> wrote:
>
> If your content is in some sort of database, it seems to me the easier
> approach would be to feed the swish index just the new content every day,
> if you can. Either with a script that queries the database directly, or
> by creating an alternative version of the site where only the new stuff
> is displayed, then spidering it and translating the urls to the real
> site.
>
> Or have your layout code automatically put <noindex> tags around old
> sections; that would save having to keep two copies and compare them.
> If you use flat files, you could use the diff tool to compare the
> different files and feed only the additions to swish.
>
> Anyway, it seems more straightforward to limit what swish is paying
> attention to when you do the indexing, rather than trying to build the
> newness sensing into the search side of things.
>
> -n
Received on Fri Jun 27 19:22:16 2003