Re: Differences

From: John Almberg <jalmberg(at)>
Date: Fri Jun 27 2003 - 19:22:12 GMT

Thanks for your ideas, Nathan.

Actually, I'm looking for a general solution to the problem--one that is 
independent of the target website--mainly because I won't have control over 
the content or how it is generated.

One idea I've had since my last post is to run a diff on the *index 
files*. I haven't had a chance to look at the structure of the index files, 
so I'm not sure if this is a practical solution, but if I can create an index 
file that contains just the NEW entries, that would solve the problem.

-- John

On Fri, 27 Jun 2003 11:54:33 -0700 (PDT), Nathan Vonnahme wrote:

> If your content is in some sort of database, it seems to me the easier 
> approach would be to feed the swish index just the new content every day, 
> if you can.  Either with a script that queries the database directly, or 
> by creating an alternative version of the site where only the new stuff 
> is displayed, then spidering it and translating the urls to the real 
> site.
> Or have your layout code automatically put <noindex> tags around old 
> sections; that would save having to keep two copies and compare them.
> If you use flat files, you could use the diff tool to compare the 
> different files and feed only the additions to swish.
> Anyway, it seems more straightforward to limit what swish is paying 
> attention to when you do the indexing, rather than trying to build the 
> newness sensing into the search side of things.
> -n
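
For reference, the flat-file suggestion quoted above (compare two versions 
of a file and keep only the additions) could be sketched roughly as follows. 
This is only an illustration using Python's standard difflib module; the 
function name added_lines and the sample text are made up for the example, 
and how the resulting lines would be handed to swish is left open, since 
that depends on the indexing setup.

```python
from difflib import SequenceMatcher


def added_lines(old_text, new_text):
    """Return the lines of new_text that were inserted or changed
    relative to old_text (hypothetical helper, for illustration only)."""
    old = old_text.splitlines()
    new = new_text.splitlines()
    # autojunk=False disables difflib's "popular line" heuristic, so
    # frequently repeated lines are still compared normally.
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" = lines only in the new version;
        # "replace" = lines whose content changed (keep the new side).
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])
    return added


if __name__ == "__main__":
    yesterday = "Welcome\nOld article\n"
    today = "Welcome\nOld article\nBrand new article\n"
    for line in added_lines(yesterday, today):
        print(line)
```

Note that a line-based comparison like this treats an edited line as new 
content, which may or may not be what is wanted for a "what's new" index.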
