To stop the spider from indexing .cgi files:
replace "*.cgi"
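The spider setting above is only partly shown, so this does not reconstruct it. As a rough illustration of the idea, here is a minimal Python sketch of a link filter that skips any URL whose path matches the glob "*.cgi". The names `EXCLUDE_PATTERNS` and `should_follow` are hypothetical and are not part of swish-e.

```python
from fnmatch import fnmatch
from urllib.parse import urlsplit

# Hypothetical exclusion list; "*.cgi" mirrors the pattern quoted above.
EXCLUDE_PATTERNS = ["*.cgi"]


def should_follow(url: str) -> bool:
    """Return False if the URL's path matches any exclusion pattern.

    Only the path component is tested, so a query string such as
    "?id=7" does not hide the .cgi suffix from the match.
    """
    path = urlsplit(url).path
    return not any(fnmatch(path, pattern) for pattern in EXCLUDE_PATTERNS)


if __name__ == "__main__":
    links = [
        "http://tools.example.com/index.html",
        "http://tools.example.com/bin/report.cgi?id=7",
        "http://tools.example.com/docs/help.htm",
    ]
    for link in links:
        print(link, "follow" if should_follow(link) else "skip")
```

Matching on the parsed path rather than the raw URL is what catches links like `report.cgi?id=7`. Those are the links that keep a spider busy, because every distinct query string looks like a new page.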
Although the two indexes are built by different methods (FS vs. HTTP), try merging them with:
swish-e -M index.one index.two big.index
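Conceptually, merging can work across methods because a finished index maps words to the documents that contain them. How those documents were fetched no longer matters. The following is a toy in-memory model of that idea only. It is not swish-e's on-disk format or its actual `-M` implementation, and `merge_indexes` and `search` are hypothetical names. It also shows why the alternative asked about below, searching one index and then the other and combining the matches, gives the same result for a single-word query.

```python
from collections import defaultdict
from typing import Dict, Set

# Toy model: word -> set of document locations (paths or URLs).
Index = Dict[str, Set[str]]


def merge_indexes(*indexes: Index) -> Index:
    """Combine several toy indexes by taking the union of each word's documents."""
    merged: Dict[str, Set[str]] = defaultdict(set)
    for index in indexes:
        for word, docs in index.items():
            merged[word] |= docs
    return dict(merged)


def search(index: Index, word: str) -> Set[str]:
    """Return the documents containing the word (empty set if none)."""
    return index.get(word, set())


if __name__ == "__main__":
    fs_index: Index = {
        "csearch": {"/export/tools/readme.html"},
        "report": {"/export/tools/report.html"},
    }
    http_index: Index = {
        "csearch": {"http://www.example.com/csearch.html"},
        "excite": {"http://www.example.com/about.html"},
    }
    big_index = merge_indexes(fs_index, http_index)
    print(sorted(search(big_index, "csearch")))
    # Searching each index separately and combining the matches agrees:
    print(search(fs_index, "csearch") | search(http_index, "csearch")
          == search(big_index, "csearch"))
```

A real index also stores things like word positions and per-file metadata, which is why an actual merge needs a dedicated tool rather than a simple set union.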
nila
----- Original Message -----
From: Gretchen Helms <ghelms@excitecorp.com>
To: Multiple recipients of list <swish-e@sunsite.berkeley.edu>
Sent: Tuesday, January 25, 2000 8:39 PM
Subject: [SWISH-E] Combining two types of index?
> I've got a bit of a problem I'm trying to creatively solve.
>
> I've got a collection of different servers that I'm all trying to index, using
> the HTTP method. This works great, until it hits this one server that
> is using a pile of .cgi scripts that are tools to do stuff with. Since Friday
> night the indexer has been indexing non-stop, and I just had to shoot the
> thing this morning because it was STILL RUNNING, chasing down every
> single page associated with the .cgi files...and there are a LOT of them.
>
> Since I can't seem to specify in the HTTP method that I do NOT want
> .cgi files followed, here's what I'm now trying to do:
> * Index this one specific server as type FILESYSTEM into its own index.
> * Index everything else as HTTP
> * Combine these two indexes into one, so I can search it. Or, figure out
> a way to search first one index and then the second and return all matches.
>
> Anybody tried this? I'm not having much luck digging through the docs.
>
> ----
> Gretchen Helms
> Project Manager: Csearch
> Excite@Home x2199 (pager: beepghelms@excitecorp.com)
Received on Wed Jan 26 05:39:47 2000