Sv: Combining two types of index?

From: Nils Lastein <nila(at)not-real.dsr.kvl.dk>
Date: Wed Jan 26 2000 - 10:36:48 GMT

To stop the spider from indexing .cgi files:

replace "*.cgi"

Even though the two indexes aren't built with the same method (FS vs. HTTP), try merging them with:
    swish-e -M index.one index.two big.index
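
Spelled out end to end, the whole thing might look roughly like the sketch below. Only the -M line comes from the command above; the config file names (fs.conf, http.conf) are made up, and the -c, -S and -f switches are an assumption about your swish-e version, so check them against its usage output before running anything. The script only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch: build one index per method, then merge the two into one.
# fs.conf and http.conf are hypothetical config file names.
# The -c/-S/-f switches are assumed; only -M is taken from the post.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = really run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_and_merge() {
    # The .cgi-heavy server, indexed from disk
    run swish-e -c fs.conf -S fs -f index.one
    # Everything else, indexed by the spider
    run swish-e -c http.conf -S http -f index.two
    # Merge both into a single searchable index
    run swish-e -M index.one index.two big.index
}

build_and_merge
```

With DRY_RUN left at its default you can eyeball the three command lines first, then rerun with DRY_RUN=0 once they match what your installation expects.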

nila
----- Original Message ----- 
From: Gretchen Helms <ghelms@excitecorp.com>
To: Multiple recipients of list <swish-e@sunsite.berkeley.edu>
Sent: Tuesday, January 25, 2000 8:39 PM
Subject: [SWISH-E] Combining two types of index?


> I've got a bit of a problem I'm trying to creatively solve.
> 
> I've got a collection of different servers, all of which I'm trying to index
> using the HTTP method.  This works great, until it hits this one server that
> runs a pile of .cgi scripts that serve as tools.  Since Friday
> night the indexer has been indexing non-stop, and I just had to shoot the
> thing this morning because it was STILL RUNNING, chasing down every
> single page associated with the .cgi files...and there are a LOT of them.
> 
> Since I can't seem to specify in the HTTP method that I do NOT want
> .cgi files followed, here's what I'm now trying to do:
> * Index this one specific server as type FILESYSTEM into its own index.
> * Index everything else as HTTP
> * Combine these two indexes into one, so I can search it.  Or, figure out
> a way to search first one index and then the second and return all matches.
> 
> Anybody tried this?  I'm not having much luck digging through the docs.
> 
> ----
> Gretchen Helms
> Project Manager: Csearch
> Excite@Home x2199 (pager: beepghelms@excitecorp.com)
> 
> 
Received on Wed Jan 26 05:39:47 2000