If you have access to the filesystem where the files are stored, is there any
advantage to using the spider at all?
Otherwise you could do:
swish-e -i /path/to/site1 -c config1
swish-e -i /path/to/site1/site2 -c config2
which would be both faster and create two different indexes for searching.
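For reference, a minimal sketch of what those two config files might contain. The directive names (IndexDir, IndexFile, IndexOnly) are standard swish-e configuration directives; the paths and index file names are placeholders matching the commands above, so adjust them for your layout:

```conf
# config1 -- filesystem indexing of the main site
IndexDir  /path/to/site1
IndexFile ./site1.index
IndexOnly .html .htm .txt

# config2 -- filesystem indexing of the virtual site, in its own index
# IndexDir  /path/to/site1/site2
# IndexFile ./site2.index
```

Since each run writes its own IndexFile, you end up with two independent indexes that can be searched separately or merged later.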
fh oregon scribbled on 4/10/05 6:02 PM:
> My goal here is to have the main site and the virtual site(s) indexed
> and searchable. As I mulled this over, I came up with a way to fake
> out the indexer. As a test, I placed a (hidden) link on the main page
> directly to the /SFCC directory and !!! It looks like it is all working
> now. I need to do more testing.
> Bill Moseley wrote:
>>On Sat, Apr 09, 2005 at 11:10:12AM -0700, fh oregon wrote:
>>>The root of the site (frankhunt.com) is /web/httpd/htdocs. Within that
>>>directory is the main index.html as well as a few other html documents
>>>and directories for other parts of the site. One of those directories is
>>>/web/httpd/htdocs/SFCC which is the root of the
>>Again, the spider has NO knowledge of your directory structure. If
>>you spider frankhunt.com and no pages on frankhunt.com link to SFCC,
>>then it won't spider them.
>>Try it yourself. Go to frankhunt.com and only click on links that
>>include frankhunt.com as the host name. That's all that will be
>>indexed. That link to SFCC is not the same host name.
>>Look, you also link to http://www.fs.fed.us/gpnf/volcanocams/msh/ --
>>do you expect that to get indexed? And everything it links to, also?
>>Sounds like you are not clear on how web servers map directories.
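If you do want to keep using the spider across the virtual host, swish-e's spider.pl configuration can declare additional host names that should be treated as the same server via its same_hosts option. A minimal sketch, assuming the stock spider.pl shipped with swish-e 2.4 (the host names here are taken from this thread; verify the exact keys against the spider.pl documentation for your version):

```perl
# SwishSpiderConfig.pl -- sketch of a spider.pl server entry
@servers = (
    {
        base_url   => 'http://frankhunt.com/',
        # links to these host names are followed as if they were
        # links to the base_url host
        same_hosts => [ 'www.frankhunt.com' ],
        email      => 'admin@frankhunt.com',   # spider.pl requires a contact address
    },
);
1;
```

Without an entry like same_hosts, the spider does exactly what Bill describes: it only follows links whose host name matches base_url, so pages served under a different virtual host name are never fetched.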
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Sun Apr 10 16:13:45 2005