On Tue, Aug 19, 2003 at 01:18:38AM -0700, Bucharow Leonard wrote:
> Hi Bill and Co.,
> First, I may not understand what you mean by:
> Anyway, I unfortunately can't persuade people to create links with
> HTML/XML or anything other than a Java plug-in.
for a lot of their navigation. Half the time they don't work right and
my forward and back buttons don't work as expected. And I do turn off
things are not working.
> Second, I have two other questions:
> Can SWISH-E write the IndexReport to a file (e.g. while running a cron job)?
> If yes, how?
My opinion is that cron jobs are better if they only report errors.
Otherwise you start ignoring the logs.
So I use
swish-e -c config -v0
Otherwise, pipe swish-e's output to grep or awk or perl and extract
the data you want logged. Swish-e writes \r (carriage return) to overwrite
the percentage complete, so just writing that output to a file might not
look too good, which is why I suggest piping to some program to filter out
the data you want to keep.
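
A rough sketch of such a filter in Perl. The script name strip_progress.pl
and the log file name below are made up for illustration, and it assumes
the progress updates within a line are separated only by \r. For each line
it keeps just the text after the last \r, which is roughly what a terminal
would end up showing:

```perl
#!/usr/bin/perl
# strip_progress.pl (hypothetical name): drop \r-overwritten progress text
use strict;
use warnings;

# Return the text after the last carriage return in one line of output.
sub final_segment {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                            # drop the line ending
    my @parts = grep { length } split /\r/, $line;   # non-empty \r-separated pieces
    return @parts ? $parts[-1] : '';
}

# Copy STDIN to STDOUT, one cleaned-up line per input line.
while ( my $line = <STDIN> ) {
    my $text = final_segment($line);
    print "$text\n" if $text =~ /\S/;                # skip blank results
}
```

A cron job could then run something like
swish-e -c config | perl strip_progress.pl >> index-report.log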
> I'm trying to spider only one web folder, not the entire web server (e.g. I
> don't want to spider the Apache manual).
> In SwishSpiderConfig.pl I've set the option:
> base_url => http://host/intranet/
> But spider.pl indexes the entire web server! Am I doing something wrong?
> I've excluded the folder with robots.txt, but I don't understand: why can't
> I set up the folder to index?
The only limitation is that it only indexes one server (host name) at a
time (per section of the spider config file). If you set
base_url => http://host/foo_directory
there's nothing to keep it from indexing any other directory on "host".
But you can use robots.txt to limit what is indexed. You can also set up
a "test_url()" callback function to limit to, say, just the
"foo_directory" directory. See:
Received on Tue Aug 19 12:39:46 2003