The problem is that spider.pl doesn't know anything about whether a file was
indexed or not. The spider just pulls down files via HTTP, handles them as
you've configured, and then prints the content out, either to swish-e, to a
file, or to some other 'thing' you've designated.
If a file fails to be indexed, swish-e will issue an error or warning,
depending on how you've configured it (see the ParserWarnLevel and Verbose
settings in the swish-e configuration docs). So you could parse swish-e's
output to see which files succeed and which don't.
If what you're really asking is how to check whether spider.pl successfully
retrieved and handled each URL, then the test_response callback is likely what
you want. There are some examples in the spider.pl docs.
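For example, a spider.config along these lines could record a TRUE/FALSE
result and the HTTP error for each URL the spider fetches. This is a minimal
sketch, not tested against a particular spider.pl release. It assumes
test_response is passed ($uri, $server, $response, $content_chunk) as the
spider.pl docs describe, and it appends tab-separated lines to a hypothetical
spider_results.tsv file rather than a database (you could replace the print
with a DBI insert). It only reports what the spider fetched, not whether
swish-e indexed it, and any failure that never reaches test_response won't be
logged.

```perl
# spider.config: a minimal sketch, not a drop-in config.
use strict;
use warnings;

# Hypothetical output file; swap in a DBI insert for a real database.
my $log_file = 'spider_results.tsv';

our @servers = (
    {
        base_url      => 'http://www.example.com/',   # placeholder site
        email         => 'admin@example.com',         # placeholder contact
        test_response => \&log_response,
    },
);

# Assumed to be called by spider.pl once the HEAD or GET response arrives.
# Arguments: URI object, server hash, HTTP::Response, first content chunk.
sub log_response {
    my ( $uri, $server, $response, $content_chunk ) = @_;

    # indexing_result | indexing_error, as in the original question
    my ( $result, $error ) =
        $response->is_success
        ? ( 'TRUE',  '' )
        : ( 'FALSE', $response->status_line );

    open my $fh, '>>', $log_file or die "Cannot open $log_file: $!";
    print {$fh} join( "\t", $uri, $result, $error ), "\n";
    close $fh;

    return 1;    # true: let spider.pl continue with this document
}

1;
```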
Hope that helps.
andy rosbrook scribbled on 2/5/06 10:41 AM:
> OK, I've been experimenting with the callback functions. I'm a bit confused
> as to what I can do through them.
> I'd just like to know if there is any way I could write to a database with
> two fields:
> indexing_result | indexing_error
> If the indexing is successful I can just write TRUE to the database; if
> the indexing is not successful I would like to write FALSE to the
> database along with the error that made indexing fail.
> I've been trying to pipe out STDERR and to create an
> indexing API; I'm just wondering if the problem above can be solved through
> callback functions. Is it possible to write out the DEBUG info to a database
> through a callback function?
> Sorry for all the questions!
>>From: Bill Moseley <email@example.com>
>>To: Multiple recipients of list <firstname.lastname@example.org>
>>Subject: [SWISH-E] Re: Callback Functions For Indexing
>>Date: Fri, 27 Jan 2006 10:23:41 -0800 (PST)
>>On Fri, Jan 27, 2006 at 09:44:44AM -0800, andy rosbrook wrote:
>>>Well, I just want to know after each URL in spider.config whether the
>>>spidering was a success or a failure. I know I could just check for a
>>>complete index.swish-e but this doesn't allow me to capture any error
>>Use the test_response callback in your config. That's called right after
>>the HEAD or GET request has returned (and sometimes before all the
>>data has been fetched from the remote server).
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Wed Feb 8 10:54:04 2006