Re: Callback Functions For Indexing

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Wed Feb 08 2006 - 18:54:01 GMT
The problem is that spider.pl doesn't know whether a file was indexed or not.
The spider just fetches files via HTTP, handles them as you've configured, and
then prints the content out to swish-e, to a file, or to whatever other
'thing' you've designated.

If a file fails to be indexed, the swish-e command will issue an error/warning, 
depending on how you've configured it (see ParserWarnLevel and Verbose settings 
in swish configuration docs). So you could parse the output of swish-e to see 
what succeeds and what doesn't.
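
As a rough illustration of that approach, here is a minimal Python sketch that
scans captured indexer output and stores a result row in a database. It rests
on several assumptions: that problem lines start with "err:", "error:" or
"warning:" (check what your swish-e version actually prints at your Verbose and
ParserWarnLevel settings), that SQLite is an acceptable database, and that the
table name indexing_log and the helper names are hypothetical. The indexing
command itself is supplied by the caller.

```python
import re
import sqlite3
import subprocess

# ASSUMPTION: problem lines start with "err:", "error:" or "warning:".
# Adjust this pattern to match the output you actually see.
PROBLEM_LINE = re.compile(r"^(err|error|warning):", re.IGNORECASE)


def summarize_output(output):
    """Return (success, error_text) for captured indexer output."""
    problems = [
        line.strip()
        for line in output.splitlines()
        if PROBLEM_LINE.match(line.strip())
    ]
    return (not problems, "\n".join(problems))


def record_result(conn, source, output):
    """Store one row per indexing run (hypothetical table and columns)."""
    success, error_text = summarize_output(output)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS indexing_log ("
        "source TEXT, indexing_result INTEGER, indexing_error TEXT)"
    )
    conn.execute(
        "INSERT INTO indexing_log VALUES (?, ?, ?)",
        (source, int(success), error_text or None),
    )
    conn.commit()
    return success


def run_and_record(conn, source, command):
    """Run the caller-supplied indexing command and log the outcome."""
    proc = subprocess.run(command, capture_output=True, text=True)
    output = proc.stdout + proc.stderr
    # A non-zero exit status counts as a failure even if no line matched.
    if proc.returncode != 0 and summarize_output(output)[0]:
        output += "\nerror: command exited with status %d" % proc.returncode
    return record_result(conn, source, output)
```

Calling run_and_record(conn, "nightly", your_command_list) once per indexing
run would give one row with indexing_result set to 1 or 0 and any matching
lines in indexing_error. Note that this records the outcome of a whole run,
not of each individual document.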

If what you're really asking is how to check whether spider.pl successfully
retrieved and handled each URL, then the test_response callback is likely what
you want. There are some examples in the spider.pl docs.

Hope that helps.

andy rosbrook scribbled on 2/5/06 10:41 AM:

> OK, I've been experimenting with the callback functions. I'm a bit confused 
> as to what I can do through them.
> 
> I'd just like to know if there is any way I could write to a database with 
> two fields:
> 
> indexing_result | indexing_error
> 
> If the indexing is successful, I can just write TRUE to the database; if 
> the indexing is not successful, I would like to write FALSE to the 
> database along with the error that caused indexing to fail.
> 
> I've been trying to pipe out STDERR and have been trying to create an 
> indexing API, just wondering if the problem above can be solved through 
> callback functions? Is it possible to write out the DEBUG info to a database 
> through a callback function?
> 
> sorry for all the questions!!
> andy
> 
> 
> 
>>From: Bill Moseley <moseley@hank.org>
>>Reply-To: moseley@hank.org
>>To: Multiple recipients of list <swish-e@sunsite3.berkeley.edu>
>>Subject: [SWISH-E] Re: Callback Functions For Indexing
>>Date: Fri, 27 Jan 2006 10:23:41 -0800 (PST)
>>
>>On Fri, Jan 27, 2006 at 09:44:44AM -0800, andy rosbrook wrote:
>>
>>>Well, I just want to know, after each URL in spider.config, whether the
>>>spidering was a success or a failure. I know I could just check for a
>>>complete index.swish-e, but this doesn't allow me to capture any error
>>>messages.
>>
>>Use the test_response callback in your config.  That's called right after
>>the HEAD or GET request has returned (and sometimes before all the
>>data has been fetched from the remote server).
>>
>>--
>>Bill Moseley
>>moseley@hank.org

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Wed Feb 8 10:54:04 2006