On Wed, Sep 15, 2004 at 03:49:37PM -0700, SRE <eckert@climber.org> wrote:
> >On Wed, Sep 15, 2004 at 01:37:50PM -0700, Richard Morin wrote:
> >> If I had the URL, I'd know where to start looking...
>
> At 01:57 PM 9/15/2004, Bill Moseley wrote:
> >Well that's easy: it's http://<yourserver>/robots.txt
>
> It's not my discussion, but let me make an educated guess...
> Richard is saying he thinks SWISH should print the URL of
> the file it was attempting to spider when the problem occurred.
> I tend to agree with him.
>
> That's why he said this:
>
> At 01:43 PM 9/15/2004, Richard Morin wrote:
> >The spidering script certainly knows where it's looking, at
> >any given time. Does the module not return a status code? Sigh.
>
> In the commercial software I've written I found it useful to have
> every nested level print something and return an error code as a
> low-level error ripples up to the calling tool aborting. [...]
Yes yes, we are all in favour of proper exception handling and
debugging information, unit testing and whatnot. The point is this:
IT IS THE FARKING LIBRARY that is retrieving robots.txt; it is an
*external* product. Bill gave the reference where the mechanism is
described. So either talk to the authors of the LWP modules, wait
until someone reinvents the wheel *with* proper exception handling and
reporting, or - tadaa - do it yourself.
That's all.
bkw
Received on Thu Sep 16 03:00:37 2004