Re: [swish-e] SWISH::Filter module not found

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Oct 27 2010 - 20:18:01 GMT
On Wed, Oct 27, 2010 at 10:31 AM, Troy Wical <troy@wical.com> wrote:

> > Warning: Unknown header line: 'ath-Name:
> http://type2.com/ezmlm-archives/index.cgi?list=type2&cmd=monthbydate&month=201009'
> from program spider.pl
> > err: External program failed to return required headers Path-Name:
> Regarding the "Unknown header line" error, I'm having a heck of a time
> finding anything related to that. Full debugging has been activated, and
> I've gone through and tried to look at the previous URL's that may be
> throwing it off, but no luck yet. Maybe a break in the garage, playing
> mechanic, will help me come back refreshed.
>

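I have not looked at that code in, well, years. Swish *should* be working
with bytes, so my guess is that the spider is telling swish that the content
is one byte longer than it really is. That would make swish swallow the first
byte of the next record, which fits the 'ath-Name:' in your warning.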
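To make the guess concrete, assume a prog-style stream where each record is a header block, a blank line, then exactly Content-Length bytes of content, with the next record starting right after it. If the declared length is one byte too big, the reader eats the 'P' of the next "Path-Name:". The Perl sketch below is a simplified model, not swish's actual reader: the two-header records, the example.com URLs, and the next_header_line() helper are all hypothetical.

```perl
use strict;
use warnings;

# Simplified prog-style stream (assumption): header block, blank line, then
# exactly Content-Length bytes of content, next record following directly.
# Real spider.pl output carries more headers than these two.
my $stream = "Path-Name: http://example.com/a\nContent-Length: 5\n\nHello"
           . "Path-Name: http://example.com/b\nContent-Length: 3\n\nBye";

# Hypothetical helper: return the first line a reader would see after
# skipping the first record's content, if the declared length were off
# by $extra bytes.
sub next_header_line {
    my ( $data, $extra ) = @_;
    my $body_start = index( $data, "\n\n" ) + 2;    # first byte of content
    my ($len) = $data =~ /^Content-Length: (\d+)$/m; # first record's length
    my $rest = substr $data, $body_start + $len + $extra;
    return ( split /\n/, $rest )[0];
}

print next_header_line( $stream, 0 ), "\n";    # Path-Name: http://example.com/b
print next_header_line( $stream, 1 ), "\n";    # ath-Name: http://example.com/b
```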

http://dev.swish-e.org/browser/swish-e/trunk/prog-bin/spider.pl.in#L1409

    # Re-encode the data for outside of Perl
    eval {
        # Need to only require Encode here?
        $$content = Encode::encode( $server->{charset}, $$content )
            if $server->{charset};
    };
    if ( $@ ) {
        print STDERR "Warning: document '", $response->request->uri, "' could not be encoded to charset '$server->{charset}'\n";
        delete $server->{charset};
    }

$content should now be a reference to a string of bytes.


    $server->{counts}{'Total Bytes'} += length $$content;
    $server->{counts}{'Total Docs'}++;

    # ugly and maybe expensive, but perhaps more portable than "use bytes"
    my $bytecount = length pack 'C0a*', $$content;

This is a wild guess, but what if you replace that with:

    my $bytecount = length $$content;

It's probably the same, but that's how I would get the length from a string
of bytes.
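A quick way to see why plain length should be enough: once Encode::encode has run, the string holds octets, so length counts bytes; before encoding it counts characters. A minimal sketch with a made-up sample string, assuming UTF-8 as the charset. It only covers content that has actually been encoded; if no charset is set and the string still holds wide characters, length would count characters instead.

```perl
use strict;
use warnings;
use Encode ();

# Sample character string (assumption): "caf" plus U+00E9
my $text = "caf\x{e9}";

# Encode to UTF-8 octets, as spider.pl does when a charset is set
my $octets = Encode::encode( 'UTF-8', $text );

print length($text),   "\n";    # 4 characters
print length($octets), "\n";    # 5 bytes: U+00E9 becomes 0xC3 0xA9
```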

The other option, if you really want to battle this, is to save the spider's
output to a file, open it in an editor, and try to figure out the length
difference. Or just add an extra space character before the Path-Name line
where it's failing and then feed that file to swish.
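A rough sketch of the "save the output to a file" approach: instead of eyeballing it in an editor, a short Perl script can walk the saved output record by record and report the first record that does not begin with "Path-Name:". It assumes each record is a header block, a blank line, then exactly Content-Length bytes, with the next record following directly; the script name, file name, and check_records() helper are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Walk saved spider output (raw bytes) and return a description of the
# first record that does not line up, or an empty list if all do.
sub check_records {
    my ($data) = @_;
    my $pos = 0;
    while ( $pos < length $data ) {
        my $end = index $data, "\n\n", $pos;
        return "offset $pos: no blank line ending the header block" if $end < 0;

        my $header = substr $data, $pos, $end - $pos;
        my $first  = ( split /\n/, $header )[0] // '';

        # Not starting with Path-Name: means the previous Content-Length
        # skipped too many (or too few) bytes.
        return "offset $pos: expected Path-Name:, found '$first'"
            unless $header =~ /\APath-Name:/;

        my ($len) = $header =~ /^Content-Length:\s*(\d+)/m;
        return "offset $pos: no Content-Length header" unless defined $len;

        $pos = $end + 2 + $len;    # skip the blank line and the content
    }
    return;
}

# Usage (hypothetical names): perl check_spider_output.pl spider.out
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my $data = do { local $/; <$fh> };
    close $fh;
    my @problems = check_records($data);
    print( @problems ? "$problems[0]\n" : "All records line up\n" );
}
```

The reported offset points at the byte where the misaligned record begins, which is where an extra space (or a corrected Content-Length) would need to go.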


-- 
Bill Moseley
moseley@hank.org


_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Wed Oct 27 16:18:24 2010