Re: [swish-e] SWISH::Filter module not found

From: Troy Wical <troy(at)not-real.wical.com>
Date: Sun Oct 31 2010 - 13:23:26 GMT
On Oct 27, 2010, at 2:18 PM, Bill Moseley wrote:

> I have not looked at that code in, well, years.  Swish *should* be working with bytes, so my guess is that the spider is telling swish that the content is one byte longer than it really is.
> 
> http://dev.swish-e.org/browser/swish-e/trunk/prog-bin/spider.pl.in#L1409
> 
>  # Re-encode the data for outside of Perl
> 1407	    eval {
> 1408	        # Need to only require Encode here?
> 1409	        $$content = Encode::encode( $server->{charset}, $$content )
> 1410	            if $server->{charset};
> 1411	    };
> 1412	    if ( $@ ) {
> 1413	        print STDERR "Warning: document '", $response->request->uri, "' could not be encoded to charset '$server->{charset}'\n";
> 1414	        delete $server->{charset};
> 1415	    }
> 
> $content should now be a reference to a string of bytes.
> 
> 
> 1416	
> 1417	    $server->{counts}{'Total Bytes'} += length $$content;
> 1418	    $server->{counts}{'Total Docs'}++;
> 1419	
> 1420	
> 1421	    # ugly and maybe expensive, but perhaps more portable than "use bytes"
> 1422	    my $bytecount = length pack 'C0a*', $$content;
> 1423	
> 
> This is a wild guess, but what if you replace that with:
> 
> my $bytecount = length $$content;

That did the trick! The spider was also failing on the <, <<, >, and >> characters that were being used as navigation links between pages. I replaced those with words like "Prev Month" instead, and most of the errors went away. I still get quite a few errors where subject lines contain the & character. The output still needs tweaking: I'm not sure how to get the document path removed from the output, since swishdocpath does not appear anywhere in the cgi conf file. I am also unable to remove 'swishdescription' from the front of the excerpt.
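
For the archive, here is a minimal sketch of why a plain length is enough once the content has been through Encode::encode. It is not code from spider.pl: the byte_count helper and the sample string are made up for illustration, and UTF-8 is assumed as the charset.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of spider.pl): number of bytes the
# text occupies once it is encoded to the given charset.
sub byte_count {
    my ( $text, $charset ) = @_;
    # Encode::encode returns a byte string, so length counts bytes.
    return length Encode::encode( $charset, $text );
}

# Decoded character string: contains e-acute (U+00E9) and a right arrow (U+2192).
my $text = "caf\x{e9} \x{2192} next";

my $chars = length $text;                  # counts characters
my $bytes = byte_count( $text, 'UTF-8' );  # counts encoded bytes

print "characters: $chars\n";   # 11
print "bytes:      $bytes\n";   # 14
```

The two counts differ only when the text contains characters that need more than one byte in the target charset, which would fit the symptom of the reported length being off for some documents and not others.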

Example @ http://type2.com/cgi-bin/search.cgi?query=westy&submit=Search!&sort=swishrank&si=3

As a side note, this will also be the solution for another long-running thread about my earlier attempt to get swish3 to handle this task.
http://swish-e.org/archive/2009-12/12787.html

Troy
Received on Sun Oct 31 09:23:30 2010