Re: [swish-e] SWISH::Filter module not found

From: Troy Wical <troy(at)not-real.wical.com>
Date: Wed Oct 27 2010 - 14:45:59 GMT
On Oct 26, 2010, at 10:09 PM, Peter Karman wrote:

> Troy Wical wrote on 10/26/10 11:06 PM:
>> Thanks for that. It's not the first time you've mentioned the problems with having modules installed from different locations. I edited spider.pl to point to the CPAN version and the errors are gone. I now get the following after it runs for a couple of minutes, though. I believe it is not due to the page being crawled, though I've been wrong before.
>> 
>> #############################################
>> Warning: Unknown header line: 'ath-Name: http://type2.com/ezmlm-archives/index.cgi?list=type2&cmd=monthbydate&month=201009' from program spider.pl
>> err: External program failed to return required headers Path-Name:
>> #############################################
>> 
> 
> that sounds like an encoding issue. The problem happens when the length reported
> in the previous document != the actual document length, and the leading 'P' gets
> read as part of the previous document.
> 
> Turn on the spider.pl debugging verbosity to see each URL, and check the
> accuracy of the encoding and document length of the URI *before*
> 
> http://type2.com/ezmlm-archives/index.cgi?list=type2&cmd=monthbydate&month=201009
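
To illustrate the failure mode described above, here is a minimal Python sketch. It is a simplified model of a header-then-body stream, not swish-e's actual parser; only the `Path-Name` and `Content-Length` header names come from the thread. The `example.com` URLs, the function names, and the two encoding parameters are assumptions made for the illustration. The first document's length is measured in one encoding but written in another, so the reader consumes one byte too many and the next header arrives as `ath-Name`.

```python
import io


def write_doc(out, path, body_text, length_encoding, output_encoding):
    """Write one document as headers, a blank line, then the body.

    length_encoding and output_encoding are hypothetical knobs that simulate
    a producer measuring the body in one encoding and writing it in another.
    """
    declared = len(body_text.encode(length_encoding))
    out.write(f"Path-Name: {path}\n".encode("ascii"))
    out.write(f"Content-Length: {declared}\n\n".encode("ascii"))
    out.write(body_text.encode(output_encoding))


def read_docs(stream):
    """Simplified reader: parse headers, then skip exactly Content-Length bytes."""
    docs, problems = [], []
    while True:
        line = stream.readline()
        if not line:
            break  # end of stream
        headers = {}
        while line and line.strip():
            # Split on the first colon only, since URLs contain colons too.
            name, _, value = line.decode("latin-1").partition(":")
            headers[name.strip()] = value.strip()
            line = stream.readline()
        if "Path-Name" not in headers:
            problems.append(f"missing Path-Name, saw headers {sorted(headers)}")
            break
        stream.read(int(headers.get("Content-Length", 0)))
        docs.append(headers["Path-Name"])
    return docs, problems


def demo(length_encoding, output_encoding):
    """Build a two-document stream and report what the reader makes of it."""
    buf = io.BytesIO()
    # "café" is 5 bytes in UTF-8 but 4 bytes in Latin-1.
    write_doc(buf, "http://example.com/a", "café", length_encoding, output_encoding)
    write_doc(buf, "http://example.com/b", "hello", "utf-8", "utf-8")
    buf.seek(0)
    return read_docs(buf)


if __name__ == "__main__":
    print(demo("utf-8", "utf-8"))    # length and output agree
    print(demo("utf-8", "latin-1"))  # declared length is one byte too long
```

In the mismatched case the document that triggers the error is not the one at fault: the bad length belongs to the document before it, which is why the advice is to check the preceding URL.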

I was using the spider.pl default, so I created a config file for the spider with the following...

###########################################
[root@purple /home/search]# more t2.spider.config
@servers = (
        {
            base_url            => 'http://type2.com/ezmlm-archives/index.cgi?list=type2',
            use_default_config  => 1,
            SPIDER_QUIET        => 1,
            email               => 'troy@wical.com',
            delay_sec           => '0',
            max_depth           => '3',
            keep_alive          => '1',
            errors              => '1',
            failed              => '1',
        },
    );
##########################################

I have a similar config file working elsewhere, but perhaps it too is having issues I didn't know about, since I am getting the following errors...

##########################################
[root@purple /home/search]# swish-e -c /home/search/t2.conf -S prog
Indexing Data Source: "External-Program"
Indexing "spider.pl"
External Program found: /usr/local/lib/swish-e/spider.pl
/usr/local/lib/swish-e/spider.pl: ** Warning: config option [errors] is unknown.  Perhaps misspelled?
/usr/local/lib/swish-e/spider.pl: ** Warning: config option [SPIDER_QUIET] is unknown.  Perhaps misspelled?
/usr/local/lib/swish-e/spider.pl: ** Warning: config option [failed] is unknown.  Perhaps misspelled?
/usr/local/lib/swish-e/spider.pl: Reading parameters from 't2.spider.config'
http://type2.com/ezmlm-archives/index.cgi?list=type2:7: error: htmlParseEntityRef: expecting ';'
" title="RSS 2.0" href="http://type2.com/ezmlm-archives/index.cgi?list=type2&cmd
                                                                               ^
http://type2.com/ezmlm-archives/index.cgi?list=type2:7: error: htmlParseEntityRef: expecting ';'
.0" href="http://type2.com/ezmlm-archives/index.cgi?list=type2&cmd=feed&feedtype
                                                                               ^
http://type2.com/ezmlm-archives/index.cgi?list=type2:8: error: htmlParseEntityRef: expecting ';'
 title="Atom 0.3" href="http://type2.com/ezmlm-archives/index.cgi?list=type2&cmd
                                                                               ^
###########################################
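
The `htmlParseEntityRef: expecting ';'` lines above point into the fetched page's markup, at `&` characters inside `href` values, so they may be HTML parser complaints about bare ampersands in the page rather than anything in the spider config. The Python sketch below shows the general idea under that assumption. The `example.com` URL, the regular expression, and the function names are hypothetical and are not part of swish-e or spider.pl.

```python
import re
from html import escape

# Matches "&" that does not begin a character reference such as "&amp;" or "&#38;".
BARE_AMP = re.compile(r"&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#[xX][0-9A-Fa-f]+;)")


def find_bare_ampersands(markup):
    """Return the offsets of ampersands an HTML parser would likely flag."""
    return [m.start() for m in BARE_AMP.finditer(markup)]


def link_tag(title, href):
    """Build a <link> tag with attribute values escaped for HTML."""
    return f'<link rel="alternate" title="{escape(title)}" href="{escape(href)}">'


if __name__ == "__main__":
    url = "http://example.com/index.cgi?list=demo&cmd=feed"  # hypothetical URL
    raw = f'<link rel="alternate" title="RSS 2.0" href="{url}">'
    print(find_bare_ampersands(raw))                        # offsets of bare "&"
    print(find_bare_ampersands(link_tag("RSS 2.0", url)))   # none once escaped
```

If that reading is right, the fix belongs in the page that generates the links (writing `&amp;` in attribute values), and the messages are separate from the three unknown-option warnings printed before them.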

Perhaps these are syntax issues in the config file. I will try to work this out before getting back to the encoding issues.

Troy
Received on Wed Oct 27 10:46:03 2010