Re: [swish-e] Fw: Re: Searching remote mail archive problem

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Thu Mar 20 2008 - 14:48:19 GMT

On 03/20/2008 08:10 AM, Xinchun Tian wrote:
> Hi Peter,
> 
> Sorry for sending this mail again. Is there any news on the
> following problems? Thanks!

There will be no news until you provide a failing test case. That means a config and an
HTML file that reproduce your problem.

Otherwise, we are just making stuff up.
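
For illustration only, a test case of that kind might be generated along these lines. This is a minimal Python sketch, not something taken from any attachment in this thread: the file names test.html, test.conf and test.index, the testcase directory, and the helper write_test_case are all hypothetical, and the document body is invented. Only the first two lines of the document and the DefaultContents, IndexOnly and ParserWarnLevel directives come from the messages quoted below. Whether it actually triggers the same parser errors is untested.

```python
from pathlib import Path

# Hypothetical XHTML page. The XML declaration and DOCTYPE mirror the two
# lines shown in the error output quoted below; everything else is invented.
XHTML_DOC = """<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Test message</title></head>
<body><p>Hello from a hypothetical archive page.</p></body>
</html>
"""

# Minimal config for indexing one local file with the HTML parser.
# The file names are assumptions; the directives are standard swish-e ones.
SWISH_CONF = """IndexDir test.html
IndexFile test.index
IndexOnly .html
DefaultContents HTML*
ParserWarnLevel 9
"""


def write_test_case(directory):
    """Write the document and the config into `directory`; return both paths."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    doc = target / "test.html"
    conf = target / "test.conf"
    doc.write_text(XHTML_DOC, encoding="iso-8859-1")
    conf.write_text(SWISH_CONF, encoding="ascii")
    return doc, conf


if __name__ == "__main__":
    doc, conf = write_test_case("testcase")
    # Print the command to run by hand; this script does not invoke swish-e.
    print(f"cd {doc.parent} && swish-e -c {conf.name} -S fs")
```

Running the printed command indexes the single local file instead of going through spider.pl, which takes the private archive and its credentials out of the picture. If the errors still show up, those two files are all that needs to be sent to the list.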

pek


> ---------- Forwarded Message -----------
> From: "Xinchun Tian" <tianxc@mail.ihep.ac.cn>
> To: peter@peknet.com
> Sent: Fri, 7 Mar 2008 19:31:29 +0800
> Subject: Re: [swish-e] Searching remote mail archive problem
> 
> Hi Peter,
> 
> I am sorry that I did not make myself clear. What I mean is that the hypermail
> archive is private, so I sent the config files only to you and Bill instead
> of to the public swish-e mailing list.
> 
> Best Regards,
> 
> Xinchun
> 
> From: "Xinchun Tian" <tianxc@mail.ihep.ac.cn>
> To: peter@peknet.com
> Cc: moseley@hank.org
> Sent: Fri, 7 Mar 2008 10:49:18 +0800
> Subject: Re: [swish-e] Searching remote mail archive problem
> 
> Hi Peter,
> 
> Since the hypermail archive is private, I cannot distribute these files to
> the public mailing list. Sorry for that.
> 
> swish.conf:
> =============================================================================
> IndexDir spider.pl
> SwishProgParameters spider.conf
> IndexOnly .htm .html .txt .pdf .doc .ppt .xml .tex .eps .ps .log .jpg
> IndexContents TXT* .txt
> DefaultContents HTML*
> ParserWarnLevel 9
> =============================================================================
> 
> spider.conf: 
> =============================================================================
> my %eng = (
>     email       => 'tianxc@ihep.ac.cn',
>     base_url    => 'https://www.lbl.gov/lists.archives/theta13-eng.archive/',
>     delay_sec   => '0',
>     max_depth   => '1',
>     credentials => 'dayabay:3quarks'
> );
> 
> my %offline = (
>     email       => 'tianxc@ihep.ac.cn',
>     base_url    => 'https://www.lbl.gov/lists.archives/theta13-offline.archive/',
>     delay_sec   => '0',
>     max_depth   => '1',
>     credentials => 'dayabay:3quarks'
> );
> 
> @servers = ( \%eng, \%offline );
> 1;
> =============================================================================
> 
> Thanks and Best Regards,
> 
> Xinchun
> 
>> Date: Thu, 06 Mar 2008 09:10:13 -0600
>> From: Peter Karman <peter@peknet.com>
>> Subject: Re: [swish-e] Searching remote mail archive problem
>> To: Swish-e Users Discussion List <users@lists.swish-e.org>
>> Message-ID: <47D00955.6090706@peknet.com>
>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>
>> Tian Xinchun wrote on 3/6/08 2:16 AM:
>>> Hi Bill,
>>>
>>> Thanks for your help. See below.
>>>
>>>> ------------------------------
>>>>
>>>> Message: 6
>>>> Date: Wed, 5 Mar 2008 06:11:42 -0800
>>>> From: Bill Moseley <moseley@hank.org>
>>>> Subject: Re: [swish-e] Searching remote mail archive problem
>>>> To: Swish-e Users Discussion List <users@lists.swish-e.org>
>>>> Message-ID: <20080305141142.GA6428@hank.org>
>>>> Content-Type: text/plain; charset=utf-8
>>>>
>>>> On Wed, Mar 05, 2008 at 08:03:06PM +0800, Tian Xinchun wrote:
>>>>> Hi Peter,
>>>>>
>>>>> I am sorry, but I cannot quite understand what you mean. Take this example:
>>>>>
>>>>> $ swish-e -c swish.conf -S prog
>>>>> Indexing Data Source: "External-Program"
>>>>> Indexing "spider.pl"
>>>>> External Program found: /usr/local/lib/swish-e/spider.pl
>>>>> /usr/local/lib/swish-e/spider.pl: Reading parameters from 'spider.conf'
>>>>> https://www.lbl.gov/lists.archives/theta13-eng.archive/:1: error:
>>>>> htmlParseStartTag: invalid element name
>>>>> <?xml version="1.0" encoding="ISO-8859-1"?>
>>>>>  ^
>>>>> https://www.lbl.gov/lists.archives/theta13-eng.archive/:2: error: Misplaced
>>>>> DOCTYPE declaration
>>>>> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
>>>>> ^
>>>> You have two errors.  That first one above is simply saying you are
>>>> trying to index an xml document with Libxml's *html* parser.
>>>> So you need to use the XML* parser type.
>>>>
>>> Actually, I have tried using XML*, but I still got the same error messages.
>>> Thanks for the information. Is there any plan to fix it?
>>>
>> If you can provide us with a small, reproducible test case, then
>> we can attempt to fix the problem.
>>
>> An example document and config file is all you should need to send.
>> -- 
>> Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
>>
> ------- End of Forwarded Message -------
> 
> 
> ====================================================         
>                      Dr. Xinchun Tian
> Room A601, Mobile: 13426390768
> Experimental Physics Center, IHEP, CAS
> Beijing, 100049
> Homepage: http://viviseayu.bb.iyaya.com/index.php
> ====================================================
> 

-- 
Peter Karman  .  peter(at)not-real.peknet.com  .  http://peknet.com/

_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Thu Mar 20 10:48:19 2008