Re: [swish-e] Searching remote mail archive problem

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Mon Mar 03 2008 - 01:32:09 GMT
Xinchun Tian wrote on 3/2/08 7:38 AM:
> Hi Peter,
> Thanks for your help, but the problem is still not resolved.
> Similar errors also include:
> When indexing: https://www.lbl.gov/lists.archives/theta13-offline.archive/:
> Warning: Unknown header line: 'https://www.lbl.gov/lists.archives/theta13-offline.archive/author.html' from program spider.pl
> err: External program failed to return required headers Path-Name:
> or https://www.lbl.gov/lists.archives/theta13-eng.archive/:
> Warning: Unknown header line: 'ive/author.html' from program spider.pl
> err: External program failed to return required headers Path-Name:
> and other similar error messages. It seems to me that spider.pl does not parse the hypermail archive correctly. Any help?

The issue is that one doc throws off the indexer's sense of content length,
and swish-e can't recover its place in the stream afterwards, so it ends up
reading document text where it expects header lines. Often the cause is an
encoding that is not reported correctly, but other issues can trigger it too.

Find the first doc that reports the 'Unknown header line' warning, then look
at the doc that was indexed just before it. The one before the errors start
is your culprit.
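
If the spider output is large, the search can be automated. The following is
only a rough sketch in Python, not part of swish-e: it assumes you have saved
the spider.pl output to a file, and that each doc in that stream is a block of
"Name: value" header lines, a blank line, and then exactly Content-Length
bytes of content. The header names in KNOWN_HEADERS are my assumption about
what the prog input format allows, and find_culprit and format_doc are
hypothetical helper names.

```python
import io

# Header names assumed from swish-e's "-S prog" input format; adjust as needed.
KNOWN_HEADERS = {"path-name", "content-length", "last-mtime",
                 "document-type", "update-mode"}


def find_culprit(stream):
    """Return the Path-Name of the doc read just before the stream stops
    parsing cleanly, or None if every doc parses."""
    previous_path = None
    while True:
        headers = {}
        while True:
            line = stream.readline()
            if not line:
                # End of input: clean only if we were between documents.
                return previous_path if headers else None
            if line in (b"\n", b"\r\n"):
                break  # a blank line ends the header block
            name, sep, value = line.partition(b":")
            key = name.strip().lower().decode("latin-1")
            if not sep or key not in KNOWN_HEADERS:
                return previous_path  # body text where a header should be
            headers[key] = value.strip().decode("utf-8", "replace")
        if "path-name" not in headers or "content-length" not in headers:
            return previous_path
        try:
            length = int(headers["content-length"])
        except ValueError:
            return previous_path
        body = stream.read(length) if length >= 0 else b""
        if length < 0 or len(body) < length:
            return headers["path-name"]  # declared more bytes than remain
        previous_path = headers["path-name"]


def format_doc(path, body, declared=None):
    """Build one doc record; 'declared' overrides the true byte count."""
    length = len(body) if declared is None else declared
    return f"Path-Name: {path}\nContent-Length: {length}\n\n".encode() + body


# Demo: b.html declares 5 bytes (its character count) but holds 6 bytes of UTF-8.
demo = (format_doc("a.html", b"hello")
        + format_doc("b.html", "naïve".encode("utf-8"), declared=5)
        + format_doc("c.html", b"bye"))
print(find_culprit(io.BytesIO(demo)))  # prints b.html
```

To run it on a real capture, open the saved file in binary mode, e.g.
find_culprit(open("spider.out", "rb")), where spider.out is a placeholder
name. The demo mimics the encoding case: the length was counted in characters
instead of bytes, so one stray byte gets glued onto the next Path-Name line
and the reader loses its place.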

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Sun Mar 2 20:32:15 2008