Re: [swish-e] all URLs

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Mon Feb 04 2008 - 14:09:26 GMT
On 02/03/2008 12:53 PM, Alexander Dolgarev wrote:
> I have a problem with spider.pl. When I run
> /usr/local/lib/swish-e/spider.pl default <SOME_URL> | swish-e -c swish.conf -S prog -i stdin -f test
> I get many messages like the following:
> Warning: document 'XXX' has no content
> When I look at the created index file, I see that only the document
> <SOME_URL> was indexed; none of the other URLs linked from that
> document were indexed. The HTTP server's log files show that spider.pl
> retrieves the URLs and receives responses, e.g.:
> [03/Feb/2008:18:46:43 +0100] <XXX> GET /XXX HTTP/1.1 "200" 14758 "swish-e http://swish-e.org/" "-" 18
> That means that 14758 bytes were sent to spider.pl for URL <XXX>,
> but spider.pl says: Warning: document 'XXX' has no content

I assume you have turned on debugging?
http://swish-e.org/docs/spider.html#item_debug

-- 
Peter Karman  .  peter(at)not-real.peknet.com  .  http://peknet.com/

Received on Mon Feb 4 09:09:29 2008