Re: [swish-e] all URLs

From: Alexander Dolgarev <a.dolgarev(at)not-real.gmail.com>
Date: Mon Feb 04 2008 - 14:20:08 GMT
Same result. Note that there are no 'MD5 Duplicates', but there are still a lot
of 'Duplicates', though I don't think duplicates are the problem.
Maybe the problem is 'Content-Encoding: gzip' or 'Transfer-Encoding:
chunked'? I've tried various sites: some are indexed
successfully, others are not, and the only difference I can see is in these
HTTP response headers.
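
For reference, here is a minimal sketch of a spider.pl config that combines both ideas: it switches off 'use_md5' as suggested and logs the encoding-related headers of each response, so indexed and non-indexed sites can be compared. The base_url and email values are placeholders, the file name is hypothetical, and the callback arguments follow my reading of the spider.pl documentation, so treat them as assumptions rather than a verified setup.

```perl
# SwishSpiderConfig.pl (hypothetical file name), passed to spider.pl as its
# config file. base_url and email are placeholders, not values from this thread.
@servers = (
    {
        base_url => 'http://www.example.com/',
        email    => 'swish@example.com',

        # Turn off MD5-based duplicate detection, as suggested in the thread.
        use_md5  => 0,

        # Assumption: test_response receives ($uri, $server, $response),
        # where $response is an HTTP::Response object.
        test_response => sub {
            my ( $uri, $server, $response ) = @_;

            # Print the two headers under suspicion for every fetched URL.
            printf STDERR "%s  Content-Encoding=%s  Transfer-Encoding=%s\n",
                $uri,
                $response->header('Content-Encoding')  || '-',
                $response->header('Transfer-Encoding') || '-';

            return 1;    # true: let the spider continue with this document
        },
    },
);

1;
```

If the sites that fail to index consistently show 'gzip' or 'chunked' here while the working ones do not, that would at least support the header theory.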

On Feb 4, 2008 4:03 PM, Peter Karman <peter@peknet.com> wrote:
>
>
> On 02/04/2008 07:16 AM, Alexander Dolgarev wrote:
> > Yet again, I've reinstalled swish-e (version 2.4.5) and have the same
> > effect (or defect):
> >
> > Summary for: <SOME_URL>
> >      Connection: Close:      3  (0.0/sec)
> > Connection: Keep-Alive:    224  (1.2/sec)
> >             Duplicates:     60  (0.3/sec)
> >         Off-site links:     14  (0.1/sec)
> >            Total Bytes: 74,442  (402.4/sec)
> >             Total Docs:    226  (1.2/sec)
> >            Unique URLs:    227  (1.2/sec)
> >              text/html:      1  (0.0/sec)
> > spider.pl reports all files as duplicates. Note that I've now also
> > tried a third-party site. Any suggestions?
> >
>
> try setting 'use_md5' to false ?
>
> --
>
> Peter Karman  .  peter(at)not-real.peknet.com  .  http://peknet.com/
>
> _______________________________________________
> Users mailing list
> Users@lists.swish-e.org
> http://lists.swish-e.org/listinfo/users
>
Received on Mon Feb 4 09:20:10 2008