
Re: Reading the spider report

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Jun 02 2004 - 18:49:01 GMT
On Wed, Jun 02, 2004 at 10:32:24AM -0700, Justin Tang wrote:
>     Duplicates: 796  (1.5/sec)
Count of links extracted that had already been seen.
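
In rough terms that's just a seen-hash check. A sketch (made-up
variable names like @extracted_links, not the real spider.pl code):

  my ( %visited, @queue );
  for my $url ( @extracted_links ) {
      if ( $visited{$url}++ ) {            # true if we've seen it before
          $server->{counts}{Duplicates}++; # already seen, don't queue it
          next;
      }
      push @queue, $url;                   # new link, fetch it later
  }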

> MD5 Duplicates:   1  (0.0/sec)
Count of pages that were skipped because their MD5 signature matched
another page.
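
The signature is an MD5 checksum of the page content -- Digest::MD5
in Perl. Something along these lines (a sketch only; %seen_md5 is an
invented name that would persist across pages):

  use Digest::MD5 qw(md5_hex);

  my $sig = md5_hex( $$content );          # $content is a scalar ref
  if ( $seen_md5{$sig}++ ) {
      $server->{counts}{'MD5 Duplicates'}++;
      $server->{counts}{Skipped}++;
      return;                              # same bytes as a page already seen
  }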

> Off-site links: 164  (0.3/sec)
Off-site links that were skipped.
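
"Off-site" just means the link's host doesn't match the host being
spidered. Conceptually (illustrative names -- $server->{host} here is
made up):

  use URI;

  my $uri = URI->new( $url );              # assuming an absolute http URL
  if ( lc( $uri->host || '' ) ne lc $server->{host} ) {
      $server->{counts}{'Off-site links'}++;
      next;                                # stay on the starting host
  }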

>        Skipped: 114  (0.2/sec)
Those are links that were skipped for various reasons (may include some
of the ones listed above).

>    Unique URLs: 108  (0.2/sec)
Those are unique URLs that were processed.

>     robots.txt:   3  (0.0/sec)
And those are links that were skipped because robots.txt told the
spider to skip them.
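
That kind of check can be done with the standard WWW::RobotRules
module from LWP -- roughly (again a sketch, with invented $host and
$robots_body, not the actual code):

  use WWW::RobotRules;

  my $rules = WWW::RobotRules->new( 'swish-e spider' );
  $rules->parse( "http://$host/robots.txt", $robots_body );

  unless ( $rules->allowed( $url ) ) {
      $server->{counts}{'robots.txt'}++;
      $server->{counts}{Skipped}++;
      next;                                # robots.txt says hands off
  }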

Take a look at spider.pl if other counters pop up in your report.


moseley@bumby:~/swish-e/prog-bin$ fgrep '$server->{counts}{' spider.pl.in | perl -pe 's/^\s+/  /'
  my $val = commify( $server->{counts}{$_} );
  commify( $server->{counts}{$_} ),
  $server->{counts}{$_}/$start;
  $server->{counts}{'Connection: Keep-Alive'}++;
  $server->{counts}{'Connection: Close'}++;
  $server->{counts}{'Unique URLs'}++;
  if $server->{max_files} && $server->{counts}{'Unique URLs'} > $server->{max_files};
  "Cnt: $server->{counts}{'Unique URLs'}",
  $server->{counts}{Skipped}++;
  $server->{counts}{'robots.txt'}++;
  $server->{counts}{Skipped}++;
  $server->{counts}{'MD5 Duplicates'}++;
  $server->{counts}{Skipped}++;
  $server->{counts}{Skipped}++;
  $server->{counts}{Skipped}++;
  $server->{counts}{'Off-site links'}++;
  #$server->{counts}{Skipped}++;
  $server->{counts}{Duplicates}++;
  $server->{counts}{'Total Bytes'} += length $$content;
  $server->{counts}{'Total Docs'}++;
  if $server->{max_indexed} && $server->{counts}{'Total Docs'} >= $server->{max_indexed};
  $server->{counts}{'PDF transformed'}++;
  $server->{counts}{'Private Files'}++;
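
The "(n.n/sec)" figures in the report are each count divided by the
elapsed run time -- that's what the $server->{counts}{$_}/$start line
is doing, with $start presumably holding the elapsed seconds by then.
In sketch form (with an invented $start_time):

  my $elapsed = ( time() - $start_time ) || 1;   # avoid divide-by-zero
  for my $name ( sort keys %{ $server->{counts} } ) {
      printf "%16s: %d  (%.1f/sec)\n",
          $name,
          $server->{counts}{$name},
          $server->{counts}{$name} / $elapsed;
  }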


-- 
Bill Moseley
moseley@hank.org