Re: Stats

From: John Angel <angel_john(at)not-real.hotmail.com>
Date: Sun Dec 21 2003 - 17:27:27 GMT
I did not index localhost.

What do you mean by user error?

I am using these settings in spider.config:
keep_alive  => 1
delay_secs  => 0
use_md5 => 1

I tried both the HTML and HTML* parser types; the speed is the same.

Is there anything else I can do to speed things up?


----- Original Message ----- 
From: "Bill Moseley" <moseley@hank.org>
To: "John Angel" <angel_john@hotmail.com>
Cc: "Multiple recipients of list" <swish-e@sunsite.berkeley.edu>
Sent: Sunday, December 21, 2003 17:26
Subject: Re: [SWISH-E] Stats


> On Sun, Dec 21, 2003 at 06:31:47AM -0800, John Angel wrote:
> > When indexing the same site, containing 600 pages, using the same settings
> > for both indexers (persistent connection and md5 check):
> >
> > - swish-e indexing time: 70 minutes
> > - htdig indexing time: 3 minutes
> >
> > Any idea why that is?
>
> Yes, I do.  User error.  And, a failing grade for not showing your work,
> again.
>
>
> Summary for: http://localhost/doc/
>     Duplicates:     5,193  (324.6/sec)
> Off-site links:       156  (9.8/sec)
>        Skipped:         2  (0.1/sec)
>    Total Bytes: 1,897,654  (118603.4/sec)
>     Total Docs:       600  (37.5/sec)
>    Unique URLs:       601  (37.6/sec)
> Removing very common words...
> no words removed.
> Writing main index...
> Sorting words ...
> Sorting 5,967 words alphabetically
> Writing header ...
> Writing index entries ...
>   Writing word text: Complete
>   Writing word hash: Complete
>   Writing word data: Complete
> 5,967 unique words indexed.
> 4 properties sorted.
> 600 files indexed.  1,897,654 total bytes.  122,211 total words.
> Elapsed time: 00:00:17 CPU time: 00:00:02
> Indexing done!
>
> Still, htdig is likely faster at indexing.  Thus, I would recommend that
> you use htdig.
>
>
>
> -- 
> Bill Moseley
> moseley@hank.org
>
>
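
For scale, the timings quoted in this thread can be turned into a rough per-page cost. The sketch below is only arithmetic on the numbers reported above (600 pages; 70 minutes and 3 minutes in John's runs; 17 seconds elapsed in Bill's run). The names `PAGES`, `timings`, and `seconds_per_page` are made up for illustration, and Bill's run was against a different site (http://localhost/doc/), so the comparison is indicative at best.

```python
# Per-page cost implied by the timings quoted in this thread.
PAGES = 600  # page count reported in both messages

# Wall-clock totals in seconds, as reported in the thread.
timings = {
    "swish-e (John's run)": 70 * 60,  # 70 minutes
    "htdig (John's run)": 3 * 60,     # 3 minutes
    "swish-e (Bill's run)": 17,       # "Elapsed time: 00:00:17"
}


def seconds_per_page(total_seconds, pages=PAGES):
    """Average wall-clock seconds spent per page."""
    return total_seconds / pages


for label, total in timings.items():
    print(f"{label}: {seconds_per_page(total):.3f} s/page")
# swish-e (John's run): 7.000 s/page
# htdig (John's run): 0.300 s/page
# swish-e (Bill's run): 0.028 s/page
```

Roughly 7 seconds per page is consistent with John's observation that switching parsers made no difference. It suggests the time goes to each request (delays between fetches, connection handling, or a slow server) rather than to parsing or index writing, although nothing in the thread establishes the actual cause.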
Received on Sun Dec 21 17:27:33 2003