HTML vs. HTML2

From: Don Fike <fike(at)not-real.cs.utk.edu>
Date: Thu Aug 01 2002 - 18:24:21 GMT
When I index the same 15 files with the HTML2 parser, fewer words are
indexed than with the HTML parser. Isn't HTML2 the recommended parser?
Is there a known reason for the difference?

HTML2 results:

Sorting 3249 words alphabetically
3249 unique words indexed.
5 properties sorted.
15 files indexed.  183645 total bytes.  12468 total words.

HTML results:

Sorting 3331 words alphabetically
3331 unique words indexed.
5 properties sorted.
15 files indexed.  183645 total bytes.  13073 total words.
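
To put the gap in numbers, here is a quick sketch in Python. The figures are copied from the two summaries above. The dictionary and function names are only illustrative and have nothing to do with the indexer itself.

```python
# Figures copied from the two index summaries above.
html2 = {"unique_words": 3249, "total_words": 12468}
html = {"unique_words": 3331, "total_words": 13073}


def gap(key):
    """Return (absolute, percent) shortfall of HTML2 relative to HTML."""
    diff = html[key] - html2[key]
    return diff, 100.0 * diff / html[key]


unique_gap, unique_pct = gap("unique_words")
total_gap, total_pct = gap("total_words")

print(f"unique words: {unique_gap} fewer with HTML2 ({unique_pct:.1f}%)")
print(f"total words:  {total_gap} fewer with HTML2 ({total_pct:.1f}%)")
```

That works out to 82 fewer unique words and 605 fewer total words with HTML2, with the file count and byte count identical in both runs.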


Thanks,

Don
Received on Thu Aug 1 18:31:37 2002