On Sun, 13 Apr 2003, Greg Ford wrote:
> over a corpus of around 2000 html files (legal decisions)
> and then search for the word "dismissed"
> (which should be near the end of around half of all the files) i.e.:
> swish-e -f docs.idx -w dismissed
> I get only 79 hits...
> The indexing process is apparently only indexing the start of the files?
Nobody reads to the end of legal documents, so this is an optimization.
Find a document where words are not being indexed at the end and search
for the word during indexing:
swish-e -c myconfig -i test.doc -T indexed_words | grep mainpernable
swish-e -c myconfig -i test.doc -T indexed_words -v0 > words.out
> ( Q. 2.)
> One day ... can you rename swish-e/src/string.h ?
> My application which is based on the libswishe currently has to
> #include "../swish-e/src/swish-e.h"
> I would rather add the swish-e/src directory to the
> global search path (and #include <swish-e.h>) - but I can't! with VC++.
Already done in current cvs:
moseley@bumby:~/swish-e/src$ ls *string.h
> Why do you call the -L (limit) option "experimental"?
> I'm keen to use it, I have a fairly static/well defined corpus of HTML
It's labeled "experimental" so when someone complains about it we can say
There was debate about the command syntax, plus the way it is implemented
makes its very existence questionable. For each property limited it loads
an integer table into RAM the size of total number of files in the
database and has to sort it twice. It's not a very scalable design.
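
To make the scaling concern concrete, here is a rough Python sketch of that kind of design. It is not the swish-e source: the names build_rank_table and limit are hypothetical, it sorts once rather than twice, and it only illustrates why an integer table with one entry per indexed file, built per limited property, grows with the size of the index.

```python
from array import array
from bisect import bisect_left, bisect_right


def build_rank_table(prop_values):
    """Return (rank_of_file, sorted_values) for one property.

    rank_of_file holds one integer per file in the index, which is the
    per-property memory cost described in the message above.
    Hypothetical helper, not part of swish-e.
    """
    # Sort file numbers by property value: O(N log N) for N files.
    order = sorted(range(len(prop_values)), key=lambda f: prop_values[f])
    # Integer table sized to the total number of files.
    rank_of_file = array("l", [0]) * len(prop_values)
    for rank, filenum in enumerate(order):
        rank_of_file[filenum] = rank
    sorted_values = [prop_values[f] for f in order]
    return rank_of_file, sorted_values


def limit(hits, prop_values, low, high):
    """Keep only the hits whose property value lies in [low, high]."""
    rank_of_file, sorted_values = build_rank_table(prop_values)
    # Translate the value range into a range of ranks.
    lo = bisect_left(sorted_values, low)
    hi = bisect_right(sorted_values, high)
    return [f for f in hits if lo <= rank_of_file[f] < hi]


if __name__ == "__main__":
    # Made-up property values (dates as strings), one per file number.
    dates = ["2001-05-01", "1999-01-10", "2003-04-13", "2000-07-30"]
    print(limit([0, 1, 2, 3], dates, "2000-01-01", "2002-12-31"))
```

Even in this simplified form, the table and the sort cover every file in the index, not just the files that matched the query, and the work repeats for each property being limited.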
> Is anyone urgently waiting for a working ActiveX wrapper for Swish-e ?
> I've got mine working quite well now, (based on Swish-e 2.3.4).
> This is similar to SwishEx in principle, but an entirely new codebase.
> server-side ASP
What work happens on the client-side?
Bill Moseley email@example.com
Received on Mon Apr 14 14:07:28 2003