Thanks for your insight. After looking into the various other indexing
engines, it appears that Lucene will best suit our needs. It has no native
HTML support, but almost all of its indexing and searching functions are
accessible programmatically. Thanks for your help!
----- Original Message -----
From: "Bill Moseley" <email@example.com>
To: "Brian Mila" <firstname.lastname@example.org>
Cc: "Multiple recipients of list" <email@example.com>
Sent: Tuesday, February 04, 2003 8:17 AM
Subject: Re: [SWISH-E] Adding extra library functions
> On Mon, 3 Feb 2003, Brian Mila wrote:
> Hi Brian,
> > - Adding an individual file to an index is currently not possible
> > programmatically. Using the command line I can add a file to a blank
> > index and then merge that index with the "main" index.
> Jose is working on a way to add files to the index. He can give you more
> details.
> The current design of the database does not allow updates -- there's a lot
> of processing that happens at the end of indexing to build static tables
> of the index, so those need to be adjusted or rebuilt when new data is
> added.
> Removing or changing a file in the index presents even more problems.
> > - Adding metadata to a file is currently only possible with HTML and XML
> > types. I think it is possible, though, to add metadata to any file by
> > using the -S command line option with a separate program that could
> > supply the metadata and the file itself.
> Right, that's the best way.
> > - Searching for text in a file or in the metadata is possible
> > programmatically and on the command line, but I'm not sure how to search
> > based on creation/access dates.
> Look at the -L switch.
> > - Removing a file is currently not possible.
> > - It looks like storage for metadata is already reserved for
> > every file via the StoreDescription structure. I will need to add some
> > functionality to add data to this structure using my proposed library
> > function.
> No, the StoreDescription is for storing the indexing config -- and maps
> tag names to document types. There's a separate .prop file that holds the
> metadata ("Properties"). There's a table in the main index based on file
> number that points into the property table.
> > - The only function I found that removes a file is
> > remove_last_file_from_list(SWISH *, IndexFILE *) in index.c. This only
> > removes the last file from the index and, judging by the comments, is
> > only intended to be used to clean up an aborted index operation.
> That's right.
> > Any thoughts, ideas, or comments are appreciated,
> Have you considered other search engines? Swish is simple and very fast,
> but at the expense of some features and scalability.
> MySQL has full-text searching, and since you have database needs
> (selecting by fields and dates) it might be a better way to go. I haven't
> tried it, but there's mnoGoSearch (http://mnogosearch.org/features.html)
> which looks interesting and provides "Continual indexing". There are many
> others, too. Java? http://jakarta.apache.org/lucene/docs/index.html
> Also see http://www.searchtools.com/tools/tools-opensource.html
> Report back what you find, OK?
> Bill Moseley firstname.lastname@example.org
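For reference, the "-S plus a separate program" approach discussed in the quoted message can be sketched as a small feeder program. This is a minimal Python sketch, assuming swish-e's "prog" input format (each document preceded by a `Path-Name:` header, a `Content-Length:` header giving the body size in bytes, an optional `Last-Mtime:` header in epoch seconds, and a blank line). It also assumes the index configuration routes these documents to the HTML parser (for example via `DefaultContents`) and declares the meta tag names with `MetaNames`. The helper `make_prog_record` and the `source` metadata field are hypothetical; check the header names against the swish-e documentation for your version.

```python
#!/usr/bin/env python3
# Hypothetical feeder for swish-e's "-S prog" input method (a sketch, not
# part of swish-e). Each file is wrapped in a tiny HTML page whose <meta>
# tags carry extra metadata, then written to stdout with prog headers.
import html
import os
import sys


def make_prog_record(path, text, meta, mtime=None):
    """Return one prog-format record (headers, blank line, body) as bytes."""
    metas = "".join(
        '<meta name="%s" content="%s">' % (html.escape(k), html.escape(v))
        for k, v in meta.items()
    )
    doc = "<html><head>%s</head><body>%s</body></html>" % (metas, html.escape(text))
    body = doc.encode("utf-8")
    # Content-Length counts bytes of the encoded body, not characters.
    headers = "Path-Name: %s\nContent-Length: %d\n" % (path, len(body))
    if mtime is not None:
        # Optional header; seconds since the Unix epoch.
        headers += "Last-Mtime: %d\n" % int(mtime)
    # A blank line separates the headers from the document body.
    return (headers + "\n").encode("utf-8") + body


def main(paths):
    out = sys.stdout.buffer
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            text = f.read()
        # Hypothetical metadata source; replace with a real lookup.
        meta = {"source": os.path.basename(path)}
        out.write(make_prog_record(path, text, meta, os.path.getmtime(path)))
    out.flush()


if __name__ == "__main__":
    main(sys.argv[1:])
```

Wrapping arbitrary files in HTML is one way to attach metadata to non-HTML documents; emitting XML instead would work the same way if the XML parser is configured. The program would be named with `-i` when running `swish-e -S prog`; how arguments reach it (e.g. a config directive) should be confirmed in the swish-e docs.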
Received on Fri Feb 7 02:07:37 2003