Re: Adding extra library functions

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Tue Feb 04 2003 - 14:18:03 GMT
On Mon, 3 Feb 2003, Brian Mila wrote:

Hi Brian,

> - Adding an individual file to an index is currently not possible
> programmatically.  Using the command line I can add a file to a blank
> index and then merge that index with the "main" index.

Jose is working on a way to add files to the index.  He can give you more
info.

The current design of the database does not allow updates -- there's a lot
of processing that happens at the end of indexing to build static tables
of the index, so those need to be adjusted or rebuilt when new data is
added.
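
Until that exists, the merge workaround you describe can be scripted.
Here is a rough sketch in Python that only builds the two command lines
(index the new file into a scratch index, then merge with -M).  The
function name and all file names are made up for illustration, and I'm
assuming the usual -c, -i, -f and -M switches; check them against your
swish-e version before relying on this.

```python
def incremental_add_commands(new_file, main_index, tmp_index, merged_index,
                             config=None):
    """Return the two swish-e command lines for the index-then-merge workaround.

    All names here are hypothetical; nothing is executed, so the caller can
    inspect the commands before passing them to subprocess.run().
    """
    # Step 1: index only the new file into a small scratch index.
    index_cmd = ["swish-e"]
    if config:
        index_cmd += ["-c", config]
    index_cmd += ["-i", new_file, "-f", tmp_index]

    # Step 2: merge the main index and the scratch index into a new index.
    # The merge writes a separate output index; it does not update in place.
    merge_cmd = ["swish-e", "-M", main_index, tmp_index, merged_index]

    return [index_cmd, merge_cmd]
```

Since the merge produces a new index rather than updating the old one,
the caller would still need to swap the merged index (and its .prop
file) into place afterwards.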

Removing or changing a file in the index presents even more problems.

> - Adding metadata to a file is currently only possible with HTML and XML
> types.  I think it is possible though to add metadata to any file by
> using -S command line option and using a separate program which could
> supply the metadata and the file itself.

Right, that's the best way.
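
As a rough illustration, the external program could wrap each plain
file in a small HTML document with <meta> tags and print it in the
header-plus-body form that -S prog reads.  This Python sketch is only
an outline: the function name is made up, the header names (Path-Name,
Content-Length, Last-Mtime, Document-Type) are as I remember them from
the prog documentation, and the meta names would also need to be
declared as MetaNames in the config.

```python
from html import escape


def build_prog_record(path, text, mtime, metadata):
    """Return one document record (as bytes) for swish-e's -S prog input.

    Hypothetical helper: wraps plain text in HTML so that each metadata
    item becomes a <meta> tag the HTML parser can pick up.
    """
    meta_tags = "\n".join(
        '<meta name="{}" content="{}">'.format(
            escape(name, quote=True), escape(value, quote=True))
        for name, value in sorted(metadata.items())
    )
    html = "<html><head>\n{}\n</head>\n<body><pre>{}</pre></body></html>\n".format(
        meta_tags, escape(text))
    # Encoding is an assumption; use whatever your index expects.
    body = html.encode("utf-8")

    # Content-Length must be the byte length of the body that follows.
    headers = (
        "Path-Name: {}\n"
        "Content-Length: {}\n"
        "Last-Mtime: {}\n"
        "Document-Type: HTML\n"
        "\n"
    ).format(path, len(body), int(mtime))
    return headers.encode("utf-8") + body
```

The program would write one such record per file to standard output
(for example with sys.stdout.buffer.write), and swish-e would read
them when run with -S prog.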

> - Searching for text in a file or in the metadata is possible
> programmatically and on the command line, but I'm not sure how to search
> based on create/access dates.

Look at the -L switch.
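
A sketch of how a wrapper might build such a search, in Python.  The
function name is made up, and I'm assuming that -L takes a property
name followed by a low and a high value, and that the date property
(swishlastmodified here) is compared as a Unix timestamp.  That covers
the last-modified date only; create or access dates would have to be
stored as properties of your own.

```python
import calendar
import datetime


def date_limit_args(query, start, end, prop="swishlastmodified"):
    """Build a swish-e command line limited to a date range via -L.

    Hypothetical helper: start and end are naive datetimes treated as UTC.
    """
    low = calendar.timegm(start.timetuple())
    high = calendar.timegm(end.timetuple())
    if low > high:
        raise ValueError("start must not be later than end")
    # -L <property> <low> <high> limits results to that property range.
    return ["swish-e", "-w", query, "-L", prop, str(low), str(high)]


args = date_limit_args(
    "metadata",
    datetime.datetime(2003, 1, 1),
    datetime.datetime(2003, 2, 1),
)
```

The resulting list can be handed to subprocess.run() as is.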

> - Removing a file is currently not possible.

Right.

> - It looks like there is a storage for metadata already reserved for
> every file via the StoreDescription structure.  I will need to add some
> functionality to add data to this structure using my proposed library
> function.

No, the StoreDescription is for storing the indexing config -- and maps
tag names to document types.  There's a separate .prop file that holds the
metadata ("Properties").  There's a table in the main index based on file
number that points into the property table.

> - The only function I found that removes a file is
> remove_last_file_from_list(SWISH *, IndexFILE *) in index.c.  This only
> removes the last file from the index and, judging by the comments, is
> only intended to be used to clean up an aborted index operation.

That's right.

> Any thoughts, ideas, or comments are appreciated,

Have you considered other search engines?  Swish is simple and very fast,
but at the expense of some features and scalability.

MySQL has full text searching, and since you have database needs
(selecting by fields and dates) it might be a better way to go.  I haven't
tried it, but there's mnoGoSearch (http://mnogosearch.org/features.html)
which looks interesting and provides "Continual indexing".  There are many
others, too.  If Java is an option, see Lucene:
http://jakarta.apache.org/lucene/docs/index.html

Also see http://www.searchtools.com/tools/tools-opensource.html

Report back what you find, OK?

Thanks,



-- 
Bill Moseley moseley@hank.org
Received on Tue Feb 4 14:18:24 2003