On Thu, 31 May 2001, Bill Moseley wrote:
> So to answer your question, yes, it will be easy to modify the script to
> read from an external file. But implementing that will be, as they say,
> left to the reader.
Well, at least the files I'm searching are plain text, so I don't have to
worry about HTML tags.
> >This way, people can use your example script no matter how the original
> >document is stored (text file, Berkeley DB, MySQL, etc.).
> Philip, are you indexing files in a MySQL database using -S prog? If so,
> I'd like to see the script you are using. There isn't an example yet in
> the swish distribution for indexing from a RDBMS.
Here you go. The attached sql.pl file is used in -S prog. It calls some
utility functions from db.pl as well: query() executes a MySQL query and
returns the statement handle, and encode() changes <>& to &lt;&gt;&amp; so
that they don't get messed up in XML.
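
For readers without the attachment, here is a minimal sketch of what an encode() helper like this might look like. It is written in Python rather than the Perl of db.pl, which is not reproduced here, so treat it as an illustration and not as the original code. The one detail that matters is the order of the replacements.

```python
def encode(text: str) -> str:
    """Escape the three characters that are special in XML text content."""
    # Replace "&" first; otherwise the "&" introduced by "&lt;" and "&gt;"
    # would itself be escaped a second time.
    return (
        text.replace("&", "&amp;")
            .replace("<", "&lt;")
            .replace(">", "&gt;")
    )


if __name__ == "__main__":
    print(encode("if a < b && b > c"))
```

Python's standard library offers xml.sax.saxutils.escape() for the same job, which is probably the better choice in real code.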
sql.pl executes a SELECT query on MySQL and outputs each row as a document
in XML format (these are e-mail messages being indexed). The from_name,
from_email, subject and message fields are separately searchable.
Content-Length is the tricky part: it has to match the length of the
document exactly. If it's off by even one, -S prog can no longer tell
where one document ends and the next begins.
Last-Mtime is used to store the date of the e-mail message.
Path-Name is used to store the "num" of the e-mail message (the primary
key in the MySQL table for the message). When I use swish to search, I get
back a list of "num"s which I can then use to query the MySQL database to
retrieve the whole message.
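
The output described above can be sketched roughly as follows. This is a Python illustration under several assumptions, not the attached sql.pl: the <email> root element, the sample values, and the UTF-8 encoding are made up for the example, and Last-Mtime is assumed to be a Unix timestamp. Only the header names and the four field names come from the description above. The point to notice is that Content-Length is computed from the encoded bytes, not from the character count.

```python
import sys
from xml.sax.saxutils import escape


def message_to_xml(from_name, from_email, subject, message):
    """Wrap each field in its own tag so it can be searched separately."""
    # The <email> root element name is an assumption for illustration.
    fields = (
        ("from_name", from_name),
        ("from_email", from_email),
        ("subject", subject),
        ("message", message),
    )
    lines = ["<email>"]
    for tag, value in fields:
        # escape() converts &, < and > to entities.
        lines.append(f"<{tag}>{escape(value)}</{tag}>")
    lines.append("</email>")
    return "\n".join(lines) + "\n"


def prog_record(num, last_mtime, xml):
    """Return the headers, a blank line, then the document body, as bytes."""
    payload = xml.encode("utf-8")  # assumption: UTF-8 output
    header = (
        f"Path-Name: {num}\n"                # primary key of the message
        f"Content-Length: {len(payload)}\n"  # length in bytes, not characters
        f"Last-Mtime: {last_mtime}\n"        # assumed to be a Unix timestamp
        "\n"
    )
    return header.encode("utf-8") + payload


if __name__ == "__main__":
    # Hypothetical sample row; a real script would loop over the query results.
    xml = message_to_xml("Jane Doe", "jane@example.com", "Q & A", "Is 1 < 2?")
    sys.stdout.buffer.write(prog_record(1234, 991335783, xml))
```

Writing through sys.stdout.buffer keeps the byte count honest, since no newline translation or re-encoding happens after the length has been measured.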
-Philip Mak (email@example.com)
Received on Thu May 31 15:03:03 2001