RE: swish-e future

From: <jmruiz(at)>
Date: Wed Jul 19 2000 - 13:58:26 GMT
Hi Rainer,

On 18 Jul 2000, at 7:34, wrote:

> >2. Add Files to the index file
> >3. Delete Files from the index file
>  maybe useful in some cases.
>  Add can be implemented as "index a file and merge indexes" (?) 

Merging works fine with small index files. But for large index
files, merging is a very memory-heavy process: it reads the
original files into memory and creates a different one.
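
A minimal sketch of why this costs memory, assuming a toy index layout (a dict mapping each word to a list of (filename, count) postings). This layout and the name merge_indexes are illustrative assumptions, not the real swish-e index format or code:

```python
# Hypothetical in-memory merge of two inverted indexes.
# Both inputs and the merged result live in memory at the same time,
# which is what makes this approach heavy for large index files.

def merge_indexes(index_a, index_b):
    """Return a new index holding the postings of both inputs."""
    merged = {}
    for index in (index_a, index_b):
        for word, postings in index.items():
            # Copy postings into a fresh list so the inputs stay untouched.
            merged.setdefault(word, []).extend(postings)
    for postings in merged.values():
        postings.sort()  # keep postings ordered by filename
    return merged


if __name__ == "__main__":
    old = {"swish": [("a.html", 2)]}
    new = {"swish": [("b.html", 1)], "index": [("b.html", 3)]}
    print(merge_indexes(old, new))
```

Peak memory here is roughly the size of both inputs plus the output, so it grows with the total index size rather than with the size of the file being added.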

> >4. Better XML integration
>  IMO necessary, but not only XML.
>  See other mail from me. - There are new formats upcoming - like wml (WAP).
Sure, we are also working with wml.
I like your idea of implementing something like:
IndexContents   HTML  .html .htm .shtml   .htm.  .html. .shtml.
IndexContents   XML   .xml
IndexContents   WAP   .wap
IndexContents   TXT   .txt .txt.
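
As a rough sketch of how such lines could be turned into a lookup table: the function names, the TXT default and the longest-suffix rule below are assumptions for illustration, not existing swish-e behaviour, and the sample config uses only a subset of the extensions listed above.

```python
# Hypothetical sketch: map file extensions to a parser type
# from "IndexContents" lines in a config file.

def parse_index_contents(config_text):
    """Return {extension: parser_type} built from IndexContents lines."""
    mapping = {}
    for line in config_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != "IndexContents":
            continue  # skip blanks, comments and other directives
        parser_type = fields[1]
        for ext in fields[2:]:
            mapping[ext.lower()] = parser_type
    return mapping


def parser_for(filename, mapping, default="TXT"):
    """Pick the parser type of the longest extension matching the filename."""
    name = filename.lower()
    matches = [ext for ext in mapping if name.endswith(ext)]
    if not matches:
        return default  # assumed fallback, not specified in the proposal
    return mapping[max(matches, key=len)]


CONFIG = """\
IndexContents   HTML  .html .htm .shtml
IndexContents   XML   .xml
IndexContents   WAP   .wap
IndexContents   TXT   .txt
"""

if __name__ == "__main__":
    table = parse_index_contents(CONFIG)
    print(parser_for("index.shtml", table))
    print(parser_for("feed.xml", table))
```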

> >5. Multidocument Files. This will allow to write filters for
> >SQL databases. Needs to define a document separator.
>  ?? sorry, to understand this fully, I need an example.
>  Basically, you can use the filter feature to index e.g. a database.
>  But you need a search (cgi script), which generates a proper URL
>  to retrieve and display this information stored in the db.
Here is a file with two documents (using a line with '---' as a
separator):

<meta1>Doc 1</meta1>
<meta2>Some text text</meta2>
---
<meta1>Doc 2</meta1>
<meta2>Some text text</meta2>

E.g., this can be produced by an SQL report, but at present the
filter option indexes it as just one file.
To store these documents in the index file, we can add each
document's start position in the file to the entry description.
Results could then look like this:

rank filename title start size [props]

start will be 0 in the most common case: one document per file.
The problem is backwards compatibility.
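
A minimal sketch of the idea, assuming the separator is a line containing only '---' and that start and size are byte offsets. The names split_documents and result_line, and the quoting of the title, are hypothetical choices made for this example:

```python
# Hypothetical sketch: find the (start, size) of each document inside
# one multi-document file, so both can be stored with the index entry.

SEPARATOR = b"---"  # assumed separator line


def split_documents(data, separator=SEPARATOR):
    """Return a list of (start, size) byte ranges, one per document."""
    ranges = []
    start = 0  # byte offset where the current document begins
    pos = 0    # byte offset of the line being examined
    for line in data.splitlines(keepends=True):
        if line.strip() == separator:
            if pos > start:
                ranges.append((start, pos - start))
            start = pos + len(line)  # next document begins after the separator
        pos += len(line)
    if pos > start:
        ranges.append((start, pos - start))  # last (or only) document
    return ranges


def result_line(rank, filename, title, start, size):
    """Format one result as: rank filename title start size."""
    return f'{rank} {filename} "{title}" {start} {size}'


if __name__ == "__main__":
    data = (b"<meta1>Doc 1</meta1>\n<meta2>Some text text</meta2>\n"
            b"---\n"
            b"<meta1>Doc 2</meta1>\n<meta2>Some text text</meta2>\n")
    for start, size in split_documents(data):
        print(result_line(1000, "report.txt", "untitled", start, size))
```

A file without any separator line yields a single range starting at 0, which matches the one-document-per-file case.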

> >10. Option to retrieve documents with words highlighted
> >in some way.
>  This is to be done in the search cgi script and the output generator...
>  Could be easily done for static html files - but not for dynamic or
>  SSI files (you would need a postprocessor for the searchengine build
>  into the webservers output stream).

You are right. I was thinking of static HTML or text.
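
For the plain-text case, a rough sketch could look like the following. The function name and the <b> markers are assumptions; for HTML input the markup would have to be skipped so that words inside tags are not touched, which this sketch does not attempt.

```python
# Hypothetical sketch: wrap each search word found in a plain-text
# document with highlight markers (whole words, case-insensitive).
import re


def highlight(text, words, before="<b>", after="</b>"):
    """Return text with every whole-word match of words wrapped in markers."""
    if not words:
        return text
    alternatives = "|".join(re.escape(w) for w in words)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: before + m.group(0) + after, text)


if __name__ == "__main__":
    print(highlight("Swish-e indexes text files", ["text"]))
```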

> >12. Stemming modules for non english languages.
>   Ok would be nice, but how to configure?

Perhaps something like this in the config file:
Stemming german
# For backwards compatibility, "Stemming Yes" will mean English
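
A small sketch of how that value could be interpreted. The function name is hypothetical, and treating "No" or an empty value as "stemming off" is an assumption, since only "Yes" and a language name are proposed here:

```python
# Hypothetical sketch: resolve the value of a "Stemming" directive
# to a language name, keeping the old Yes/No form working.

def stemming_language(value):
    """Return the stemmer language for a Stemming value, or None if off."""
    v = value.strip().lower()
    if v in ("", "no"):
        return None       # assumed: stemming disabled
    if v == "yes":
        return "english"  # backwards compatible default
    return v              # e.g. "german"


if __name__ == "__main__":
    for line in ("Stemming german", "Stemming Yes"):
        directive, _, value = line.partition(" ")
        print(directive, "->", stemming_language(value))
```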

Received on Wed Jul 19 10:01:49 2000