On 21 Nov 2000, at 4:49, Rainer.Scherg@rexroth.de wrote:
> Hi Jose!
> you have implemented the "read_stream" routine.
> We could use this feature to "scan" the content and
> include a contenttype "MAGIC" (which should be default)
> in the config.
> MAGIC could decide, based on the content, which type of doc
> has to be indexed...
> On HTTP, we could parse the response header to determine the
> content type...
OK for me.
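
A minimal sketch of what such a "MAGIC" check might look like, assuming the first bytes of the document are already in a buffer (for example, filled by read_stream). The DocType enum and the functions sniff_doc_type and type_from_content_type are hypothetical names for illustration, not existing swish-e code, and the patterns tested are only a guess at a reasonable starting set:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical document types; the names are illustrative only. */
typedef enum { DOC_TXT, DOC_HTML, DOC_XML } DocType;

/* Case-insensitive test: does buf (len bytes) start with prefix? */
static int has_prefix_ci(const char *buf, size_t len, const char *prefix)
{
    size_t n = strlen(prefix);
    size_t i;

    if (len < n)
        return 0;
    for (i = 0; i < n; i++)
        if (tolower((unsigned char)buf[i]) != tolower((unsigned char)prefix[i]))
            return 0;
    return 1;
}

/* "MAGIC": guess the type from the leading bytes of the content. */
DocType sniff_doc_type(const char *buf, size_t len)
{
    assert(buf != NULL);

    /* Skip leading whitespace before looking at the markup. */
    while (len > 0 && isspace((unsigned char)*buf)) {
        buf++;
        len--;
    }
    if (has_prefix_ci(buf, len, "<?xml"))
        return DOC_XML;
    if (has_prefix_ci(buf, len, "<!doctype html") || has_prefix_ci(buf, len, "<html"))
        return DOC_HTML;
    return DOC_TXT;   /* Assumed fallback: treat everything else as text */
}

/* For HTTP: use the Content-Type header value if it is recognised,
** otherwise fall back to sniffing the content. value may be NULL. */
DocType type_from_content_type(const char *value, const char *buf, size_t len)
{
    if (value != NULL) {
        size_t vlen = strlen(value);

        if (has_prefix_ci(value, vlen, "text/html"))
            return DOC_HTML;
        if (has_prefix_ci(value, vlen, "text/xml") ||
            has_prefix_ci(value, vlen, "application/xml"))
            return DOC_XML;
        if (has_prefix_ci(value, vlen, "text/plain"))
            return DOC_TXT;
    }
    return sniff_doc_type(buf, len);
}
```

With this sketch, type_from_content_type("text/html; charset=iso-8859-1", buf, len) would pick HTML from the header alone, while a NULL header value (filesystem indexing) falls through to the content check.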
> > IMO summary/description means "title" for html documents. Other
> > documents can have their own summary. So, any reference to title
> > should be removed outside the countwords_HTML routine.
> I don't think so. IMO the definition could look as follows:
> title = <Title>-Tag (or path, see below...)
> Description= <META http-equiv="Description"> | first xx chars of
> title = empty
> Description= first xx chars of file
> similar to HTML (has to be defined)
> IMO we should store an empty title field, if there is no title
> (which means: don't store the filepath twice). This will save space in
> the database.
In fact, this is already done in buildFileEntry (index.c):
len_title=0; /* Flag to indicate that filename
** and title are identical */
} else *p++='\0'; /* Do not store title - Just a 0 */
> On retrieval, an empty title field should be returned as
> "real_path" (URL, or filepath).
Right now I do an estrdup of the filepath in that case. In readFileEntry (index.c):
uncompress3(len2,p); /* Read length of title */
/* If 0 then filename == title */
if(!len2) /* No title */
buf2 = emalloc(len2);
memcpy(buf2,p,len2); /* Read title */
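
To make the round trip concrete, here is a self-contained sketch of the same idea: write a zero length when the title equals the path, and return a copy of the path on read. It is not the index.c code. The one-byte length prefix is a stand-in for compress3/uncompress3, plain malloc stands in for emalloc/estrdup, and write_entry, read_entry, put_field and dup_n are hypothetical names:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy n bytes into a fresh NUL-terminated string. */
static char *dup_n(const char *s, size_t n)
{
    char *out = malloc(n + 1);

    assert(out != NULL);
    memcpy(out, s, n);
    out[n] = '\0';
    return out;
}

/* Write a [1-byte length][bytes] field; returns the position after it.
** Assumption: fields are at most 255 bytes in this simplified format. */
static unsigned char *put_field(unsigned char *p, const char *s, size_t n)
{
    assert(n <= 255);
    *p++ = (unsigned char)n;
    memcpy(p, s, n);
    return p + n;
}

/* Store the path, then the title only if it differs from the path.
** The caller must provide a buffer large enough for both fields.
** Returns the number of bytes written. */
size_t write_entry(unsigned char *out, const char *path, const char *title)
{
    unsigned char *p = put_field(out, path, strlen(path));

    if (title == NULL || strcmp(title, path) == 0)
        *p++ = 0;                       /* Do not store title - just a 0 */
    else
        p = put_field(p, title, strlen(title));
    return (size_t)(p - out);
}

/* Read an entry back; the caller frees *path and *title. */
void read_entry(const unsigned char *in, char **path, char **title)
{
    size_t len1 = *in++;                /* Read length of path */
    size_t len2;

    *path = dup_n((const char *)in, len1);
    in += len1;
    len2 = *in++;                       /* Read length of title */
    if (!len2)                          /* No title: filename == title */
        *title = dup_n(*path, len1);
    else
        *title = dup_n((const char *)in, len2);
}
```

In this sketch an entry whose title equals its path costs one extra byte instead of a second copy of the path, and the reader still always gets a non-empty title back.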
Received on Tue Nov 21 16:11:49 2000