Re: changes...

From: <jmruiz(at)not-real.boe.es>
Date: Thu Nov 16 2000 - 19:25:38 GMT
Hi Rainer,

On 16 Nov 2000, at 6:57, Rainer.Scherg@rexroth.de wrote:

> Hi!
> 
> I'm just making some changes to get file properties
> (size, last modification date, etc.) and got a little bit
> confused about the structures and data types swish-e is using.
> 
> Right now the structure looks like:
> 
> 
> swish.h:
> --------
> /*
>  -- FileProperties (similar to fileinfo)
>  -- Structure used to store information about a file to be indexed.
>  -- Unused items may be NULL (e.g. if the file is not opened, fp == NULL)
> */
> 
> typedef struct  {
>         FILE   *fp;             /* may also be a filter stream, or NULL if not opened */
>         char   *path;           /* path to file to index (may be tmp) */
>         char   *virt_path_url;  /* original path/URL of the indexed file */
>         long   fsize;           /* size of the original file (not filtered) */
>         time_t mtime;           /* time of last modification of the original file */
> } FileProp;
> 
> 
> DocType, indextitleonly, and some other flags for this file could
> also be included (to be discussed...). This would make the subroutine
> interfaces leaner and less complicated to handle...
> 

Sounds good to me.
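As a rough sketch of how the proposed FileProp structure might be filled in on a POSIX system: stat(2) already provides both the size (st_size) and the last modification time (st_mtime) of the original file. The names file_properties_sketch and free_file_properties_sketch are hypothetical, not the actual fs_file_properties from swish-e, and the sketch leaves fp as NULL because it does not open the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

typedef struct {
    FILE   *fp;             /* filter stream or file, NULL if not opened */
    char   *path;           /* path to file to index (may be tmp) */
    char   *virt_path_url;  /* original path/URL of the indexed file */
    long    fsize;          /* size of the original (unfiltered) file */
    time_t  mtime;          /* time of last modification of the original file */
} FileProp;

/* Hypothetical helper: duplicate a string with malloc(). */
static char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Hypothetical constructor: fills fsize and mtime from stat(2).
   The file itself is not opened, so fp stays NULL.
   Returns NULL if the file cannot be stat'ed or memory runs out. */
FileProp *file_properties_sketch(const char *path)
{
    struct stat st;
    FileProp *fprop;

    if (stat(path, &st) != 0)
        return NULL;                       /* missing or inaccessible file */

    fprop = calloc(1, sizeof *fprop);
    if (fprop == NULL)
        return NULL;

    fprop->path = copy_string(path);
    fprop->virt_path_url = copy_string(path);  /* same as path for local files */
    if (fprop->path == NULL || fprop->virt_path_url == NULL) {
        free(fprop->path);
        free(fprop->virt_path_url);
        free(fprop);
        return NULL;
    }
    fprop->fp = NULL;
    fprop->fsize = (long)st.st_size;
    fprop->mtime = st.st_mtime;
    return fprop;
}

/* Releases everything allocated by file_properties_sketch(). */
void free_file_properties_sketch(FileProp *fprop)
{
    if (fprop == NULL)
        return;
    free(fprop->path);
    free(fprop->virt_path_url);
    free(fprop);
}
```

For an HTTP spider the same structure would presumably get virt_path_url from the URL and path from the temporary file, which is why the sketch keeps the two fields separate.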

> -----------
> 
> But as I go forward with the coding, I realize that some of this
> information is also stored in other structures.
> 
> 
> e.g. file fs.c (only some essential parts):
> ---------------
> /* Indexes the words in the file
> */
> 
> void printfile(struct SWISH *sw, struct docentry *e)
> {
> int wordcount;
> --deleted-- FILE *fp;
> char *s;
> char *filterprog;
> char *filtercmd;
> int DocType;
> FileProp *fprop;  
> 
>  [...]
> 
>     fprop = fs_file_properties ((char *)e->filename);
>     if (! fprop) progerr ("Failed to alloc memory....");

BTW, this line is not needed: the allocation routines in mem.h already
exit the program if no memory is available.
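A minimal sketch of the kind of allocation wrapper described here, assuming it simply reports the failure and calls exit(); emalloc_sketch is a hypothetical name and the real mem.h code may differ. Because such a wrapper never returns NULL, a caller's `if (! fprop)` check can never fire.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: returns usable memory or terminates the program,
   so callers never have to test for NULL. */
void *emalloc_sketch(size_t size)
{
    void *p;

    if (size == 0)
        size = 1;            /* malloc(0) may legitimately return NULL */
    p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "Ran out of memory (could not allocate %lu bytes)\n",
                (unsigned long)size);
        exit(EXIT_FAILURE);
    }
    return p;
}
```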

> 
>     if ((filterprog = hasfilter (e->filename,sw->filterlist)) != NULL)
>     {
>             filtercmd=emalloc(strlen(filterprog)+3+strlen(e->filename)+1);
>             sprintf(filtercmd, "%s \'%s\'",filterprog,e->filename);
>             fprop->fp = popen (filtercmd,"r");
>     } else {
>             fprop->fp = fopen(e->filename, "r" );
>     }
> 
> 
> 
>     if (fprop->fp) {
>        /* 08/00 Jose Ruiz */
>        /* get Doc Type as is in IndexContents or Defaultcontents */
>        if((DocType=getdoctype(e->filename,sw->indexcontents))==NODOCTYPE
>            && sw->DefaultDocType!=NODOCTYPE)
>                  DocType=sw->DefaultDocType;
> 
>        switch(DocType)
>        {
>              case TXT:
>                    if(sw->verbose == 3) printf(" - Using TXT filter - ");
>                    wordcount = countwords_TXT(sw, fp, e->filename, e->title,
>                          (isoksuffix(e->filename, sw->nocontentslist)
>                           && (sw->nocontentslist != NULL)));
> 
>     [...]
>        }
> 
> -------
> The countwords interface has to be changed to take fewer parameters
> and more structure types. The questions are: what should we pass to
> this routine, and is this code the right one?
> 
> My proposal:
> 
>    The code within the switch statement is similar or identical for
>    all indexing methods (at the moment file_system and http).
> 
>    We should have a common routine "index_file ()", which calls the
>    appropriate indexing routine (countwords_XML, countwords_HTML,
>    countwords_TXT, etc.).
> 

I agree, it is easier to handle and maintain.
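Part of what such a common routine needs is the open step from printfile() quoted earlier. A minimal sketch of factoring it into a helper, assuming a POSIX system; OpenedFile, its is_pipe flag, open_for_indexing and close_indexed_file are hypothetical names. It also shows one detail the excerpt does not: a stream obtained from popen() has to be closed with pclose(), not fclose(), so the structure must remember which kind it holds.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    FILE *fp;       /* file or filter stream, NULL if not opened */
    int   is_pipe;  /* hypothetical flag: 1 if fp came from popen() */
} OpenedFile;

/* Opens filename directly, or through "filterprog 'filename'" when a
   filter program is configured. Returns 0 on success, -1 on failure. */
int open_for_indexing(OpenedFile *of, const char *filename,
                      const char *filterprog)
{
    of->fp = NULL;
    of->is_pipe = 0;

    if (filterprog != NULL) {
        /* Same quoting as printfile(); a filename that itself contains
           a single quote would still break the shell command. */
        int len = snprintf(NULL, 0, "%s '%s'", filterprog, filename);
        char *cmd;

        if (len < 0)
            return -1;
        cmd = malloc((size_t)len + 1);
        if (cmd == NULL)
            return -1;
        snprintf(cmd, (size_t)len + 1, "%s '%s'", filterprog, filename);
        of->fp = popen(cmd, "r");
        free(cmd);                     /* popen() does not keep the string */
        of->is_pipe = (of->fp != NULL);
    } else {
        of->fp = fopen(filename, "r");
    }
    return of->fp != NULL ? 0 : -1;
}

/* A popen() stream must be closed with pclose(), a plain file with fclose(). */
void close_indexed_file(OpenedFile *of)
{
    if (of->fp == NULL)
        return;
    if (of->is_pipe)
        pclose(of->fp);
    else
        fclose(of->fp);
    of->fp = NULL;
}
```

Note that popen() succeeding only means the shell started; if the filter program is missing, reading the stream simply yields EOF.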

> 
> The interface could look like
> 
> int index_file (struct SWISH *sw, FileProp *fprop, whatelse...) or
> int index_file (SWISH *sw, FileProp *fprop, ....)
> 

Sounds good to me.
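A minimal sketch of the proposed index_file() dispatcher, assuming the document type and the indextitleonly flag are moved into FileProp as suggested. Swish, the DT_* constants, countwords_txt and count_plain_words are hypothetical stand-ins for the real swish-e types and routines, and the word counter only counts whitespace-separated words instead of indexing them. Since plain text has no title, the TXT routine takes no title parameter.

```c
#include <ctype.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-ins; the real swish-e constants and types differ. */
enum doc_type { DT_NONE, DT_TXT, DT_HTML, DT_XML };

typedef struct {
    int verbose;                    /* placeholder for struct SWISH */
} Swish;

typedef struct {
    FILE   *fp;                     /* file or filter stream, NULL if not opened */
    char   *path;
    char   *virt_path_url;
    long    fsize;
    time_t  mtime;
    enum doc_type doctype;          /* proposed: moved into FileProp */
    int     indextitleonly;         /* proposed: moved into FileProp */
} FileProp;

/* Counts whitespace-separated words; a real countwords_* routine would
   also store each word in the index. */
static int count_plain_words(FILE *fp)
{
    int c, in_word = 0, words = 0;

    while ((c = getc(fp)) != EOF) {
        if (isspace(c)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            words++;
        }
    }
    return words;
}

/* Plain text has no title, so no title parameter is needed. */
static int countwords_txt(Swish *sw, FileProp *fprop)
{
    if (sw->verbose == 3)
        printf(" - Using TXT filter - ");
    if (fprop->indextitleonly)
        return 0;                   /* TXT has no title to index */
    return count_plain_words(fprop->fp);
}

/* Common entry point that the file-system and HTTP methods could share. */
int index_file(Swish *sw, FileProp *fprop)
{
    if (fprop->fp == NULL)
        return -1;                  /* file was never opened */

    switch (fprop->doctype) {
    case DT_TXT:
        return countwords_txt(sw, fprop);
    case DT_HTML:                   /* countwords_HTML(sw, fprop) would go here */
    case DT_XML:                    /* countwords_XML(sw, fprop) would go here */
    default:
        return -1;                  /* not handled in this sketch */
    }
}
```

With this shape, adding a new document type means adding one case to index_file() rather than touching every indexing method.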

> 
> ----------
> 
> Routine:
>  int countwords(sw, vp, filename, title, indextitleonly)
>    struct SWISH *sw;
>    void *vp;
>    char *filename;
>    char *title;
>    int indextitleonly; 
> 
> Can anybody explain why countwords_XXX gets a "title" parameter?
> It could just be the same as the filename (if a title is missing in
> the document).
> 

The reason is very simple: I copied countwords into
countwords_XXX and then changed them (cut and paste). Anyway,
you are right: XML and TXT do not need the title.

> The "indextitleonly" could be a flag in the structure FileProp...
> 

OK for me.

BTW, I am currently changing the struct definitions to typedefs, adding
the fixes from Bas to compile on IRIX, and doing some other minor work.

cu
Jose
Received on Thu Nov 16 19:27:13 2000