changes...

From: <Rainer.Scherg(at)not-real.rexroth.de>
Date: Thu Nov 16 2000 - 14:57:43 GMT
Hi!

I'm just making some changes to get file properties
(size, last modification date, etc.) and am getting a little
confused about the structures and data types swish-e is using.

Right now the structure looks like this:


swish.h:
--------
/*
 -- FileProperties (similar to fileinfo)
 -- Structure used to store information about a file to be indexed...
 -- Unused items may be NULL (e.g. if the file is not opened, fp == NULL)
*/

typedef struct  {
        FILE   *fp;             /* may also be a filter stream, or NULL if not opened */
        char   *path;           /* path to file to index (may be tmp) */
        char   *virt_path_url;  /* orig. path/URL to indexed file */
        long   fsize;           /* size of the original file (not filtered) */
        time_t mtime;           /* time of last modification of the original file */
} FileProp;


DocType, indextitleonly, and some other flags for this file could also be
included (to be discussed...). This would make the subroutine interfaces
leaner and less complicated to handle...
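
To make the idea concrete, here is a minimal sketch of how such a structure
might be filled, assuming a POSIX system where stat() is available. The names
FilePropSketch, file_properties_sketch and free_file_properties_sketch, and the
doctype and indextitleonly fields, are hypothetical and only illustrate the
proposal; this is not the swish-e code. Unlike the posted code, this sketch
also returns NULL when stat() fails, not only when memory runs out.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical variant of FileProp. The last two fields are the
   "to be discussed" flags and are assumptions, not existing members. */
typedef struct {
    FILE   *fp;             /* stays NULL until the file is opened */
    char   *path;           /* private copy of the path to index */
    char   *virt_path_url;  /* not filled in by this sketch */
    long    fsize;          /* size of the original file */
    time_t  mtime;          /* time of last modification */
    int     doctype;        /* assumed flag: document type */
    int     indextitleonly; /* assumed flag: index only the title */
} FilePropSketch;

/* Allocate a FilePropSketch for `path` and fill size and mtime.
   Returns NULL if stat() or an allocation fails. */
FilePropSketch *file_properties_sketch(const char *path)
{
    struct stat st;
    FilePropSketch *fprop;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return NULL;

    fprop = calloc(1, sizeof *fprop);
    if (fprop == NULL)
        return NULL;

    fprop->path = malloc(strlen(path) + 1);
    if (fprop->path == NULL) {
        free(fprop);
        return NULL;
    }
    strcpy(fprop->path, path);

    fprop->fp = NULL;
    fprop->virt_path_url = NULL;
    fprop->fsize = (long) st.st_size;
    fprop->mtime = st.st_mtime;
    return fprop;
}

/* Release everything allocated by file_properties_sketch(). */
void free_file_properties_sketch(FilePropSketch *fprop)
{
    if (fprop == NULL)
        return;
    free(fprop->path);
    free(fprop);
}
```

The caller would still open fp itself (fopen or popen), as in the fs.c
excerpt, and pass the one pointer on instead of several separate arguments.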

-----------

But as I go forward with the coding, I realize that some of this
information is also being stored in other structures.


e.g. file fs.c (only some essential parts):
---------------
/* Indexes the words in the file */

void printfile(struct SWISH *sw, struct docentry *e)
{
int wordcount;
--deleted-- FILE *fp;
char *s;
char *filterprog;
char *filtercmd;
int DocType;
FileProp *fprop;  

 [...]
      
    fprop = fs_file_properties ((char *)e->filename);
    if (! fprop) progerr ("Failed to alloc memory....");

    if ((filterprog = hasfilter (e->filename,sw->filterlist)) != NULL) {
            filtercmd=emalloc(strlen(filterprog)+3+strlen(e->filename)+1);
            sprintf(filtercmd, "%s \'%s\'",filterprog,e->filename);
            fprop->fp = popen (filtercmd,"r");
    } else {
            fprop->fp = fopen(e->filename, "r" );
    }



    if (fprop->fp) {
             /* 08/00 Jose Ruiz */
             /* get Doc Type as is in IndexContents or Defaultcontents */
       if((DocType=getdoctype(e->filename,sw->indexcontents))== NODOCTYPE
           && sw->DefaultDocType!=NODOCTYPE)
                 DocType=sw->DefaultDocType;

                 switch(DocType)
                 {
                       case TXT:
                             if(sw->verbose == 3) printf(" - Using TXT filter - ");
                             wordcount = countwords_TXT(sw, fp, e->filename, e->title,
                                 (isoksuffix(e->filename, sw->nocontentslist) &&
                                  (sw->nocontentslist != NULL)));

    [...]
                  }

-------
The countwords interface has to be changed to take fewer parameters and more
structure types.
The questions are: what should we pass to this routine, and is this code the
right one?

My proposal:

   The code within the switch statement is similar or identical in all
   indexing methods (at the moment file_system and http).

   We should have a common routine "index_file ()", which calls the
   appropriate indexing routine (countwords_XML, countwords_HTML,
   countwords_TXT, etc.).


The interface could look like:

int index_file ( (struct SWISH *)sw, FileProp *fprop, whatelse...)
or
int index_file (SWISH *sw, FileProp *fprop, ....)
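
As a rough illustration of the proposal, a common dispatcher could look like
the sketch below. Everything with an _sk or _SK suffix (SwishSk, FilePropSk,
the doctype constants, the countwords_*_sk stubs) is a stand-in invented for
this example; the real swish-e types and countwords signatures differ. The
TXT handler just counts whitespace-separated words, and the HTML and XML
handlers are empty placeholders.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Stand-in document types (assumed names, not the swish-e constants). */
enum { NODOCTYPE_SK = 0, TXT_SK, HTML_SK, XML_SK };

/* Stand-ins for struct SWISH and FileProp, reduced to what is needed. */
typedef struct { int default_doctype; } SwishSk;
typedef struct {
    FILE       *fp;       /* open stream, or NULL */
    const char *path;     /* path of the file being indexed */
    int         doctype;  /* NODOCTYPE_SK means "use the default" */
} FilePropSk;

/* Count whitespace-separated words on the stream. */
static int countwords_txt_sk(SwishSk *sw, FilePropSk *fprop)
{
    int c, inword = 0, n = 0;

    (void) sw;
    while ((c = fgetc(fprop->fp)) != EOF) {
        if (isspace(c))
            inword = 0;
        else if (!inword) {
            inword = 1;
            n++;
        }
    }
    return n;
}

/* Placeholders: a real parser would go here. */
static int countwords_html_sk(SwishSk *sw, FilePropSk *fprop)
{
    (void) sw; (void) fprop;
    return 0;
}

static int countwords_xml_sk(SwishSk *sw, FilePropSk *fprop)
{
    (void) sw; (void) fprop;
    return 0;
}

/* Common entry point for all indexing methods.
   Returns the word count, or -1 if nothing could be indexed. */
int index_file_sk(SwishSk *sw, FilePropSk *fprop)
{
    int doctype;

    assert(sw != NULL && fprop != NULL);
    if (fprop->fp == NULL)
        return -1;

    doctype = fprop->doctype;
    if (doctype == NODOCTYPE_SK)
        doctype = sw->default_doctype;

    switch (doctype) {
    case TXT_SK:  return countwords_txt_sk(sw, fprop);
    case HTML_SK: return countwords_html_sk(sw, fprop);
    case XML_SK:  return countwords_xml_sk(sw, fprop);
    default:      return -1;
    }
}
```

With something like this, fs.c and the http method would only have to fill
the structure and open the stream; the switch would live in one place.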


----------

Routine:
 int countwords(sw, vp, filename, title, indextitleonly)
   struct SWISH *sw;
   void *vp;
   char *filename;
   char *title;
   int indextitleonly; 

Can anybody explain why countwords_XXX gets a "title" as a parameter?
Couldn't this be the same as the filename (if a title is missing in the
document)?

The "indextitleonly" could be a flag in the structure FileProp...
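
One possible reading of these two points, sketched under assumptions: if the
flag lived in the structure and the title fell back to the path, neither would
need to be a separate parameter. FilePropFlagsSk, effective_title_sk and
should_index_body_sk are hypothetical names, and whether swish-e really wants
the filename as the fallback title is exactly the open question above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slice of FileProp carrying the flag discussed above. */
typedef struct {
    const char *path;            /* path of the file being indexed */
    int         indextitleonly;  /* non-zero: index only the title */
} FilePropFlagsSk;

/* Return the title found in the document, or the path when the
   document has no (or an empty) title. */
const char *effective_title_sk(const FilePropFlagsSk *fprop,
                               const char *parsed_title)
{
    assert(fprop != NULL);
    if (parsed_title != NULL && parsed_title[0] != '\0')
        return parsed_title;
    return fprop->path;
}

/* The body is indexed unless the flag asks for the title only. */
int should_index_body_sk(const FilePropFlagsSk *fprop)
{
    assert(fprop != NULL);
    return !fprop->indextitleonly;
}
```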


Any suggestions?

cu - rainer





Received on Thu Nov 16 14:59:14 2000