Re: Re: Document properties

From: Ron Klatchko <ron(at)not-real.library.ucsf.edu>
Date: Thu Sep 02 1999 - 15:57:22 GMT
I wouldn't worry about the efficiency of changing the endianness of the
data; it's a simple operation and should not add a significant amount of
time to the processing.  Remember that multiple people may work on your
code in the future, so optimize it for maintainability.

You also said you decided to store big-endian numbers.  Is this the same
as network order?  (I can never remember myself.)  I would recommend using
network order for the file since that concept is universal among OSs that
support TCP/IP (which is pretty much everything these days).
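
On the question above: network byte order, as used by TCP/IP, is big-endian, so the two are the same thing. A minimal sketch that makes this visible, assuming a POSIX system where `<arpa/inet.h>` provides `htons`; the helper name `network_bytes` is hypothetical and only for illustration.

```c
#include <arpa/inet.h>  /* htons, ntohs (POSIX) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert v to network order and copy its two
 * bytes, in memory order, into out[0] and out[1]. */
static void network_bytes(uint16_t v, unsigned char out[2])
{
    uint16_t net = htons(v);       /* host order -> network order */
    memcpy(out, &net, sizeof net); /* inspect the bytes as stored */
}
```

On any host, `network_bytes(0x1234, b)` leaves the most significant byte first (`b[0] == 0x12`), which is exactly the big-endian layout; on a big-endian host `htons` is a no-op.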

moo
----------------------------------------------------------------------
          Ron Klatchko - Manager, Advanced Technology Group           
           UCSF Library and Center for Knowledge Management           
                        ron@library.ucsf.edu                           

On Wed, 1 Sep 1999 STEPHANE_MEIER@Non-HP-Switzerland-om1.om.hp.com wrote:

> Hello all,
> 
> I do not have the time to do the correction and test it, but here is how 
> to proceed:
> 
> - Either a) decide which endianness is stored in the file.
>   This means that the byte order is swapped at save AND load time
>   on machines with the other endianness (a performance penalty for them).
> 
>   pros: probably makes the change simpler (only one file to modify)
>   cons: performance penalty
> 
> - Or b) store a flag to say which endianness is actually in the index file.
> 
>   pros: no preference for any architecture;
>         no performance hit within the same architecture
>   cons: need to store a flag in the file to know which architecture
>         created it.
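
Option (b) could be sketched roughly as follows. This is only an illustration under stated assumptions: the marker value `ORDER_MARK` and the function names are hypothetical and not part of SWISH-E. The writer emits a known 16-bit value in its native order; the reader compares what it finds against that value and its byte-swapped form to decide whether later numbers need swapping.

```c
#include <stdint.h>
#include <stdio.h>

#define ORDER_MARK 0x0102  /* hypothetical marker, not from SWISH-E */

/* Exchange the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Writer: store the marker in the writer's native byte order. */
static int write_order_mark(FILE *fp)
{
    uint16_t mark = ORDER_MARK;
    return fwrite(&mark, sizeof mark, 1, fp) == 1 ? 0 : -1;
}

/* Reader: 0 = same order as this host, 1 = values must be swapped,
 * -1 = read error or unrecognized marker. */
static int read_order_mark(FILE *fp)
{
    uint16_t mark;
    if (fread(&mark, sizeof mark, 1, fp) != 1)
        return -1;
    if (mark == ORDER_MARK)
        return 0;
    if (mark == swap16(ORDER_MARK))
        return 1;
    return -1;
}

/* Apply the reader's decision to a value read from the file. */
static uint16_t fix_order(uint16_t v, int need_swap)
{
    return need_swap ? swap16(v) : v;
}
```

With this layout, an index read on the architecture that wrote it never pays for a swap, at the cost of carrying the marker and the `need_swap` decision through the reading code.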
> 
> 
> To fix the code with solution (a) - 
> - I decide that big endian is stored in files (little endian numbers are 
> swapped to big endian before being stored and swapped back when retrieved)
> - I assume the file <machine/param.h> provides a macro to decide which 
> endianness is currently used.
> - I assume no other file and no other routines than the 2 below have to be 
> fixed.
> - my changes are between /* +FIX */ /* -FIX */
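
The assumption about `<machine/param.h>` matters, because that header and the `_BIG_ENDIAN` macro are not present on every platform. One possible alternative, sketched here with a hypothetical function name, is to detect the host byte order at run time by looking at how a known value is laid out in memory.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 on a big-endian host, 0 on a
 * little-endian host, by inspecting the first byte of a known value. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first;

    memcpy(&first, &probe, 1);  /* lowest-addressed byte of probe */
    return first == 0x01;       /* most significant byte comes first? */
}
```

A run-time check replaces the `#ifndef` blocks with an ordinary `if`, which costs a branch but removes the dependency on platform headers.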
> 
> See below the changes to perform.
> 
> Let me know how it goes if anyone implements it. 
> 
> Regards,
> 
> Stephane
> 
> file docprop.c, beginning of file, add:
> 
> #include <machine/param.h>
> 
> file docprop.c, lines 62 -
> 
> void storeDocProperties(docProperties, fp)
>      struct docPropertyEntry *docProperties;
>      FILE *fp;
> {
>  /*
>   * Dump the document properties into the index file 
>   * The format is:
>   * <PropID:int><PropValueLen:int><PropValue:null-terminated>
>   *   ...
>   * <PropID:int><PropValueLen:int><PropValue:null-terminated>
>   * <EndofList:int>
>   *
>   * The list is terminated with a PropID with a value of zero
>   */
>  short int propID;
>  short int len;
> 
>  while (docProperties != NULL)
>  {
>   /* the length of the property value */
>   len = (short int) strlen(docProperties->propValue);
>   if (len > 0)
>   {
>    /* the ID of the property */
>    propID = (short int) docProperties->metaName;
> 
> /* +FIX */
> #ifndef _BIG_ENDIAN
>    {
>     /* little-endian host: swap the bytes so the file holds big endian */
>     short int temp;
>     swab(&propID, &temp, sizeof(temp));
>     fwrite(&temp, sizeof(temp), 1, fp);
>    }
> #else
>    /* the original, unswapped write:
>       THIS is one statement that caused the problem */
>    fwrite(&propID, sizeof(propID), 1, fp);
> #endif
> /* -FIX */
> 
> 
> /* +FIX */
> #ifndef _BIG_ENDIAN
>    {
>     /* little-endian host: swap the bytes so the file holds big endian */
>     short int temp;
>     swab(&len, &temp, sizeof(temp));
>     /* including the length will make retrieval faster */
>     fwrite(&temp, sizeof(temp), 1, fp);
>    }
> #else
>    /* including the length will make retrieval faster */
>    /* the original, unswapped write:
>       THIS is one statement that caused the problem */
>    fwrite(&len, sizeof(len), 1, fp);
> #endif
> /* -FIX */
> 
>    fwrite(docProperties->propValue, len+1, 1, fp);
> 
>   }
> 
>   docProperties = docProperties->next;
>  }
> 
>  /* set is terminated by a "zero" ID  */
>  propID = 0;
>  fwrite(&propID, sizeof(propID), 1, fp);
> }
> 
> line 102 - 
> static char* readNextDocPropEntry(fp, metaName, targetMetaName)
>       FILE* fp;
>       int* metaName;
>       int targetMetaName;
> {
>  /* read one entry and return it; also set the metaName.
>   * if targetMetaName is zero then return values for all entries.
>   * if targetMetaName is non-zero then only return the value
>   * for the property matching that value.
>   * In all cases, metaName will be zero when the end of the
>   * set is reached.
>   */
>  static char* propValueBuf = NULL;
>  static int propValueBufLen = 0;
> 
>  short int tempPropID;
>  short int len;
>  long propPos; /* file pos */
> 
>  fread(&tempPropID, sizeof(tempPropID), 1, fp);
> /* +FIX */
> #ifndef _BIG_ENDIAN
>  { short int temp = tempPropID;
>   swab(&temp, &tempPropID, sizeof (temp));
>  }
> #endif
> /* -FIX */
>  *metaName = (int) tempPropID;
> 
>  if (tempPropID == 0)
>   return NULL;  /* end of list */
> 
>  /* grab the string length */
>  fread(&len, sizeof(len), 1, fp);
> 
> /* +FIX */
> #ifndef _BIG_ENDIAN
>  { short int temp = len;
>   swab(&temp, &len, sizeof (temp));
>  }
> #endif
> /* -FIX */
>  if ((targetMetaName != 0) && (tempPropID != (short int) targetMetaName))
>  {
>   /* we were looking for something specific, and this is not it */
>   /* move to the next property */
>   propPos = ftell(fp);
>   fseek(fp, propPos+len+1, 0);
>   return "";
>  }
>  else
>  {
>   /* return the value */
>   if (propValueBufLen < len+1)
>   {
>    /* allocate buffer for prop value */
>    /* the buffer will be reused on the next call */
>    propValueBufLen = len+100;
>    propValueBuf = (char *) emalloc(propValueBufLen);
>   }
>   fread(propValueBuf, len+1, 1, fp);
>   return propValueBuf;
>  }
> }
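
As a possible alternative to the `swab()` plus `#ifndef` approach in the quoted code, the two 16-bit fields could be written and read one byte at a time with shifts, which gives big-endian (network order) data in the file on any host without needing an endianness macro. This is only a sketch; `put_be16` and `get_be16` are hypothetical names, not SWISH-E functions.

```c
#include <stdio.h>

/* Write the low 16 bits of v as two bytes, most significant first.
 * Returns 0 on success, -1 on a short write. */
static int put_be16(FILE *fp, unsigned int v)
{
    unsigned char b[2];

    b[0] = (unsigned char)((v >> 8) & 0xFF);
    b[1] = (unsigned char)(v & 0xFF);
    return fwrite(b, 1, 2, fp) == 2 ? 0 : -1;
}

/* Read two bytes written by put_be16 and rebuild the value.
 * Returns 0 on success, -1 on a short read. */
static int get_be16(FILE *fp, unsigned int *out)
{
    unsigned char b[2];

    if (fread(b, 1, 2, fp) != 2)
        return -1;
    *out = ((unsigned int)b[0] << 8) | b[1];
    return 0;
}
```

Because the shifts operate on values rather than on memory layout, the same source produces the same file bytes on SPARC, PowerPC, and Intel. The sketch also checks the `fread`/`fwrite` return values, which the quoted routines do not.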
> 
> 
> -----Original Message-----
> From: maxi@chim1.unifi.it [mailto:maxi@chim1.unifi.it]
> Sent: 1 September 1999 09:52
> To: swish-e@sunsite.berkeley.edu
> Cc: maxi@chim1.unifi.it
> Subject: [SWISH-E] Re: Document properties
> 
> 
> > > this discussion around the capability to port indexes between NT and
> > > UNIX platforms, if they are created using Document properties feature.
> > > not platform independent (byte order issues)
> 
> It's correct, David... my Unix platforms are Sun Solaris 2.6 (SPARC) and
> IBM AIX (PowerPC), both "big endian" systems. There is full compatibility
> between those computers in any situation, and full compatibility with NT
> (Intel) if "Document Properties" is not used. I'm sorry, but I am a poor
> coder and I cannot say more about it.
> 
> > If it is a byte-order issue, then it should be possible to fix without
> > much headache.
> 
> And it will be GREAT!
> 
> \__Maxi
> 
> 
> 
> 
Received on Thu Sep 2 08:53:54 1999