Re: Document properties - code sample

From: Mark Gaulin <gaulin(at)not-real.globalspec.com>
Date: Tue Sep 07 1999 - 17:20:45 GMT
Hi
Great! It looks like the problem I introduced into the index formats has a
good fix. 

For the "official" release I would like to suggest something a little
easier to maintain... why not create a small set of routines to read &
write the basic data types to & from a file stream.  They might look like this:

void indexStreamWriteShortInt(fp, val)
FILE* fp;
short int val;
{
	short int safeVal = htons(val);
	fwrite(&safeVal, sizeof(safeVal), 1, fp);
}

and also

short int indexStreamReadShortInt(fp)
FILE* fp;
{
	short int safeVal;
	short int realVal;
	fread(&safeVal, sizeof(safeVal), 1, fp);

	realVal = ntohs(safeVal); /* Convert from network format to host */
	return realVal;
}

These two functions would be used in the DocProperties.c file but could
also be used in other parts of the program.  Also, it means you could
change the way endian-ness is handled and only a couple of functions have
to change.
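
For illustration, here is a minimal sketch of the same two routines in
ANSI C with prototypes.  It departs from the versions above in a few ways
that are my assumptions, not part of the proposal: the value is a uint16_t,
the functions return 1/0 so callers can detect a short read or write, and
the reader hands the value back through a pointer.  It also assumes a POSIX
system where htons()/ntohs() come from <arpa/inet.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <arpa/inet.h>  /* htons, ntohs (POSIX) */

/* Write a 16-bit value in network (big-endian) byte order.
 * Returns 1 on success, 0 on a short write. */
int indexStreamWriteShortInt(FILE *fp, uint16_t val)
{
    uint16_t safeVal = htons(val);  /* host to network order */

    assert(fp != NULL);
    return fwrite(&safeVal, sizeof safeVal, 1, fp) == 1;
}

/* Read a 16-bit value that was stored in network byte order.
 * Returns 1 on success, 0 on EOF or a read error. */
int indexStreamReadShortInt(FILE *fp, uint16_t *out)
{
    uint16_t safeVal;

    assert(fp != NULL && out != NULL);
    if (fread(&safeVal, sizeof safeVal, 1, fp) != 1)
        return 0;
    *out = ntohs(safeVal);          /* network to host order */
    return 1;
}
```

With this shape, a caller such as storeDocProperties() would check the
return value instead of calling fwrite() directly, and the byte-order
policy lives in exactly one place.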

Ever wonder why the rest of the index seems to be endian-safe? The answer
is that the printindex() function uses a pair of read/write functions
(decompress() and compress()) to store (long?) integer values in the file
stream.  (I don't know exactly what those functions do, but the names
suggest some kind of compression, which appears to depend on the magnitude
of the number being written.)  To carry my idea forward, these two
functions could be renamed to indexStreamWriteCompressedInt() and
indexStreamReadCompressedInt().  (You could even leave the "Compressed"
part out of the name and then all of the Ints written to the index file
would be automatically compressed, for "free".)
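
To show why a byte-at-a-time encoding ends up endian-safe, here is a
hedged sketch of one common magnitude-dependent scheme: base-128 digits,
most significant group first, with the high bit marking "more bytes
follow".  This is an assumption for illustration only; I have not checked
that it matches what compress() and decompress() actually write.  The
function names are the ones suggested above, and the 1/0 return values
are my addition.

```c
#include <assert.h>
#include <stdio.h>

/* Write val as 7-bit groups, most significant group first.
 * Every byte except the last has its high bit set.
 * Returns 1 on success, 0 on a write error. */
int indexStreamWriteCompressedInt(FILE *fp, unsigned long val)
{
    unsigned char buf[(sizeof val * 8 + 6) / 7];
    int n = 0;

    assert(fp != NULL);

    /* Collect the 7-bit groups, least significant first. */
    do {
        buf[n++] = (unsigned char)(val & 0x7F);
        val >>= 7;
    } while (val != 0);

    /* Emit them in reverse, so the file format fixes the byte order. */
    while (n > 0) {
        int c = buf[--n];
        if (n > 0)
            c |= 0x80;              /* continuation flag */
        if (putc(c, fp) == EOF)
            return 0;
    }
    return 1;
}

/* Read a value written by indexStreamWriteCompressedInt().
 * Returns 1 on success, 0 on EOF or a read error. */
int indexStreamReadCompressedInt(FILE *fp, unsigned long *out)
{
    unsigned long val = 0;
    int c;

    assert(fp != NULL && out != NULL);
    do {
        c = getc(fp);
        if (c == EOF)
            return 0;
        val = (val << 7) | (unsigned long)(c & 0x7F);
    } while (c & 0x80);

    *out = val;
    return 1;
}
```

Under this scheme values below 128 take a single byte, and 300 is stored
as the two bytes 0x82 0x2C.  Because the code only ever reads and writes
single bytes, the host's byte order never enters into it.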

The point is that the idea of an "index stream" as something that is
written to and read from using a small set of standard functions is a good
thing, especially when you care about 1) cross-platform support and 2)
maintenance of the code.

(This is where object-oriented languages come in real handy, but even
without C++ good file and function naming will let you get much of the
benefit using just regular C.)

My 2 cents.

	Mark

At 05:56 AM 9/3/99 -0700, you wrote:
>Hello  all,
>
>Here is the piece of code from docprop.c (ver 1.3.2) that was modified to 
>store and restore the non-portable shorts in network order. It's way more 
>elegant than the piece before. Thanks to moo. It was compiled but not 
>tested. A file must be included for it to work:
>
>#include <netinet/in.h>
>
>docprop.c lines 64 - 161
>
>void storeDocProperties(docProperties, fp)
>     struct docPropertyEntry *docProperties;
>     FILE *fp;
>{
> /*
>  * Dump the document properties into the index file 
>  * The format is:
>  * <PropID:int><PropValueLen:int><PropValue:null-terminated>
>  *   ...
>  * <PropID:int><PropValueLen:int><PropValue:null-terminated>
>  * <EndofList:int>
>  *
>  * The list is terminated with a PropID with a value of zero
>  */
> short int propID;
> short int len;
> short int net_len; /* length ordered in network order  */
>
> while (docProperties != NULL)
> {
>  /* the length of the property value */
>  len = (short int) strlen(docProperties->propValue);
>  if (len > 0)
>  {
>   /* the ID of the property */
>   propID = htons((short int) docProperties->metaName);
>    /* Convert from host to network format */
>
>   fwrite(&propID, sizeof(propID), 1, fp);
>   /* including the length will make retrieval faster */
>   net_len = htons(len); /* Convert from host to network format */
>   fwrite(&net_len, sizeof(net_len), 1, fp);
>
>   fwrite(docProperties->propValue, len+1, 1, fp);
>  }
>
>  docProperties = docProperties->next;
> }
>
> /* set is terminated by a "zero" ID  */
> propID = 0;
> fwrite(&propID, sizeof(propID), 1, fp);
>}
>
>static char* readNextDocPropEntry(fp, metaName, targetMetaName)
>      FILE* fp;
>      int* metaName;
>      int targetMetaName;
>{
> /* read one entry and return it; also set the metaName.
>  * if targetMetaName is zero then return values for all entries.
>  * if targetMetaName is non-zero then only return the value
>  * for the property matching that value.
>  * In all cases, metaName will be zero when the end of the
>  * set is reached.
>  */
> static char* propValueBuf = NULL;
> static int propValueBufLen = 0;
>
> short int tempPropID;
> short int len;
> long propPos; /* file pos */
>
> fread(&tempPropID, sizeof(tempPropID), 1, fp);
> tempPropID = ntohs(tempPropID); /* Convert from network format to host */
>
> *metaName = (int) tempPropID;
>
> if (tempPropID == 0)
>  return NULL;  /* end of list */
>
> /* grab the string length */
> fread(&len, sizeof(len), 1, fp);
> len = ntohs(len); /* Convert from network format to host */
>
> if ((targetMetaName != 0) && (tempPropID != (short int) targetMetaName))
> {
>  /* we were looking for something specific, and this is not it */
>  /* move to the next property */
>  propPos = ftell(fp);
>  fseek(fp, propPos+len+1, 0);
>  return "";
> }
> else
> {
>  /* return the value */
>  if (propValueBufLen < len+1)
>  {
>   /* allocate buffer for prop value */
>   /* the buffer will be reused on the next call */
>   propValueBufLen = len+100;
>   propValueBuf = (char *) emalloc(propValueBufLen);
>  }
>  fread(propValueBuf, len+1, 1, fp);
>  return propValueBuf;
> }
>}
>
>-----Original Message-----
>From: kg9ae@geocities.com [mailto:kg9ae@geocities.com]
>Sent: 3 septembre 1999 04:35
>To: swish-e@sunsite.berkeley.edu
>Cc: kg9ae@geocities.com
>Subject: [SWISH-E] Re: Document properties
>
>
>>> big endian without a formal proof - anyone to confirm this?
>> I don't know, it's not in the man page, and no one I ask knows.  If you
>> want to write portable code, use network order.  It will be easier to
>
>Here is documentation on the network byte-order conversion functions in the
>GNU C Library.
>
>http://www.gnu.org/manual/glibc-2.0.6/html_node/libc_200.html
>
>_BIG_ENDIAN and _LITTLE_ENDIAN are macros provided by many compilers.  They
>shouldn't be necessary if the htonl, htons, ntohl, and ntohs functions are
>usable across platforms.
>
>As a note: The network byte order is big endian.  Intel 16/32-bit 80x86 CPUs
>are little endian; SUN SPARCs and Compaq Alpha are big endian; Motorola/IBM
>PowerPC, Intel ArmStrong (8096x), Intel IA-64 (64-bit Pentium or whatever),
>and other newer CPUs are swappable between big or little endian in hardware
>depending on the operating system's requirements.  (Furthermore, 64-bit
>Linux runs as little endian on IA-64 while it runs as big endian on Alpha
>and SPARC.)
>
>,David Norris
>
>World Wide Web - http://www.webaugur.com/dave
>Page via mail - 412039@pager.mirabilis.com
>ICQ Universal Internet Number - 412039
>E-Mail - dave@webaugur.com
> 
Received on Tue Sep 7 10:17:09 1999