Extracting titles and correct authors out of Excel docs

From: Kissman, Paul (BLC) <Paul.Kissman(at)not-real.state.ma.us>
Date: Thu Dec 16 2004 - 15:09:43 GMT
Is there a way to get Swish-e to find and index the document properties
in Excel spreadsheets?
We are putting titles and authors in the document properties, but they
are inaccessible.

I am using the XLtoHTML filter, which doesn't appear to have a way of
reading either of these properties.

TITLES:

Because we are spidering with -S prog, all our titles in Swish-e look
like

Library Chart - /tmp/sG36QWw6iW v.1536

which I gather is from this line in XLtoHTML

$ExcelFirstWorksheetName - $ExcelFilename v.$ExcelVersion

and since the filter is reading a temporary copy of the file, it loses
the actual path and filename.

I would really like to get the title out of the document properties; it
differs quite a bit from the filename.
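
One possible fallback for the title, sketched in Python rather than in the
filter's Perl: use the document-property title when one is available, and
otherwise build the title from the original URL or path instead of the
temporary file name. The function name build_title and its parameters are
hypothetical and are not part of XLtoHTML or Swish-e; how the original URL
reaches the filter is assumed, not shown.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def build_title(worksheet_name, source_name, excel_version, doc_title=None):
    """Return a display title for an indexed spreadsheet.

    source_name is assumed to be the original URL or path of the document,
    not the temporary copy handed to the filter.
    """
    # Prefer the title stored in the document properties, if there is one.
    if doc_title and doc_title.strip():
        return doc_title.strip()

    # Otherwise mimic the "worksheet - filename v.version" pattern,
    # but with the real file name taken from the URL or path.
    path = urlsplit(source_name).path or source_name
    filename = posixpath.basename(unquote(path))
    return f"{worksheet_name} - {filename} v.{excel_version}"
```

With no property title, this yields the worksheet name followed by the real
file name, so the /tmp/... name never reaches the index.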

AUTHORS:

Author appears to be coming from the "Last Saved by" value in the
spreadsheet's document properties (statistics), not the Author in
Properties | Document Summary. These are clearly not going to be the same
people much of the time.

Some threads recommend using the Win32::OLE Perl module to grab some of
this data, but we are on Unix, and apparently that module only works
for Windows versions of Perl. So I have not tried to see if this would
work.
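
For comparison, a minimal sketch of reading these properties on Unix without
Win32::OLE. It assumes the third-party Python package olefile is installed
and that the spreadsheets are in the older binary .xls (OLE2) format. The
helper names and the cp1252 default encoding are assumptions made for
illustration. The sketch keeps Author and "Last Saved by" as separate fields,
since those are the two values being confused.

```python
def decode(value, encoding="cp1252"):
    """Turn a raw property value into a stripped string, or None if empty."""
    if value is None:
        return None
    if isinstance(value, bytes):
        # The encoding is an assumption; the file's real code page may differ.
        value = value.decode(encoding, errors="replace")
    return str(value).strip() or None


def summarize(meta):
    """Pick out title, author and last_saved_by from a metadata object."""
    return {
        "title": decode(getattr(meta, "title", None)),
        "author": decode(getattr(meta, "author", None)),
        "last_saved_by": decode(getattr(meta, "last_saved_by", None)),
    }


def read_xls_properties(path):
    """Read the summary properties of an OLE2 (.xls) file."""
    import olefile  # third-party package, assumed to be installed

    ole = olefile.OleFileIO(path)
    try:
        return summarize(ole.get_metadata())
    finally:
        ole.close()
```

A filter could call read_xls_properties() on the temporary copy, since the
properties travel inside the file and do not depend on its path or name.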

pjk

Paul J. Kissman
Library Information Systems Specialist
Massachusetts Board of Library Commissioners
648 Beacon St.
Boston, MA  02215
paul.kissman@state.ma.us
mass.gov/mblc or mblc.state.ma.us
617-267-9400 * 800-952-7403 (in-state)
Fax: 617-421-9833
Received on Thu Dec 16 07:09:56 2004