Re: Indexing XLS Files

From: <Jeffrey.Grunstein(at)not-real.ny.frb.org>
Date: Fri Jul 18 2003 - 18:13:31 GMT

We are using the prog method, so that's -S prog.
And it is now indexing the spreadsheets so we're not
getting "wrong content type" any more.

We do have Spreadsheet::ParseExcel installed
but I don't think we're using SWISH::Filter.

Here's the config file we're using (I took out a bunch
of servers to maintain security).

It is indexing the XLS files but it's only indexing 2 words
for each of them.  Why?

Thanks.

- Jeff

(See attached file: SwishSpiderConfig.pl)



Bill Moseley <moseley@hank.org>
07/18/2003 12:28 PM

To:       Jeffrey.Grunstein@ny.frb.org
cc:       swish-e@sunsite.berkeley.edu
Subject:  Re: [SWISH-E] Indexing XLS Files





On Fri, Jul 18, 2003 at 11:57:25AM -0400, Jeffrey.Grunstein@ny.frb.org
wrote:
>
> Sorry.  I made a mistake.  It's "wrong content type", not invalid content
> type.

Ok, so a grep finds that in the example SwishSpiderConfig.pl file.
Does that make sense for what you are doing?  Are you using -S prog with
spider.pl?

Or are you using -S http method?  If that's the case then the message
is:

   "Skipping %s:  Wrong content type: %s.\n"

My guess is that's it.

And that's because the content type doesn't begin with "text/" and
there is no filter selected for the URL (the filter is chosen by file
extension, not by content type, which doesn't work very well).
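
The skip rule described above can be sketched roughly as follows. This is
only an illustration in Python, not swish-e's actual code: the function name
should_index, the FILTERED_EXTENSIONS set and the example URLs are all made up
for the sketch. It assumes a document is kept when either its content type
starts with "text/" or a filter is configured for its file extension.

```python
from urllib.parse import urlparse
import posixpath

# Hypothetical: extensions for which a filter has been configured.
FILTERED_EXTENSIONS = {".xls", ".pdf"}


def should_index(url, content_type, filtered_extensions=FILTERED_EXTENSIONS):
    """Return True if the document would be indexed under the assumed rule."""
    # Plain text types are indexed directly, with no filter needed.
    if content_type.lower().startswith("text/"):
        return True
    # Otherwise the decision depends only on the file extension in the URL
    # path; the content type is not consulted when picking a filter.
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    return extension in filtered_extensions
```

Under this sketch, http://example.com/report.xls served as
application/vnd.ms-excel is indexed only because of its ".xls" extension. The
same spreadsheet served from a URL such as http://example.com/download?id=7
would be skipped, which is the weakness of choosing filters by extension.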

So back to your original problem.

I'll assume you are using -S http and that you are also using the
SWISH::Filter set of modules (since you mentioned
the SWISH::Filter::XLtoHTML module).

Yes, that was correct to add the mime type to that file, although I'm
not sure what you added.  Your message said:

  We got it to work by modifying the XLtoHTML function to include
  both application/vnd.ms-excel and application/vnd.ms-excel.

which look the same to my eyes.

But if you don't have Spreadsheet::ParseExcel installed then
that code will never run.  Do you have that module installed?

--
Bill Moseley
moseley@hank.org







*********************************************************************
Due to deletion of content types excluded from this list by policy,
this multipart message was reduced to a single part, and from there
to a plain text message.
*********************************************************************
Received on Fri Jul 18 18:13:43 2003