Re:

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Feb 11 2005 - 19:03:24 GMT
On Fri, Feb 11, 2005 at 10:52:14AM -0800, Shaffer, Chris wrote:
> Hi...  I've gotten swish-e (using spider.pl) to crawl a couple of our
> intranet sites.  The filters seem to be working okay for excel.  And it
> seems to be looking at word documents.  However, (using swish.cgi), I
> don't get any descriptions for those word docs.

..

> Any idea where I can look?  I have no idea where to begin digging.

Sure.  spider.pl just writes to stdout, so you can run it on a few
test docs and see what it outputs.  Do it on a file that generates
a description and then another that doesn't and compare.
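A minimal sketch of that comparison as a small shell function. The name `doc_types` and the `SPIDER` variable are hypothetical. The default path is the one used in the example command in this message, so adjust it for your install. The function keeps only the Path-Name and Document-Type header lines, which is where the two documents are most likely to differ. A plain grep can also match a body line that happens to start with one of those names. If a URL is an HTML page, spider.pl may follow its links and print headers for more documents than the one you asked about.

```shell
# doc_types (hypothetical helper): for each URL argument, run spider.pl
# quietly and print only its Path-Name and Document-Type header lines,
# followed by a blank line between URLs.
doc_types() {
    # Assumed install path; override by setting SPIDER before calling.
    spider=${SPIDER:-/usr/local/lib/swish-e/spider.pl}
    for url in "$@"; do
        SPIDER_QUIET=1 "$spider" default "$url" |
            grep -E '^(Path-Name|Document-Type):'
        echo
    done
}
```

Usage, with the second URL standing in for a document that does get a description:

    doc_types http://localhost/apache/test.doc http://localhost/apache/good.doc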

> StoreDescription HTML* <body> 200000

Check the spider.pl output to make sure the document's Document-Type
header is indeed HTML*:

$ SPIDER_QUIET=1 /usr/local/lib/swish-e/spider.pl default http://localhost/apache/test.doc  | head
Path-Name: http://localhost/apache/test.doc
Content-Length: 1713
Last-Mtime: 1108148269
Document-Type: TXT*

That output says the document type is TXT*, so you would need to add another
StoreDescription line for TXT*.
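For example, a config that stores descriptions for both document types might look like the following. This is a sketch that assumes the TXT* parser takes only a size and no tag, since plain text has no <body> element to select. Check the StoreDescription entry in the swish-e configuration docs for your version before relying on it. The 200000 size on the TXT* line simply mirrors the HTML* line quoted above.

```
StoreDescription HTML* <body> 200000
# Assumed syntax: no tag for TXT*, only a size limit
StoreDescription TXT* 200000
```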

-- 
Bill Moseley
moseley@hank.org

Unsubscribe from or help with the swish-e list: 
   http://swish-e.org/Discussion/

Help with Swish-e:
   http://swish-e.org/current/docs
   swish-e@sunsite.berkeley.edu
Received on Fri Feb 11 11:03:25 2005