Re: <swishdescription> returning blank?

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Oct 28 2004 - 14:29:38 GMT
On Thu, Oct 28, 2004 at 10:13:31AM -0400, Antonio Barrera wrote:
> Would this apply similarly to using xpdf to parse PDF docs?
> 
> IndexContents HTML* .htm .html .shtml .php
> IndexContents TXT*  .txt .log .text .pdf
> IndexContents XML*  .xml
> 
> StoreDescription TXT* 10000
> StoreDescription HTML* <body>

Maybe.  It depends on how the PDF files are indexed.  If you are using
spider.pl (with SWISH::Filter), then the document type is passed
directly to swish in the Document-Type: header:

    $ spider.pl default http://localhost/apache/test.pdf 2>/dev/null | head -5
    Path-Name: http://localhost/apache/test.pdf
    Content-Length: 12589
    Last-Mtime: 1064946675
    Document-Type: HTML*

That Document-Type: header tells swish what type of file is being indexed:

    $ spider.pl default http://localhost/apache/test.pdf 2>/dev/null | swish-e -v9 -i stdin -S prog
    Indexing Data Source: "External-Program"
    Indexing "stdin"
    http://localhost/apache/test.pdf - Using HTML2 parser -  (2301 words)
    [...]

Note that it says "Using HTML2 parser".  If you instead index a file
without telling swish which parser type to use, it says:

    $ swish-e -i 1.html -v9
    Indexing Data Source: "File-System"
    Indexing "1.html"

    Checking file "1.html"...
      1.html - Using DEFAULT (HTML2) parser -  (12 words)

Here it says "DEFAULT" because no content type was set for the file.
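
To check a whole indexing run rather than one file, something like the
following could pick the parser lines out of the -v9 output.  This is only
a sketch: the regular expression is modelled on the two output lines shown
in this message, so other swish-e versions may print something different,
and the names parser_choices and PARSER_LINE are made up for illustration.

```python
import re

# Modelled on lines such as (format assumed from the output quoted above):
#   1.html - Using DEFAULT (HTML2) parser -  (12 words)
#   http://localhost/apache/test.pdf - Using HTML2 parser -  (2301 words)
PARSER_LINE = re.compile(
    r"^\s*(?P<path>\S.*?) - Using (?P<default>DEFAULT \()?(?P<parser>\w+)\)? parser - "
)


def parser_choices(verbose_output):
    """Return a (path, parser, was_default) tuple for each indexed file."""
    results = []
    for line in verbose_output.splitlines():
        m = PARSER_LINE.match(line)
        if m:
            was_default = m.group("default") is not None
            results.append((m.group("path"), m.group("parser"), was_default))
    return results


sample = """\
Checking file "1.html"...
  1.html - Using DEFAULT (HTML2) parser -  (12 words)
http://localhost/apache/test.pdf - Using HTML2 parser -  (2301 words)
"""

for path, parser, was_default in parser_choices(sample):
    print(path, parser, "default" if was_default else "explicit")
```

Any file reported as "default" had no content type set for it.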

If you are not using spider.pl or some other -S prog program that passes
in the Document-Type: header, then yes, you would need to use
DefaultContents or IndexContents to set the content type.
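
For what it's worth, here is a rough sketch of what a -S prog program has
to emit for each document: the headers shown earlier, a blank line, and
then exactly Content-Length bytes of content.  It is written in Python
only for illustration (spider.pl itself is Perl).  The function name
make_record and the TXT* default are assumptions of this sketch, and for
PDFs the content would have to be text already extracted from the file,
not the raw PDF bytes.

```python
import os
import sys


def make_record(path, content, doc_type, mtime=None):
    """Build one -S prog record: headers, a blank line, then the content bytes."""
    headers = [
        "Path-Name: %s" % path,
        # Content-Length must be the number of bytes that follow the blank line.
        "Content-Length: %d" % len(content),
    ]
    if mtime is not None:
        headers.append("Last-Mtime: %d" % mtime)
    # Document-Type is what selects the parser (e.g. TXT*, HTML*, XML*).
    headers.append("Document-Type: %s" % doc_type)
    return ("\n".join(headers) + "\n\n").encode("utf-8") + content


def main(paths, doc_type="TXT*"):
    """Write one record per file to stdout (doc_type default is an assumption)."""
    for path in paths:
        with open(path, "rb") as f:
            content = f.read()
        mtime = int(os.path.getmtime(path))
        sys.stdout.buffer.write(make_record(path, content, doc_type, mtime))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under some name of your choosing, its output could be piped into
"swish-e -S prog -i stdin" the same way as the spider.pl output above.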

I guess the reasoning is that StoreDescription works differently for
different types of documents, so it needs to be told what type each
document is.

Here's my comment from many years ago:

 http://swish-e.org/current/docs/SWISH-3.0.html#Switch_to_Content_Types

-- 
Bill Moseley
moseley@hank.org

Received on Thu Oct 28 07:29:38 2004