Re: Novice question: unknown MetaNames error

From: Bill Moseley <moseley(at)>
Date: Fri Jan 16 2004 - 16:56:48 GMT
[I'm cc'ing back to the list]

On Fri, Jan 16, 2004 at 11:39:06AM -0500, Julie Wetherill wrote:
> Followed your instructions. Wouldn't ya know, with just MetaNames and no 
> PropertyNames, I can retrieve on the metaname "description". Just FYI, the 
> instruction

Yep.  That would have been an obvious bug, I think.

> -T index_metanames (note that my executable is "" 
> rather than swish-e)
> causes a segmentation fault. Don't know why. Seems like this could be a 
> helpful command if I could get it to work.

Dave, can you test this on Windows?

> Anyway, I do have a related problem that maybe you can explain. I need to 
> retrieve on metadata imbedded in PDFs. Adobe uses Dublin Core tags 
> (dc:description, dc:title, dc:creator). I can't get swish-e to recognize 
> these as metanames (whether these are in PDFs or in HTML).

$ cat c
MetaNames dc:description

$ cat 1.html     
<meta name="dc:description" content="=foo">

$ swish-e -c c -i 1.html -v0 -T indexed_words
    Adding:[1:swishdefault(1)]   'b'   Pos:2  Stuct:0x7 ( HEAD TITLE FILE )
    Adding:[1:swishdefault(1)]   'title'   Pos:3  Stuct:0x7 ( HEAD TITLE FILE )
    Adding:[1:dc:description(10)]   'foo'   Pos:6  Stuct:0x85 ( META HEAD FILE )
    Adding:[1:swishdefault(1)]   'hello'   Pos:9  Stuct:0x9 ( BODY FILE )

$ swish-e -w dc:description=foo
# SWISH format: 2.4.1
# Search words: dc:description=foo
# Removed stopwords: 
# Number of hits: 1
# Search time: 0.001 seconds
# Run time: 0.043 seconds
1000 1.html "<b>title" 125
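
The Stuct values in the -T indexed_words output above look like a bitmask of
where in the document each word was found. A minimal sketch for decoding
them, assuming the bit values that the three sample lines imply (FILE=0x1,
TITLE=0x2, HEAD=0x4, BODY=0x8, META=0x80); decode_struct is a hypothetical
helper, not part of swish-e, and any other bits are ignored:

```shell
# Hypothetical helper: decode a swish-e "Stuct" bitmask into names.
# Bit values are inferred from the sample output in this message only.
decode_struct() {
    mask=$(( $1 ))          # accepts hex such as 0x85
    out=""
    for pair in 128:META 8:BODY 4:HEAD 2:TITLE 1:FILE; do
        bit=${pair%%:*}     # numeric bit value before the colon
        name=${pair#*:}     # label after the colon
        if [ $(( mask & bit )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    printf '%s\n' "$out"
}

decode_struct 0x7
decode_struct 0x85
decode_struct 0x9
```

The 0x85 on the 'foo' line is what shows the word was picked up from a
META tag rather than from the body.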

> Warning: Substituted possible embedded null character(s) in file 
> '/home/hul/htdocs/ois/systems/aleph/docs/test/serial_claiming_in_Aleph.pdf'

Looks like you are not filtering the PDF files.  That warning suggests
swish-e is reading the raw PDF data instead of text extracted from it.
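
As a rough sketch of what filtering could look like, assuming pdftotext
(from xpdf) is installed and on the PATH, a config along these lines passes
each .pdf through the filter before indexing. The exact options may need
adjusting for your platform, especially on Windows:

```
# Assumption: pdftotext is on the PATH; %p is replaced with the file's path
FileFilter .pdf pdftotext "'%p' -"
# Parse the filter's output as plain text
IndexContents TXT* .pdf
```

Note that pdftotext only emits plain text, so the dc: fields would not show
up as metanames this way. For that, the filter would have to output HTML
with <meta> tags and the output would have to be parsed as HTML.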

Bill Moseley
Received on Fri Jan 16 16:57:38 2004