Re: Novice question: unknown MetaNames error

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Jan 16 2004 - 16:56:48 GMT
[I'm cc'ing back to the list]

On Fri, Jan 16, 2004 at 11:39:06AM -0500, Julie Wetherill wrote:
> 
> Followed your instructions. Wouldn't ya know, with just MetaNames and no 
> PropertyNames, I can retrieve on the metaname "description". Just FYI, the 
> instruction

Yep.  That would have been an obvious bug, I think.

> swish-e.new -T index_metanames (note that my executable is "swish-e.new" 
> rather than swish-e)
> 
> causes a segmentation fault. Don't know why. Seems like this could be a 
> helpful command if I could get it to work.

Dave, can you test this on Windows?


> Anyway, I do have a related problem that maybe you can explain. I need to 
> retrieve on metadata imbedded in PDFs. Adobe uses Dublin Core tags 
> (dc:description, dc:title, dc:creator). I can't get swish-e to recognize 
> these as metanames (whether these are in PDFs or in HTML).

$ cat c
MetaNames dc:description

$ cat 1.html     
<html>
<head><title>&lt;b&gt;title</title>
<meta name="dc:description" content="=foo">
</head>
<body>
hello
</body>
</html>

$ swish-e -c c -i 1.html -v0 -T indexed_words
    Adding:[1:swishdefault(1)]   'b'   Pos:2  Stuct:0x7 ( HEAD TITLE FILE )
    Adding:[1:swishdefault(1)]   'title'   Pos:3  Stuct:0x7 ( HEAD TITLE FILE )
    Adding:[1:dc:description(10)]   'foo'   Pos:6  Stuct:0x85 ( META HEAD FILE )
    Adding:[1:swishdefault(1)]   'hello'   Pos:9  Stuct:0x9 ( BODY FILE )

$ swish-e -w dc:description=foo
# SWISH format: 2.4.1
# Search words: dc:description=foo
# Removed stopwords: 
# Number of hits: 1
# Search time: 0.001 seconds
# Run time: 0.043 seconds
1000 1.html "<b>title" 125
.


> Warning: Substituted possible embedded null character(s) in file 
> '/home/hul/htdocs/ois/systems/aleph/docs/test/serial_claiming_in_Aleph.pdf'

Looks like you are not filtering the pdf files.
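
For reference, a minimal sketch of what filtering PDFs can look like in the
config file. It assumes the pdftotext program (from xpdf) is installed and on
the PATH; that choice of converter is an assumption, not something taken from
the setup above.

```
# Sketch only: assumes pdftotext is installed and on the PATH.
# Run each .pdf through pdftotext before indexing.
# %p is replaced with the path of the file being indexed;
# the trailing "-" tells pdftotext to write to stdout.
FileFilter .pdf pdftotext "'%p' -"

# Parse the filter output as plain text.
IndexContents TXT* .pdf
```

Plain-text output like this carries only the body text of the PDF. For the
document metadata to be indexed under MetaNames, the filter would have to emit
HTML containing <meta> tags (parsed with IndexContents HTML* .pdf), as in the
1.html example above.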


-- 
Bill Moseley
moseley@hank.org
Received on Fri Jan 16 16:57:38 2004