Greg Keith wrote on 3/11/09 4:14 PM:
> I want the document title returned as the first link, if there
> is one - most of the documents I'm indexing are HTML, so there should
> be a <title> tag for most of them. I am not clear on how to do this -
> it looks like it should be the proper combination of specifying the
> title_property in swish.cgi and the MetaNames directive in my
> swish.conf. However, I don't know what the proper combination is - I
> tried not having any MetaNames directive in the swish.conf, and
> having title_property set to "title" rather than "swishtitle", but
> this just produces a "(null)" result for each document found. My
> swish.conf and swish.cgi are below.
> Can anyone enlighten me?
The MetaNames config option is irrelevant in this case. MetaNames are for
limiting a query to certain *contexts*. PropertyNames are for returning
*contents* of hits.
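To illustrate the distinction, here is a minimal swish.conf sketch. The "author" tag is a hypothetical example and not something from your config; the title needs neither directive, since it is stored in the built-in swishtitle property that swish.cgi's title_property points at by default.

```
# swish.conf fragment (illustrative only; "author" is a made-up meta tag)

# MetaNames: lets a query be limited to the content of this tag,
# e.g.  swish-e -w 'author=karman'
MetaNames author

# PropertyNames: stores the content of this tag so it can be
# returned with each hit, e.g.  swish-e -w hello -p author
PropertyNames author
```

A tag can be listed under both directives if you want to search within it and also display it in results.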
The best thing to do is find a document you think *should* be returning a title
and isn't, and then make a test case with it. Here's an example:
[karpet@pekmac:~/tmp]$ swish-e -i title.html
Indexing Data Source: "File-System"
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 6 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
6 unique words indexed.
4 properties sorted.
1 file indexed. 94 total bytes. 6 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
[karpet@pekmac:~/tmp]$ swish-e -w hello
# SWISH format: 2.5.6
# Search words: hello
# Removed stopwords:
# Number of hits: 1
# Search time: 0.000 seconds
# Run time: 0.007 seconds
1000 title.html "this is the title" 94
[karpet@pekmac:~/tmp]$ cat title.html
<title>this is the title</title>
What you'll probably find, in the case of your HTML anyway, is that the swish-e
HTML parser isn't finding your <title> tagset for some reason: it isn't there,
or is named slightly differently, or...
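Before building a swish-e test case, a quick rough check of which files have a <title> at all can narrow things down. The sketch below is plain grep and not part of swish-e; has_title is a hypothetical helper name, the two sample files are made up for the demonstration, and the pattern only catches a title that opens and closes on the same line.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file has <title>...</title> on a single line.
# Case-insensitive, so <TITLE> matches too. Multi-line titles are NOT detected.
has_title() {
    grep -qi '<title[^>]*>.*</title>' "$1"
}

# Two made-up sample files, one with a title and one without.
tmpdir=$(mktemp -d)
printf '<title>this is the title</title>\n' > "$tmpdir/good.html"
printf '<h1>no title here</h1>\n' > "$tmpdir/bad.html"

# Report on each file; point the glob at your own document tree instead.
for f in "$tmpdir"/*.html; do
    if has_title "$f"; then
        echo "$(basename "$f"): title found"
    else
        echo "$(basename "$f"): no title"
    fi
done
```

Any file reported as having no title is a good candidate for the single-file indexing test shown above.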
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Thu Mar 12 22:39:13 2009