Thanks to Bill for the upgrade hint! I got swish-e 2.1-dev-25 to index (built
*without* libxml2).
However, I now have a different problem: I have to index a series of
JScript source files, and the MetaNames directive in swish-e does not seem
to take effect.
#######
The Conf File:
IndexFile /path/to/search.index
IndexDir /path/to/test_dir
IndexReport 3
DefaultContents HTML
EnableAltSearchSyntax yes
SwishSearchOperators AND OR NOT
SwishSearchDefaultRule AND
FollowSymLinks no
FileRules filename contains "\.jsp$"
ConvertHTMLEntities no
# replace rules to pass js file name to the "some_func" js function
ReplaceRules prepend "javascript: some_func('"
ReplaceRules remove "/path/to/test_dir/"
ReplaceRules replace "server-jsp/" "', '"
ReplaceRules replace "/" "', '"
ReplaceRules append "');"
# Our meta names
MetaNames meta_description meta_author
UndefinedMetaTags ignore
PropertyNames meta_author
PreSortedIndex
# StoreDescription works, but MetaName based searches do not... hmmmm
StoreDescription HTML <meta_description> 50
MinWordLimit 3
MaxWordLimit 15
IgnoreWords File: /home/apps/swish-stop-words/english
########
The Content:
Each JScript source file contains something like:
// <title>Guns and Butter</title>
globalPackage.description = '<meta_description>Some indexable words like
supply and demand, guns and butter.</meta_description>';
globalPackage.author = '<meta_author>Gordon Jessop</meta_author>';
globalPackage.foo = '1';
globalPackage.bar = 'checked';
globalPackage.blah = '123456';
Note: The comment (// <title>...) is there so that swish-e captures the
title properly (and it does).
Note: Due to imposed constraints, I am unable to use the proper <META
NAME="name" CONTENT="content"> syntax and have settled for the option
described in the 2.2 docs (i.e. <meta_description>...</meta_description>).
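As a sanity check that the pseudo-tags are well formed before indexing,
something like the following Python sketch could be run over the source
files. It is not part of swish-e and does not reproduce how its HTML parser
reads a file; the helper name find_pseudo_meta_tags and the regex are
assumptions made for illustration. It only confirms that each configured
MetaName has a matching open and close tag in the text.

```python
import re

# The two names listed on the MetaNames line of the conf file.
META_NAMES = ("meta_description", "meta_author")


def find_pseudo_meta_tags(text, names=META_NAMES):
    """Return {name: [content, ...]} for each well-formed <name>...</name> pair.

    Hypothetical helper, not a swish-e API.
    """
    found = {}
    for name in names:
        pattern = re.compile(
            r"<{0}>(.*?)</{0}>".format(re.escape(name)),
            re.IGNORECASE | re.DOTALL,
        )
        # Collapse internal whitespace so wrapped content reads as one line.
        found[name] = [" ".join(m.split()) for m in pattern.findall(text)]
    return found


# Sample content copied from the example file in this post.
SAMPLE = """// <title>Guns and Butter</title>
globalPackage.description = '<meta_description>Some indexable words like
supply and demand, guns and butter.</meta_description>';
globalPackage.author = '<meta_author>Gordon Jessop</meta_author>';
globalPackage.foo = '1';
"""

if __name__ == "__main__":
    for name, contents in find_pseudo_meta_tags(SAMPLE).items():
        print(name, "->", contents)
```

An empty list for a name would point at a broken or missing tag in that
file rather than at the swish-e configuration.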
#######
The Problem:
The content is indexed and is searchable, but not by MetaName. For
instance:
$ swish-e -w 'meta_description=butter' -f /path/to/search.index
# SWISH format: 2.1-dev-25
# Search words: meta_description=butter
err: no results
Yet searching for 'butter' without a MetaName returns a match:
$ swish-e -w 'butter' -f /path/to/search.index
# SWISH format: 2.1-dev-25
# Search words: butter
# Number of hits: 1
# Search time: 0.001 seconds
# Run time: 0.103 seconds
1000 javascript: some_func('f', '4', '000004'); "Guns and Butter" 523
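To compare MetaName and plain searches over many terms, the hit counts
could be pulled out of the swish-e output with a small script. This is a
minimal Python sketch; count_hits is a hypothetical helper, and it assumes
the output always looks like the two transcripts above (an "err: no results"
line or a "# Number of hits:" header), which is all I have to go on.

```python
import re


def count_hits(output):
    """Return the hit count reported in swish-e search output (0 if none).

    Hypothetical helper; the line formats are taken only from the two
    transcripts quoted in this post.
    """
    if re.search(r"^err: no results", output, re.MULTILINE):
        return 0
    match = re.search(r"^# Number of hits: (\d+)", output, re.MULTILINE)
    return int(match.group(1)) if match else 0


# Output of: swish-e -w 'meta_description=butter' -f /path/to/search.index
META_SEARCH = """# SWISH format: 2.1-dev-25
# Search words: meta_description=butter
err: no results
"""

# Output of: swish-e -w 'butter' -f /path/to/search.index
PLAIN_SEARCH = """# SWISH format: 2.1-dev-25
# Search words: butter
# Number of hits: 1
# Search time: 0.001 seconds
# Run time: 0.103 seconds
1000 javascript: some_func('f', '4', '000004'); "Guns and Butter" 523
"""

if __name__ == "__main__":
    print("meta_description=butter:", count_hits(META_SEARCH))
    print("butter:", count_hits(PLAIN_SEARCH))
```

A term that scores hits in the plain search but none in the MetaName search
reproduces the problem described here.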
I can even see the StoreDescription element working:
$ swish-e -c /path/to/search.conf -i /path/to/test_dir/file.js -T properties
Indexing Data Source: "File-System"
Indexing "/path/to/test_dir/file.js"
Checking file "/path/to/test_dir/file.js"...
file.js - Using HTML parser - (45 words)
swishdocpath: 6 ( 38) S: "javascript: pumsw('f', '4', '000004');"
swishtitle: 7 ( 21) S: "Guns and Butter"
swishdocsize: 8 ( 4) N: "0000000000523"
swishlastmodified: 9 ( 4) D: "2002-01-29 10:34:37"
swishdescription:14 ( 20) S: "Some indexable words like supply and demand, guns "
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 28 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
28 unique words indexed.
No properties sorted.
1 file indexed. 523 total bytes. 45 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
So it would seem that the StoreDescription directive can see and act on the
meta_description tag. Why can't the MetaNames directive see it?
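One more observation from the trace: it lists five properties, and
meta_author is not among them even though the conf file has
"PropertyNames meta_author". A check like that could be scripted with the
minimal Python sketch below. The helper names are hypothetical, and the
regex is modeled only on the five property lines shown in the trace above,
so it may not match other swish-e versions or trace options.

```python
import re

# Matches trace lines such as:  swishtitle: 7 ( 21) S: "Guns and Butter"
# The pattern is an assumption derived from the trace in this post.
PROPERTY_LINE = re.compile(r"^(\w+):\s*\d+\s*\(\s*\d+\)\s+[SND]:", re.MULTILINE)


def traced_properties(trace):
    """Return the property names that appear in '-T properties' output."""
    return PROPERTY_LINE.findall(trace)


def missing_properties(trace, expected):
    """Return the expected property names that the trace does not report."""
    seen = set(traced_properties(trace))
    return [name for name in expected if name not in seen]


# Property lines copied from the trace in this post (wrapped lines joined).
TRACE = '''Indexing "/path/to/test_dir/file.js"
swishdocpath: 6 ( 38) S: "javascript: pumsw('f', '4', '000004');"
swishtitle: 7 ( 21) S: "Guns and Butter"
swishdocsize: 8 ( 4) N: "0000000000523"
swishlastmodified: 9 ( 4) D: "2002-01-29 10:34:37"
swishdescription:14 ( 20) S: "Some indexable words like supply and demand, guns "
Writing word text: Complete
'''

if __name__ == "__main__":
    print("reported:", traced_properties(TRACE))
    print("missing:", missing_properties(TRACE, ["meta_author"]))
```

If meta_author is reported missing, the tag is being skipped for properties
as well as for MetaName searches, which may narrow down where the problem is.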
Any ideas would be helpful.
--
Advansis: http://www.advansis.com/
Received on Tue Jan 29 17:29:40 2002