MetaName search not working, yet StoreDescription is

From: Gordon Jessop <gjessop(at)>
Date: Tue Jan 29 2002 - 17:28:53 GMT
Thanks to Bill for the upgrade hint!  Got swish-e 2.1-dev-25 to index (built
*without* libxml2).

However, I am now stuck on this problem: I have to index a series of
JScript source files, and the MetaNames directive in swish-e does not seem
to take effect.

The Conf File:

    IndexFile /path/to/search.index
    IndexDir /path/to/test_dir
    IndexReport 3
    DefaultContents HTML
    EnableAltSearchSyntax yes
    SwishSearchOperators   AND OR NOT
    SwishSearchDefaultRule AND
    FollowSymLinks no
    FileRules filename contains "\.jsp$"
    ConvertHTMLEntities no

  # replace rules to pass js file name to the "some_func" js function
    ReplaceRules prepend "javascript: some_func('"
    ReplaceRules remove "/path/to/test_dir/"
    ReplaceRules replace "server-jsp/" "', '"
    ReplaceRules replace "/" "', '"
    ReplaceRules append "');"

  # Our meta names
    MetaNames meta_description meta_author
    UndefinedMetaTags ignore
    PropertyNames meta_author

  # StoreDescription works, but MetaName based searches do not... hmmmm
    StoreDescription HTML <meta_description> 50
    MinWordLimit 3
    MaxWordLimit 15
    IgnoreWords File: /home/apps/swish-stop-words/english
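
As a rough illustration of what the ReplaceRules chain in the conf file is meant to do, here is a small Python sketch. It assumes the rules run in the order listed and that "remove" and "replace" act on every literal occurrence; the function name and the sample path are made up for the example and are not part of swish-e.

```python
# Sketch of the ReplaceRules chain from the conf file, applied to one path.
# Assumption: rules run in the order listed, and "remove"/"replace" act on
# every literal occurrence (no regular expressions involved).

def apply_replace_rules(path: str) -> str:
    path = "javascript: some_func('" + path        # prepend
    path = path.replace("/path/to/test_dir/", "")  # remove
    path = path.replace("server-jsp/", "', '")     # replace
    path = path.replace("/", "', '")               # replace
    return path + "');"                            # append


if __name__ == "__main__":
    # Hypothetical path, chosen only for illustration.
    print(apply_replace_rules("/path/to/test_dir/f/4/000004"))
```

With that hypothetical path, the sketch produces a string in the same shape as the document path swish-e reports in its search results.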

The Content:

I have to index a series of jscript source files.  Each file would contain
something like:

    // <title>Guns and Butter</title>

    globalPackage.description = '<meta_description>Some indexable words like
    supply and demand, guns and butter.</meta_description>';
     = '<meta_author>Gordon Jessop</meta_author>';
     = '1';
     = 'checked';
    globalPackage.blah = '123456';

Note: The comment (// <title>...) is there so that swish-e captures the
title properly (and it does).

Note: Due to imposed constraints, I am unable to use the proper <META
Name="name" CONTENT="content"> syntax and have settled for the option
described in the 2.2 docs (i.e. <meta_description>...</meta_description>)
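
To rule out malformed tags in the source files themselves, a quick check outside swish-e can help. The following Python sketch pulls the meta_* tags out of a sample of the content shown above. The regular expression is an assumption made for illustration and is not how swish-e parses documents; `extract_meta` and `SAMPLE` are hypothetical names.

```python
import re

# Sketch: find <meta_xxx>...</meta_xxx> pairs in a source file.
# The regex approach is an illustration only, not swish-e's own parser.
TAG_RE = re.compile(r"<(meta_\w+)>(.*?)</\1>", re.DOTALL)


def extract_meta(text: str) -> dict:
    """Return {tag_name: content} with runs of whitespace collapsed."""
    return {name: " ".join(body.split()) for name, body in TAG_RE.findall(text)}


# Sample taken from the file content described in this message.
SAMPLE = """// <title>Guns and Butter</title>

globalPackage.description = '<meta_description>Some indexable words like
supply and demand, guns and butter.</meta_description>';
"""

if __name__ == "__main__":
    for name, content in extract_meta(SAMPLE).items():
        print(name, "=>", content)
```

If every expected tag shows up with its content, the open and close tags in the file are at least well-formed, and the problem is more likely on the indexing side.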

The Problem:

The content is indexed and is searchable, but not by MetaName.  For example:

    $ swish-e -w 'meta_description=butter' -f /path/to/search.index
    # SWISH format: 2.1-dev-25
    # Search words: meta_description=butter
    err: no results

yet searching for 'butter' without MetaName results in a match:

    $ swish-e -w 'butter' -f /path/to/search.index
    # SWISH format: 2.1-dev-25
    # Search words: butter
    # Number of hits: 1
    # Search time: 0.001 seconds
    # Run time: 0.103 seconds
    1000 javascript: some_func('f', '4', '000004'); "Guns and Butter" 523
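
When comparing many queries, it can be easier to reduce each run to a hit count. This Python sketch does that for captured output text. It assumes the output looks exactly like the two runs shown above (an "err: no results" line, or a "# Number of hits: N" header); `hit_count` is a hypothetical helper, not part of swish-e.

```python
import re


def hit_count(output: str) -> int:
    """Return the number of hits reported in captured swish-e search output."""
    # A failed search prints an "err: no results" line instead of a hit header.
    if re.search(r"^\s*err: no results", output, re.MULTILINE):
        return 0
    match = re.search(r"^\s*# Number of hits: (\d+)", output, re.MULTILINE)
    # Treat output with neither line as zero hits.
    return int(match.group(1)) if match else 0


if __name__ == "__main__":
    metaname_run = "# Search words: meta_description=butter\nerr: no results\n"
    plain_run = "# Search words: butter\n# Number of hits: 1\n"
    print(hit_count(metaname_run), hit_count(plain_run))
```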

I can even see the StoreDescription element working:

    $ swish-e -c /path/to/search.conf -i /path/to/test_dir/file.js -T
    Indexing Data Source: "File-System"
    Indexing "/path/to/test_dir/file.js"

    Checking file "/path/to/test_dir/file.js"...
      file.js - Using HTML parser -  (45 words)
              swishdocpath: 6 ( 38) S: "javascript: pumsw('f', '4',
                swishtitle: 7 ( 21) S: "Guns and Butter"
              swishdocsize: 8 (  4) N: "0000000000523"
         swishlastmodified: 9 (  4) D: "2002-01-29 10:34:37"
          swishdescription:14 ( 20) S: "Some indexable words like supply and demand, guns "

    Removing very common words...
    no words removed.
    Writing main index...
    Sorting words ...
    Sorting 28 words alphabetically
    Writing header ...
    Writing index entries ...
      Writing word text: Complete
      Writing word hash: Complete
      Writing word data: Complete
    28 unique words indexed.
    No properties sorted.
    1 file indexed.  523 total bytes.  45 total words.
    Elapsed time: 00:00:00 CPU time: 00:00:00
    Indexing done!

So it would seem that the StoreDescription directive can see and act on the
meta_description tag.  Why can't the MetaNames directive see it?

Any ideas would be helpful.

Received on Tue Jan 29 17:29:40 2002