Zhou Xiang wrote on 3/31/09 4:09 PM:
> Hi Peter,
> Thank you!
> It works well now.
> Another question:
> When I tried to index only one page:
> What if I only want to index a specific meta tag (or meta tags) in the
> source file? And I do not want to index what is shown on the page.
> Say I just want to index the meta tag named "last name", so that if I
> search for "lnameAche", the page will be returned. (Please see the source
> of the page.)
> I added the following lines to the swish.config file:
> # Specify which meta names to include in the index
> MetaNames employer
> It does not work.
> (What about the tag "last name"? It has two words.)
> Any ideas? Thanks!
If you just want to index the metadata from the page and not the content, you'll
have to filter your input (the page content) before passing it to swish-e.
If your web pages are being generated dynamically, why not just generate
index-ready content instead?
Or alternatively, spider your pages as-is, then pass them through a filter on
their way to swish-e. A simple regex in a Perl script should strip out the <body>
content; then pass the result to swish-e with -S prog.
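A minimal sketch of such a filter, assuming the pages have already been spidered to local .html files that are passed as command-line arguments. The script name strip_body.pl and the helpers strip_body and prog_record are hypothetical. It empties each <body>, keeps the <head> (and therefore the <meta> tags), and prints every page in swish-e's "prog" input format: a Path-Name header, a Content-Length header (a byte count), a blank line, then the content. The regex assumes fairly simple HTML; a page with no closing </body> tag passes through unchanged.

```perl
#!/usr/bin/perl
# strip_body.pl (hypothetical name): a sketch, assuming pages were already
# saved locally and their paths are given on the command line.
use strict;
use warnings;

# Empty the <body> but keep the tags so the HTML stays well-formed.
# /i matches <BODY> too; /s lets . span newlines; .*? stops at the first </body>.
sub strip_body {
    my ($html) = @_;
    $html =~ s{<body\b[^>]*>.*?</body>}{<body></body>}is;
    return $html;
}

# One swish-e "prog" record: headers, a blank line, then the content.
# Content-Length must be in bytes, which is why files are read in :raw mode.
sub prog_record {
    my ($path, $content) = @_;
    return "Path-Name: $path\n"
         . "Content-Length: " . length($content) . "\n"
         . "\n"
         . $content;
}

# Only do work when file paths are supplied.
if (@ARGV) {
    binmode STDOUT, ':raw';
    for my $path (@ARGV) {
        open my $fh, '<:raw', $path or die "Cannot open $path: $!";
        my $html = do { local $/; <$fh> };   # slurp the whole file
        close $fh;
        print prog_record($path, strip_body($html));
    }
}
```

Something like `perl strip_body.pl pages/*.html | swish-e -c swish.config -S prog -i stdin` should then index only the stripped pages, assuming your config already parses these paths as HTML and your swish-e version accepts `stdin` as the -i program name with -S prog; check the swish-e docs for your release.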
As for MetaNames with spaces in them, you'll have to filter those too, e.g.:
    s,<meta name="(\S+)\ +(\S+)",<meta name="$1.$2",g;
and then list the joined names (MetaNames last.name, etc.) in your swish-e config.
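The two-word substitution above only matches names made of exactly two words. A sketch of a more general variant (the helpers join_meta_names and dotted_name are hypothetical) turns every run of whitespace inside a meta name into a single dot, so name="last name" becomes name="last.name" and can then be declared with MetaNames last.name. Like the original one-liner, it assumes a double-quoted name attribute directly after <meta; unlike it, it also matches upper-case tags.

```perl
use strict;
use warnings;

# Trim the name, then replace each inner whitespace run with one dot.
sub dotted_name {
    my ($name) = @_;
    $name =~ s/^\s+|\s+$//g;
    $name =~ s/\s+/./g;
    return $name;
}

# Rewrite <meta name="last name" ...> as <meta name="last.name" ...>.
# /g: every tag; /i: case-insensitive; /e: the replacement is Perl code.
# Captures are copied into lexicals before dotted_name runs its own matches.
sub join_meta_names {
    my ($html) = @_;
    $html =~ s{(<meta\s+name=")([^"]*)"}{ my ($open, $name) = ($1, $2); $open . dotted_name($name) . '"' }gie;
    return $html;
}
```

You could call join_meta_names on each page's HTML inside whatever filter feeds swish-e, before the page is printed.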
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Users mailing list
Received on Tue Mar 31 20:54:34 2009