Zhou Xiang wrote on 3/31/09 4:09 PM:
> Hi Peter,
>
> Thank you!
> It works well now.
>
> Another question:
> I tried to index only one page:
> http://rust.cc.lehigh.edu/beyondsteel/swish-title.php
>
> What if I only want to index a specific meta tag (or meta tags) in the
> source file, and not what is shown on the page?
> Say I just want to index the meta name "last name", so that if I
> search for "lnameAche", the page will be returned. (Please see the source
> file of the page.)
>
> I added the following line to the swish.config file:
> # Specify which meta names to include in the index
> MetaNames employer
>
> It does not work.
>
> (What about the tag "last name"? It has two words.)
>
> Any ideas? Thanks!
If you just want to index the metadata from the page and not the content, you'll
have to filter your input before passing it to swish-e.

If your web pages are being generated dynamically, why not just generate
index-ready content instead?

Alternatively, spider your pages as-is, then run them through a filter on the
way to swish-e. A simple regex in a Perl script should strip out the <body>
content:

  s,<body.*?>.*</body>,,sgi;

and then pass the result to swish-e -S prog.
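
For anyone who would rather write the filter in Python, here is a minimal sketch of the same body-stripping step. The names strip_body and prog_record and the sample page are made up for illustration. The header layout in prog_record (Path-Name and Content-Length, then a blank line) is my recollection of what -S prog reads, so treat it as an assumption and check it against the swish-e docs.

```python
import re

# Same idea as the Perl one-liner: remove everything from the first <body ...>
# tag through the last </body>, ignoring case and spanning newlines.
BODY_RE = re.compile(r"<body.*?>.*</body>", re.S | re.I)


def strip_body(html: str) -> str:
    """Return the page with its <body> element removed (head and meta tags stay)."""
    return BODY_RE.sub("", html)


def prog_record(path: str, html: str) -> bytes:
    """Wrap one filtered page in a header block for swish-e -S prog.

    Assumption: the prog input format is 'Path-Name' and 'Content-Length'
    (in bytes) headers, a blank line, then the document itself.
    """
    body = strip_body(html).encode("utf-8")
    header = f"Path-Name: {path}\nContent-Length: {len(body)}\n\n".encode("utf-8")
    return header + body


# Hypothetical sample page, not the real swish-title.php source.
sample = (
    '<html><head><meta name="employer" content="Example Corp"></head>'
    '<BODY class="x">\nvisible text\n</BODY></html>'
)
```

A spider wrapper would write prog_record(url, page) to stdout for each page it fetches, so only the head section (and its meta tags) ever reaches the indexer.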
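As for MetaNames with spaces in them, you'll have to filter those too. Use
[^\s"]+ rather than \S+ so that each word stays inside the quoted name and
single-word names such as "employer" are left untouched:

  s,<meta name="([^\s"]+)\ +([^\s"]+)",<meta name="$1.$2",g;

and put:

  MetaNames last.name

etc., in your swish-e config.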
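
The meta name rewrite can be sketched in Python the same way. This is only an illustration under the assumption that names have exactly two words and are written as name="..." with double quotes. The function name dot_meta_names is hypothetical.

```python
import re

# Join a two-word meta name with a dot: name="last name" -> name="last.name".
# [^\s"]+ keeps each word inside the quoted attribute value, so a single-word
# name followed by another attribute is not matched.
META_NAME_RE = re.compile(r'<meta name="([^\s"]+) +([^\s"]+)"')


def dot_meta_names(html: str) -> str:
    """Rewrite two-word meta names so they can be listed under MetaNames."""
    return META_NAME_RE.sub(r'<meta name="\1.\2"', html)


# Hypothetical tags modeled on the question, not copied from the real page.
two_words = '<meta name="last name" content="lnameAche">'
one_word = '<meta name="employer" content="Example Corp">'
```

With MetaNames last.name in the config, a search along the lines of last.name=lnameAche should then find the page.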
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Tue Mar 31 20:54:34 2009