Re: [swish-e] exclude line of text from indexing

From: Rob Lingelbach <rob(at)>
Date: Fri Oct 09 2009 - 19:42:52 GMT
An example might serve better to explain.

I want to search my mailman-generated message archives for a
particular string, let's say "Fuji TAF". The archives consist of
individual html files, which swish-e indexes according to my swish
configuration file.

Because that string is in the Subject: line of at least one file, there
will be two other files that contain the string: the file that
corresponds to the message archived immediately before, and the file
for the message archived immediately after, according to mailman's
date, thread, subject, author, etc. indexing. All I really need to do
is exclude from indexing the *lines* that start:

<LI>Previous message:  <some variable text regarding msg Subject>


<LI>Next message:  <some variable text regarding msg Subject>

Otherwise, any search for text that is in the Subject: header of a
message will return 3 matching files: the one desired, the one after
(which will have the line "<LI>Previous message: <Subject text here>"),
and the one before (which will have the line
"<LI>Next message: <Subject text here>").

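A minimal sketch of the line-level exclusion described above, written in
Python as a standalone pre-processing step. It assumes each navigation
entry sits on a single line of the archived html file, as in the examples
in this thread. The names NAV_LINE, strip_nav_lines and clean_file are
made up for illustration. How the cleaned text is handed to swish-e (for
example through an external filter program, or by indexing cleaned copies
of the files) is not shown and would need to be checked against the
swish-e documentation.

```python
import re

# Matches a line that begins (after optional whitespace) with the
# pipermail navigation markup quoted in this thread.
NAV_LINE = re.compile(r"^\s*<LI>\s*(?:Previous|Next) message:", re.IGNORECASE)


def strip_nav_lines(html: str) -> str:
    """Return html with the Previous/Next message lines removed.

    Assumption: each navigation entry occupies one line, so dropping
    that line also drops the neighbouring message's Subject text.
    """
    kept = [line for line in html.splitlines(keepends=True)
            if not NAV_LINE.match(line)]
    return "".join(kept)


def clean_file(path: str) -> str:
    """Read one archived message file and return its cleaned html."""
    # errors="replace" keeps the sketch from failing on odd encodings.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return strip_nav_lines(handle.read())
```

With this in place, a message whose only mention of "Fuji TAF" is in its
"Next message:" or "Previous message:" line would no longer carry that
string into the index, while the message that actually has the Subject
would be unaffected.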

On Oct 9, 2009, at 4:23 PM, Rob Lingelbach wrote:

> On Oct 9, 2009, at 4:16 PM, Peter Karman wrote:
>> Rob Lingelbach wrote on 10/09/2009 01:54 PM:
>>> I need to exclude from swish-e indexing lines such as:
>>> "Next message: <some text>"
>>> and
>>> "Previous message: <some text>"
>> config.html#obeyrobotsnoindex
> Thanks for the answer Peter, but in this case perhaps I wasn't clear:
> every file (every message, some 100 thousand files or messages) has
> pipermail or mailman markup that includes the previous message's and
> the next message's Subject: <text>. What these lines have in common
> is the string at the head of the line, such as:
> <LI>Previous message: <A HREF="016805.html">[Tig] DVNR 1000 4X4 chasis  (etc.)......
> and
> <LI>Next message: <A HREF="016809.html">[Tig] Fuji TAF
> so you see, the lines can be found via their beginning text and
> excluded from indexing on that basis;
> it's not something I can do with meta tags per file, I don't think?
> regards
> Rob
> --
> Rob Lingelbach

Rob Lingelbach
Received on Fri Oct 9 15:43:36 2009