An example might explain this better.
I want to search my mailman-generated message archives for a particular
string, let's say "Fuji TAF". The archives consist of individual HTML
files and are indexed by swish-e according to my swish configuration
file.
Because that string is in the Subject: line of at least one file, two
other files will also contain it: the file for the message archived
immediately before and the file for the message archived immediately
after, according to mailman's date, thread, subject, author, etc.
indexing. All I really need to do is exclude from indexing the *lines*
that start:
<LI>Previous message: <some variable text regarding msg Subject>
<LI>Next message: <some variable text regarding msg Subject>
Otherwise, any search for text that is in the Subject: header of a
message will return 3 matching files: the one desired, the one after
(which will have the line "<LI>Previous message: <Subject text here>"),
and the one before (which will have the line
"<LI>Next message: <Subject text here>").
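To make the idea concrete, here is a minimal sketch in Python of the kind of line filter described above. It is only an illustration, not a swish-e feature: the names NAV_LINE and strip_nav_lines are made up for this example, and it assumes each navigation entry begins on its own line with "<LI>Previous message:" or "<LI>Next message:" and that the Subject text sits on that same line. How the filtered text is then handed to the indexer is left open.

```python
import re
import sys

# Hypothetical pattern for the mailman navigation lines, e.g.
#   <LI>Previous message: <A HREF="016805.html">[Tig] DVNR 1000 4X4 ...
#   <LI>Next message: <A HREF="016809.html">[Tig] Fuji TAF
NAV_LINE = re.compile(r"^\s*<LI>\s*(Previous|Next) message:", re.IGNORECASE)


def strip_nav_lines(text):
    """Return text with every Previous/Next message line removed."""
    kept = [line for line in text.splitlines(keepends=True)
            if not NAV_LINE.match(line)]
    return "".join(kept)


if __name__ == "__main__":
    # Print the filtered content of each file named on the command line.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            sys.stdout.write(strip_nav_lines(fh.read()))
```

Run against one archived message, this prints the file without the two navigation lines, so a neighbour's Subject text would no longer be present in what gets indexed. Only the line carrying the prefix is dropped; any closing tags on the following line are left in place.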
On Oct 9, 2009, at 4:23 PM, Rob Lingelbach wrote:
> On Oct 9, 2009, at 4:16 PM, Peter Karman wrote:
>> Rob Lingelbach wrote on 10/09/2009 01:54 PM:
>>> I need to exclude from swish-e indexing lines such as:
>>> "Next message: <some text>"
>>> "Previous message: <some text>"
> Thanks for the answer Peter, but in this case perhaps I wasn't clear:
> every file (every message, some 100 thousand of them) has mailman
> markup that includes the previous message's and the next message's
> Subject: <text>. What these lines have in common is the string at
> the head of the line, such as:
> <LI>Previous message: <A HREF="016805.html">[Tig] DVNR 1000 4X4
> chasis (etc.)......
> <LI>Next message: <A HREF="016809.html">[Tig] Fuji TAF
> so you see, the lines can be found via their beginning text and not
> indexed by that,
> --- it's not something I can do with meta tags per file, I don't
> Rob Lingelbach
> Users mailing list
Received on Fri Oct 9 15:43:36 2009