At 12:19 PM 02/26/02 -0800, Fred Toth wrote:
>Bill,
>
>>So you can see how the words are nested. Everything has the "META"
>>structure flag set since everything is within <html>.
>
>Uh. Maybe. I wonder if you could explain this output a little:
Oh, I guess it's a bit cryptic!
Adding:[1:html(10)] 'titletext' Pos:2 Stuct:0x87 ( META HEAD TITLE FILE )
Ok, this says: adding to file 1, meta "html" (metaID number = 10), the word
"titletext". Its word position is 2 (used for phrase matching). Its
"structure" is 0x87 (87 hex). "Structure" is a set of bit flags that
indicate where in an HTML document the word is located, and in this case
it's in " META HEAD TITLE FILE".
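To make the field layout concrete, here is a rough Python sketch that pulls the fields out of one of these "Adding:" lines. The regex is inferred from the two sample lines in this message only (including the "Stuct" spelling the trace prints); it is not taken from swish-e's source, and parse_trace_line is a hypothetical helper.

```python
import re

# Layout inferred from the sample trace lines in this thread, not from
# swish-e's source: Adding:[file:meta(metaID)] 'word' Pos:N Stuct:0xNN ( FLAGS )
TRACE_RE = re.compile(
    r"Adding:\[(?P<file>\d+):(?P<meta>[^(]+)\((?P<meta_id>\d+)\)\]\s+"
    r"'(?P<word>[^']*)'\s+Pos:(?P<pos>\d+)\s+Stuct:(?P<struct>0x[0-9A-Fa-f]+)"
    r"\s*\((?P<flags>[^)]*)\)"
)


def parse_trace_line(line):
    """Return the fields of an 'Adding:' trace line as a dict, or None."""
    m = TRACE_RE.search(line)
    if m is None:
        return None
    return {
        "file": int(m.group("file")),          # file number
        "meta": m.group("meta"),               # metaname
        "meta_id": int(m.group("meta_id")),    # metaID number
        "word": m.group("word"),               # the indexed word
        "pos": int(m.group("pos")),            # word position (phrases)
        "struct": int(m.group("struct"), 16),  # structure bit flags
        "flags": m.group("flags").split(),     # flag names as printed
    }


line = "Adding:[1:html(10)] 'titletext' Pos:2 Stuct:0x87 ( META HEAD TITLE FILE )"
print(parse_trace_line(line))
```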
META = because we said <html> was a metaname (so everything is inside of a
metaname)
HEAD = inside the <head> section
TITLE = inside the <title> section
FILE = something I don't understand. I think it says the word is in a file,
but every word is in a file, so I'm not sure about that one.
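Since the structure value is just a bit mask, decoding it is a matter of testing each bit. In the sketch below the bit values are an assumption: the two trace lines here only pin down TITLE = 0x02 (the difference between 0x87 and 0x85), and META = 0x80, HEAD = 0x04, FILE = 0x01 are guessed from the order in which the names are printed. Any other flags swish-e defines are left out.

```python
# ASSUMED bit values, inferred from 0x87 = META HEAD TITLE FILE and
# 0x85 = META HEAD FILE; not copied from swish-e's headers. Other flags
# swish-e may define are omitted.
STRUCT_BITS = [
    ("META", 0x80),
    ("HEAD", 0x04),
    ("TITLE", 0x02),
    ("FILE", 0x01),
]


def decode_structure(value):
    """Return the names of the known flags that are set in value."""
    return [name for name, bit in STRUCT_BITS if value & bit]


print(decode_structure(0x87))  # ['META', 'HEAD', 'TITLE', 'FILE']
print(decode_structure(0x85))  # ['META', 'HEAD', 'FILE']
```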
Adding:[1:html(10)] 'meta1text' Pos:6 Stuct:0x85 ( META HEAD FILE )
The only difference is that this word is no longer in the <title> section
(0x85 is 0x87 with the TITLE bit, 0x02, cleared). The position "6" is
bumped from "2" because of the </title><meta> tags; that's to keep phrases
from matching across tag boundaries. But that's configurable per metaname
if you do want phrases to match across metanames. A classic example might be:
<person>
<first>Bill</first>
<last>Moseley</last>
</person>
where you would still want to be able to search -w person="bill moseley".
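Here is a toy model of the position bumping and of why it matters for the <person> example. It's a sketch under made-up assumptions: every tag adds a gap of 1 (TAG_GAP), tags listed in a hypothetical no_bump_tags set don't bump at all, and metaname scoping is ignored. So it won't reproduce the exact positions 2 and 6 from the trace; it just shows that a phrase only matches when its words sit at consecutive positions.

```python
import re

# Hypothetical gap added at each tag boundary; swish-e's real rule may differ.
TAG_GAP = 1

# A token is either a tag like <first> / </first>, or a run of non-space text.
TOKEN_RE = re.compile(r"<[^>]+>|[^<\s]+")


def assign_positions(text, no_bump_tags=frozenset()):
    """Give each word a position, bumping the counter at tag boundaries.

    Tags whose name is in no_bump_tags (a hypothetical option) do not bump,
    so phrases can match across them. Illustrative only, not swish-e's code.
    """
    positions = []
    pos = 1
    for tok in TOKEN_RE.findall(text):
        if tok.startswith("<"):
            parts = tok.strip("</>").split()
            name = parts[0].lower() if parts else ""
            if name not in no_bump_tags:
                pos += TAG_GAP
        else:
            positions.append((tok.lower(), pos))
            pos += 1
    return positions


def phrase_matches(positions, phrase):
    """True if the phrase's words occur in order at consecutive positions."""
    words = phrase.lower().split()
    for i in range(len(positions) - len(words) + 1):
        window = positions[i:i + len(words)]
        if [w for w, _ in window] != words:
            continue
        if all(window[k + 1][1] == window[k][1] + 1 for k in range(len(window) - 1)):
            return True
    return False


doc = "<person><first>Bill</first><last>Moseley</last></person>"
print(phrase_matches(assign_positions(doc), "bill moseley"))                     # False
print(phrase_matches(assign_positions(doc, {"first", "last"}), "bill moseley"))  # True
```

With the default bumping, "bill" and "moseley" end up three positions apart and the phrase fails; treating <first> and <last> as non-bumping makes them adjacent again.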
Those -T options were added to make debugging easier during development,
but I find them really useful for making sure my configuration works the
way I think it should.
Anyway, does that help?
--
Bill Moseley
mailto:moseley@hank.org
Received on Tue Feb 26 20:42:43 2002