On 09/10/2007 12:03 AM, firstname.lastname@example.org wrote:
> On 8 Sep 2007 at 19:36, Peter Karman wrote:
>> If you're using the spider.pl or DirTree.pl with -S prog, then yes, you
>> can filter the content with a regex and output additional <meta> tags
>> with the content.
> I'm planning to write a -S prog program that would do its own XML parsing
> and pass just plain text for swish to index. Is it possible to
> produce meta-fields in this scenario? The text would not have any
> tags (no "<" or ">"). Of course I could write them, but it seems
> like a waste to have swish parse it for XML a second time.
> Something like outputting:
> Path-Name: MYPATH
> Content-Lines: NUMBER_OF_LINES
> Last-Mtime: $mtime
> Document-Type: TEXT
> Meta: Subject=MYSUBJECT
> Meta: AUTHOR=MYAUTHOR
If you want to add meta information, you must parse documents as either HTML or
XML -- swish-e has no other way of parsing MetaNames or PropertyNames. So your
program would need to wrap the plain text in minimal XML or HTML markup, with
each meta field in its own tag, and set Document-Type to match.
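
A minimal sketch of what such a -S prog program could emit, assuming the
meta fields are written as XML elements around the escaped plain text. The
element names (doc, subject, author, body), the build_record helper, and the
"XML*" Document-Type value are illustrative assumptions, not taken from this
thread; the element names would have to match the MetaNames/PropertyNames in
your own config, and the Document-Type value should be checked against the
swish-e documentation for your version.

```python
from xml.sax.saxutils import escape


def build_record(path, mtime, text, metas):
    """Return one -S prog record (headers, blank line, XML body) as bytes.

    Hypothetical helper: element names and the Document-Type value are
    assumptions for illustration only.
    """
    # Each meta field becomes its own element so the XML parser can see it.
    parts = ["<doc>"]
    for name, value in metas.items():
        parts.append(f"<{name}>{escape(value)}</{name}>")
    # Escape the plain text so stray "<", ">" or "&" cannot break the XML.
    parts.append(f"<body>{escape(text)}</body>")
    parts.append("</doc>")
    body = "\n".join(parts).encode("utf-8")

    # Content-Length is the size of the encoded body in bytes.
    headers = (
        f"Path-Name: {path}\n"
        f"Content-Length: {len(body)}\n"
        f"Last-Mtime: {mtime}\n"
        "Document-Type: XML*\n"
        "\n"
    ).encode("utf-8")
    return headers + body


if __name__ == "__main__":
    import sys

    record = build_record(
        "MYPATH",
        1189382400,
        "plain text & more",
        {"subject": "MYSUBJECT", "author": "MYAUTHOR"},
    )
    # Write bytes directly so no newline translation changes the length.
    sys.stdout.buffer.write(record)
```

Since the text is already extracted, the only extra work for swish-e is
parsing a handful of wrapper tags, which should be small next to the indexing
itself.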
> (I wishfully changed the Content-Length header to Content-Lines,
> as calculating the number of bytes swish thinks the file contains can be a
> bit tedious if I have lines ending in crlf, and others with just cr or lf.
> Number of lines would be much easier, also for swish, I think, if it reads
> the input line-by-line. But this is not so important.)
Number of lines is something swish-e knows nothing about -- it just reads N
bytes into a buffer, parses them, and then reads another N bytes.
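
To illustrate why the byte count is what matters, here is a rough simulation
of a consumer that takes each document body purely by Content-Length. This is
not swish-e's actual code; read_records and byte_length are hypothetical
names, and UTF-8 output is an assumption. The point is only that mixed
crlf/cr/lf endings need no special handling if you encode first and count the
bytes afterwards.

```python
import io


def byte_length(text):
    """Byte count to put in Content-Length for text sent as UTF-8."""
    return len(text.encode("utf-8"))


def read_records(stream):
    """Yield (headers, body) pairs from a binary stream of records.

    Simulated consumer for illustration; assumes well-formed input where
    each record is header lines, a blank line, then Content-Length bytes.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of input
        headers = {}
        while line.strip():
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
            line = stream.readline()
        # The body is taken by byte count alone; line endings are irrelevant.
        body = stream.read(int(headers["Content-Length"]))
        yield headers, body


if __name__ == "__main__":
    text = "one\r\ntwo\rthree\nfour"
    header = f"Path-Name: demo\nContent-Length: {byte_length(text)}\n\n"
    stream = io.BytesIO(header.encode("utf-8") + text.encode("utf-8"))
    for headers, body in read_records(stream):
        print(headers["Path-Name"], len(body))
```

If the count is off by even one byte, a reader like this starts the next
record in the wrong place, so it is safest to compute the length from the
exact bytes you write and to write them in binary mode.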
Peter Karman . peter(at)not-real.peknet.com . http://peknet.com/
Users mailing list
Received on Mon Sep 10 09:20:38 2007