Bill,
Many thanks for adding IndexName to the set of headers accessible from
the Perl module, and making them consistent. I downloaded
swish-e-2.1-dev-24-2001-11-02.tar.gz today and it seems to work fine
(after I added -lz to Makefile.PL!).
> How does the perl module make it more portable?
Probably not a big issue on most modern platforms, but it avoids having
to fork/exec the swish-e program: the Perl documentation describes how
to do this while avoiding a shell, but I haven't tried it on anything
other than Unix.
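
As a rough illustration of what "avoiding a shell" means when the swish-e
binary is run as a child process, here is a sketch in Python rather than
Perl. The helper names are made up for this example, and it assumes the
swish-e -w (query) and -f (index files) options and a swish-e binary on
the PATH:

```python
import subprocess


def build_command(query, index_files, swish="swish-e"):
    """Return an argument list; each element reaches swish-e verbatim."""
    return [swish, "-w", query, "-f", *index_files]


def run_search(query, index_files):
    """Run swish-e as a child process and return whatever it prints."""
    # Passing a list (with the default shell=False) means no shell ever
    # parses the query, so characters like ';' or '|' are not special.
    result = subprocess.run(
        build_command(query, index_files),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

The point is only that the query travels as a single argument; it says
nothing about how portable the fork/exec itself is off Unix.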
> I do hope you validate the path.
Hmm... What validation does SwishOpen do? Surely it doesn't allow a
shell to see the index file name? I had a simple -r check when I was
only allowing a single index, but I took it out in preparation for
allowing multiple indexes, like index=file1+file2. Perhaps I'll put it
back.
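
For what it's worth, the kind of check I have in mind for a multi-index
parameter looks roughly like this. It is a sketch in Python, not the
code I actually run: the function name and the idea of a single
directory holding all the permitted indexes are assumptions made for
the example.

```python
import os


def parse_index_param(value, index_dir):
    """Split 'file1+file2' and return the resolved index paths.

    Raises ValueError if a name is empty, resolves to somewhere outside
    index_dir, or is not a readable regular file.
    """
    root = os.path.realpath(index_dir)
    paths = []
    for name in value.split("+"):
        if not name:
            raise ValueError("empty index name")
        # realpath collapses '..' and symlinks before the containment test
        path = os.path.realpath(os.path.join(root, name))
        if os.path.commonpath([root, path]) != root:
            raise ValueError("index outside allowed directory: %r" % name)
        # Equivalent of the simple -r check, applied to each index in turn
        if not (os.path.isfile(path) and os.access(path, os.R_OK)):
            raise ValueError("index not readable: %r" % name)
        paths.append(path)
    return paths
```

Restricting names to one known directory is stricter than a bare -r
test, which would accept any readable file on the system.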
Some other comments:
The TXT2 parser couldn't cope with empty files returned using the "prog"
method: my Perl spider returns empty files (actually Content-Length: 1
containing a single newline) if No-Contents: 1 is set. I had to revert
to TXT in this case. The error showed up as a broken pipe, presumably
caused by swish-e aborting.
It would be useful if the "prog" spider could tell swish-e which parser
(TXT, HTML, XML, TXT2, etc.) to use for each file sent: then I wouldn't
need all that IndexContents stuff in my conf file, and sometimes there
is no filename suffix anyway (e.g. spider-generated directory indexes
don't end in ".html"). How about adding a "Swish-Parser:" header (or
using the standard MIME "Content-Type:" if you plan to eventually remove
the distinction between TXT and TXT2, etc., by moving completely to the
libxml2 parser)?
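
To make the suggestion concrete, here is a sketch (in Python, with a
made-up function name) of what a "prog" spider might write for each
document. Content-Length and No-Contents are the headers mentioned
above; Path-Name is what I understand the "prog" method to expect for
the document name, so treat it as an assumption; and Swish-Parser is
purely the proposed header, which swish-e does not understand today.

```python
def format_record(path, body, no_contents=False, parser=None):
    """Build one document record: header lines, blank line, content."""
    data = body.encode("utf-8")
    headers = [
        "Path-Name: %s" % path,
        # Content-Length counts the bytes of the content, not characters
        "Content-Length: %d" % len(data),
    ]
    if no_contents:
        headers.append("No-Contents: 1")
    if parser is not None:
        # Hypothetical header: the per-document parser hint proposed here
        headers.append("Swish-Parser: %s" % parser)
    head = "\n".join(headers) + "\n\n"
    return head.encode("utf-8") + data
```

With this, the "empty" file case described earlier would be
format_record(path, "\n", no_contents=True), i.e. Content-Length: 1
followed by a single newline.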
The libxml2 parser has noticeably increased the size of the executable
and/or the Perl DLL (.so). I eventually did as suggested and compiled
swish-e twice, with and without libxml2, but what a hassle! I would
suggest that the time has probably come to split swish-e into an
"indexer" and a smaller "searcher" that doesn't need all the parsing
stuff. In fact, the indexer probably doesn't need the built-in
directory/web crawling facilities either, now that you have the "prog"
method and a range of Perl spiders that seem to do the job.
Hope these comments help.
Alex Lyons.
Received on Fri Nov 2 16:27:23 2001