Note -- searching the swish-e list archive for windows and pdf2html will
find similar tips for using pdf2html.
On Thu, Nov 06, 2003 at 11:28:47PM -0800, David L Norris wrote:
> I probably should install the example scripts (index_hypermail.pl,
> _pdf2html.pl, MySQL.pl, file.pl, DirTree.pl) somewhere other than
> lib\swish-e by default. I think Bill made that suggestion at some
> point. I can see how that would be extremely confusing.
They should go in share/doc/swish-e/examples.
One exception is DirTree.pl. That's a tiny little example script, but
perhaps it needs to be developed into something that provides the
functionality of -S fs indexing mode, but with SWISH::Filter built in.
I was going to post with a modified DirTree.pl program that included
SWISH::Filter for the original poster, but I have not had time to get to
my Windows machine to test (I keep it locked away for health reasons).
Adding SWISH::Filter support is not hard, and one can use the example in
the spider.cgi example config SwishSpiderConfig.pl as a template.
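
For what it's worth, here is a rough, untested-against-swish-e sketch of the shape such a DirTree.pl replacement might take. It only shows the -S prog output side (Path-Name and Content-Length headers, an optional Last-Mtime, a blank line, then the content). The names format_doc, index_tree and read_raw are made up for illustration, and the $convert hook is an assumption: it marks the place where a SWISH::Filter object, created once before the walk, would be called, following the filter code in SwishSpiderConfig.pl.

```perl
use strict;
use warnings;
use File::Find;

# Format one document for swish-e's -S prog input: header lines,
# a blank line, then the content. $content is treated as raw bytes.
sub format_doc {
    my ( $path, $content, $mtime ) = @_;
    my $out = "Path-Name: $path\n"
            . "Content-Length: " . length($content) . "\n";
    $out .= "Last-Mtime: $mtime\n" if defined $mtime;
    return $out . "\n" . $content;
}

# Placeholder converter (hypothetical): returns the file's bytes unchanged.
# A real version would hand the file to SWISH::Filter here instead.
sub read_raw {
    my ($path) = @_;
    open my $in, '<', $path or return;
    binmode $in;
    local $/;                      # slurp the whole file
    my $data = <$in>;
    close $in;
    return $data;
}

# Walk $dir and print every plain file to $fh. The modules behind
# $convert are loaded once, not once per document.
sub index_tree {
    my ( $dir, $convert, $fh ) = @_;
    find(
        {
            no_chdir => 1,
            wanted   => sub {
                my $path = $File::Find::name;
                return unless -f $path;
                my $mtime   = ( stat $path )[9];
                my $content = $convert->($path);
                return unless defined $content;   # skip unreadable files
                print {$fh} format_doc( $path, $content, $mtime );
            },
        },
        $dir
    );
    return;
}

# Only runs when a directory is given on the command line.
index_tree( $ARGV[0], \&read_raw, \*STDOUT ) if @ARGV;
```

Running it as "perl script.pl /path/to/docs" should print the stream that swish-e would read in -S prog mode, which makes it easy to eyeball before pointing swish-e at it.
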
Now, I have not tried this yet, but there's also the utility called
swish-filter-test. That program is modified at install time to find the
SWISH::Filter module (which is also designed to find the binaries).
So, I suppose one could do this if using the -S fs indexing method:
FileFilter .pdf swish-filter-test '-quiet -content "%p"'
Now, I don't recommend that in general because of the cost of Perl
loading and compiling all those modules for every document. That's why
I'm suggesting modifying DirTree.pl -- it would only load the
SWISH::Filter modules one time.
BTW - swish-filter-test is installed in the same place as the swish-e
binary, so it may be in your path. If not, then specify the path, of
course. Oh, I suppose Windows would need to know that it is a Perl
script, so you would need to rename it to swish-filter-test.pl (on some
versions of Windows) or run it as "perl /path/to/swish-filter-test" (on
other versions of Windows).
Hey Dave. How long does it take you to install Linux these days?
> > Set correct path to pdf converters in _pdf2html.pl. e.g.
> > $ENV{PATH} ='D:/Program Files/SWISH-E4/lib/swish-e;'. $ENV{PATH};
>
> Yes, that's a good workaround. I'll see if we can get that fixed in the
> next release. We don't seem to be handling FileFilter correctly. My
> intention is that everything in {prefix}\lib\swish-e should be directly
> executable using it's base name.
Basically, those filters have been left behind. They never had the
right path in them unless you happened to install pdftotext and other
binaries in your PATH. _pdf2html.pl says:
This filter requires two programs "pdfinfo" and "pdftotext".
These programs are part of the xpdf package found at
http://www.foolabs.com/xpdf/xpdf.html.
These programs must be found in the PATH when indexing is run, or
explicitly set the path in this program:
$ENV{PATH} = '/path/to/programs'
It's just that we made other things work automatically but not this.
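
As a small sketch of that workaround (not what _pdf2html.pl actually ships with), the path can be prepended in a way that uses the right separator on both Windows and Unix. The helper name prepend_to_path is made up for illustration, and the directory is the one mentioned earlier in this thread, so adjust it to wherever pdfinfo and pdftotext really live.

```perl
use strict;
use warnings;
use Config;    # $Config{path_sep} is ';' on Windows and ':' on Unix

# Hypothetical helper: put $dir at the front of PATH unless it is
# already listed there.
sub prepend_to_path {
    my ($dir) = @_;
    my $sep   = $Config{path_sep};
    my $path  = defined $ENV{PATH} ? $ENV{PATH} : '';
    my @parts = grep { length } split /\Q$sep\E/, $path;
    return if grep { $_ eq $dir } @parts;    # already present
    $ENV{PATH} = join $sep, $dir, @parts;
    return;
}

# Assumed install location, taken from the message quoted above.
prepend_to_path('D:/Program Files/SWISH-E4/lib/swish-e');
```

This only changes PATH for the filter process and anything it starts, so it does not touch the system-wide setting.
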
I don't really want to maintain two different sets of filters, but those
files in the filter-bin directory are nice as examples, I think. That's
why I'd install them in the documentation directory.
> Bill, looks like we never fixed FilterOpen() in filter.c, line 298.
> It's not using the new PATH stuff. It's simply doing a popen(). I
> think it needs to be doing a get_env_path_with_libexecdir() beforehand.
We have talked about that a few times, and I've looked into it, too. I
wish I could remember exactly my reasoning for not making that change.
I think I felt that FileFilter was more of a general hook and thus you
(the user of FileFilter) would set up paths as needed.
That said, remember that there was code in there for a while that added
libexecdir() to the path at startup of swish. That would make the
filters work better.
http://cvs.sourceforge.net/viewcvs.py/swishe/swish-e/src/swish.c
I think the problem was that setting PATH was not portable, but I think
there was another reason I removed it. That's the part I can't remember.
--
Bill Moseley
moseley@hank.org
Received on Fri Nov 7 15:05:55 2003