PDF transformed: 1 (1.0/sec)
Skipped: 1 (1.0/sec)
Unique URLs: 1 (1.0/sec)
# file test.html
test.html: empty
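
For repeat runs, the check on test.html could be scripted. This is only a rough sketch: check_spider_output is a made-up helper name, not part of swish-e or spider.pl, and it assumes the title sits on one line as <title>text</title>.

```shell
#!/bin/sh
# check_spider_output FILE
# Hypothetical helper (not part of swish-e): reports whether the captured
# spider output is empty, and if not, whether it contains a non-empty title.
check_spider_output() {
    f="$1"
    # A zero-length file means the spider wrote nothing to stdout.
    if [ ! -s "$f" ]; then
        echo "empty"
        return 1
    fi
    # Assumption: the title is on a single line, e.g. <title>Some text</title>.
    if grep -i -q '<title>[^<]' "$f"; then
        echo "title found"
        return 0
    fi
    echo "no title"
    return 1
}
```

Calling `check_spider_output test.html` on the file above would print "empty", which matches what `file` reports.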
>>> <moseley@hank.org> 07/23/03 07:58AM >>>
On Wed, Jul 23, 2003 at 07:54:51AM -0700, Erik Lyons wrote:
> Thanks Bill,
>
> Run this way, spider.pl appears to expect perl, so given the "f.conf"
> example (list of directives) it fails in a bountiful blossom of syntax
> errors.
Right, sorry I wasn't clear:
> spider.pl your_config_file.name > test.html
should be:
spider.pl your_SPIDER_config_file.name > test.html
>
> >>> Bill Moseley <moseley@hank.org> 07/22/03 07:07PM >>>
> On Tue, Jul 22, 2003 at 04:38:13PM -0700, Erik Lyons wrote:
> > After several weeks of exclaiming joyful praise to the initial "S" in
> > SWISH, I stumbled across the example quoted below. It runs and reports
> > "PDF transformed: 2,009 (19.7/sec)", but no PDF files can be
> > returned in any search results. As an added bonus, all document titles
> > that are in the search results appear as "(NULL)". Are these problems
> > related, or do I have 2 different gleaming horizons of delight to
> > explore?
>
> Hard to say, but probably not hard to debug.
>
> Edit the spider's config file to point to a single PDF file. Then just
> run the spider like:
>
> spider.pl your_config_file.name > test.html
>
> and look at test.html and make sure it has a title and content.
>
> Then you can index that one PDF with:
>
> cat test.html | swish-e -c your_config -S prog -i stdin -T properties
>
> the -T properties will show you if the title is being stored.
>
> --
> Bill Moseley
> moseley@hank.org
>
--
Bill Moseley
moseley@hank.org
Received on Wed Jul 23 15:25:43 2003