Re: [swish-e] partial indexing

From: Zhou Xiang <xiz407(at)not-real.gmail.com>
Date: Fri Mar 27 2009 - 16:50:42 GMT
Thank you for your reply!

I just tried the spider.pl method you suggested and I added an external link
"http://www.amazon.com" to the list, but the spider still does not index it.

What's more, it still does not index any webpages outside the local server,
digital.lib.lehigh.edu.

My spider config file:
@servers = (
{
  base_url    => 'http://digital.lib.lehigh.edu/beyondsteel_test/admin/index.php',
  email       => 'abc@gmail.com',

  # other spider settings described below
  max_depth   => 1,
},
);
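
For comparison, here is a minimal sketch of a spider config that gives each host its own entry. It assumes, going by the spider.pl documentation, that spidering for an entry stays on the host of its base_url unless same_hosts lists aliases for it. The second entry's URL path ("/") and the same_hosts line are illustrative guesses, not settings that have been tested against these servers:

```perl
# spider.config: a sketch only, not a tested configuration.
@servers = (
    {
        # The original starting point on the local server.
        base_url  => 'http://digital.lib.lehigh.edu/beyondsteel_test/admin/index.php',
        email     => 'abc@gmail.com',
        max_depth => 1,

        # Optional and hypothetical here: hostnames to treat as the same
        # site as base_url, so links to them are followed as well.
        # same_hosts => [ 'rust.cc.lib.lehigh.edu' ],
    },
    {
        # A separate host listed as its own entry. The starting path "/"
        # is an assumption; replace it with a real start page.
        base_url  => 'http://rust.cc.lib.lehigh.edu/',
        email     => 'abc@gmail.com',
        max_depth => 1,
    },
);

# Return a true value, as the sample spider configs do.
1;
```

With max_depth set to 1, each entry should fetch its base_url page plus the pages linked directly from it on that host, and nothing deeper.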

My swish config file:
# Use spider.pl as the external program:
IndexDir spider.pl

IndexFile /usr/local/swish-e-2.4.5/prog-bin/index-temp.swish-e

# And pass the name of the spider config file to the spider:
SwishProgParameters spider.config


Any advice?
Thank you!

- Dennis

On Thu, Mar 26, 2009 at 5:23 PM, Peter Karman <peter@peknet.com> wrote:

> Zhou Xiang wrote on 03/26/2009 03:29 PM:
> > Hi David,
> >
> > Thank you for your reply!
> > I tested it again today. It shows that the crawler can only index the
> > webpages within "http://digital.lib.lehigh.edu". It cannot crawl the pages
> > on "rust.cc.lib.lehigh.edu" or any other websites, even though I used real
> > URLs instead of queries.
> > Any ideas about it?
>
> Don't use the old spider.
>
> Use spider.pl with -S prog instead.
>
> See this documentation:
>
>  http://swish-e.org/docs/spider.html
>
> and
>
>  http://swish-e.org/docs/swish-faq.html#spidering
>
> Note that with spider.pl there are 2 config files: 1 for swish-e, and 1
> for spider.pl.
>
> Your swish-e config file can remain unchanged with the exception of
> dropping:
>
> MaxDepth 2
> TmpDir /usr/local/swish-e-2.4.5/tmp
>
> since those are ignored with the -S prog method.
>
> --
> Peter Karman  .  peter(at)not-real.peknet.com  .  http://peknet.com/
>
> _______________________________________________
> Users mailing list
> Users@lists.swish-e.org
> http://lists.swish-e.org/listinfo/users
>


Received on Fri Mar 27 12:50:43 2009