Re: [swish-e] partial indexing

From: Zhou Xiang <xiz407(at)not-real.gmail.com>
Date: Fri Mar 27 2009 - 20:12:22 GMT
Thank you for your help!
Something is still strange, though. When I try to index the following page:
http://digital.lib.lehigh.edu/beyondsteel_test/admin/templist.htm
the spider does not follow any of the links on it, even though I set
max_depth to 1. It indexes only the text that appears on that page itself,
and none of the content behind the links.
Can you figure out why?

My spider.config file:
@servers = (
{
  base_url    => 'http://digital.lib.lehigh.edu/beyondsteel_test/admin/templist.htm',
  email       => 'abc@gmail.com',

  # other spider settings described below
  max_depth   => 1,
},
);

Best,
Dennis


On Fri, Mar 27, 2009 at 1:11 PM, Peter Karman <peter@peknet.com> wrote:

> Zhou Xiang wrote on 03/27/2009 11:50 AM:
> > Thank you for your reply!
> >
> > I just tried the spider.pl method you suggested and I added an external
> > link "http://www.amazon.com" to the list, but the spider still does not
> > index it.
> >
> > What's more, it still does not index any webpages outside the local
> > server, digital.lib.lehigh.edu.
> >
> > My spider config file:
> > @servers = (
> > {
> >   base_url    => 'http://digital.lib.lehigh.edu/beyondsteel_test/admin/index.php',
>
> you must add all the base names you want included, either in base_url or
> same_hosts (depending on how you want them indexed).
>
> Read the docs:
>
>  http://swish-e.org/docs/spider.html#configuration_options
>
> the default behaviour is to remain only on the same host.
>
> --
> Peter Karman  .  peter(at)not-real.peknet.com  .  http://peknet.com/
>
> _______________________________________________
> Users mailing list
> Users@lists.swish-e.org
> http://lists.swish-e.org/listinfo/users
>
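
For illustration, the advice quoted above could be sketched as a spider
config with one @servers entry per host. This is only a sketch under
assumptions, not a tested setup: the second entry and its max_depth value
are hypothetical, and the comments on max_depth reflect my reading of the
spider.pl documentation (0 = only the base_url page, 1 = that page plus the
pages it links to).

```perl
# Sketch of a spider config file: one hash per site to be spidered.
@servers = (
    {
        # The local site from the original config.
        base_url  => 'http://digital.lib.lehigh.edu/beyondsteel_test/admin/index.php',
        email     => 'abc@gmail.com',
        max_depth => 1,    # base_url page plus the pages linked from it
    },
    {
        # Hypothetical second entry: an external host gets its own base_url,
        # because the spider stays on the base_url host by default.
        base_url  => 'http://www.amazon.com/',
        email     => 'abc@gmail.com',
        max_depth => 0,    # assumed value: index only this one page
    },
);

1;
```

Note that each base_url is written on a single line, with no line break
inside the quotes. As far as I understand the docs, same_hosts is meant for
alternate names of the same server rather than for unrelated sites, which is
why a separate entry is used here.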





Received on Fri Mar 27 16:12:23 2009