Re: following xml relative links with spider.pl

From: Cas Tuyn <cas.tuyn(at)not-real.gmail.com>
Date: Tue Jan 16 2007 - 14:54:09 GMT

Brian,

The spider stays within the start-domain localhost/svn, otherwise it
could go on and index the whole Internet. There is a setting
(follow-hosts or something) that allows you to say that links to
subversion.tigris.org may be followed. Also look at same_hosts if
these two hosts are actually the same site under different host names
(like www.tigris.org and tigris.org).
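
For instance, a minimal spider.conf sketch of the same_hosts case,
assuming the site answers to both www.tigris.org and tigris.org (the
host names and e-mail address here are placeholders, not taken from
your setup):

```perl
# Hypothetical spider.conf sketch: host names and e-mail are placeholders.
@servers = (
    {
        # Spidering starts here and stays on this host.
        base_url           => 'http://www.tigris.org/',

        # Treat the bare domain as an alias of the base_url host, so
        # links written as http://tigris.org/... are followed as well.
        same_hosts         => [ 'tigris.org' ],

        email              => 'you@example.com',
        use_default_config => 1,
    },
);
1;
```

Links to any host that is neither the base_url host nor listed in
same_hosts would still be skipped.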

Regards,

Cas


On 1/16/07, Brian Ling <brian_ling_gandj@yahoo.com> wrote:
> Hi all,
>
> I've just started using swish-e so sorry if this is a
> bit newbie.
>
> I want to index a subversion repository via its
> web/apache front end, but I can't seem to get
> spider.pl to follow the links in the default
> subversion output.
>
> I'm calling the spider directly with:
>
>     /usr/local/lib/swish-e/spider.pl ./spider.conf
>
> It finds and outputs the main subversion page (output at
> end of mail) but doesn't follow any of the links.
> Everything appeared to install OK. I'm on OS X 10.4.8.
> What am I missing?
>
> spider.conf:
>     @servers = (
>         {
>                 email       => 'test@test.co.uk',
>                 base_url    => 'http://localhost/svn/',
>                 same_hosts  => [ '127.0.0.1' ],
>                 use_default_config  => 1,
>                 link_tags   => [qw/ a frame dir /],
>         },
>     );
>     1;
>
> output from spider.pl:
>
> /usr/local/lib/swish-e/spider.pl: Reading parameters from './spider.conf'
> Path-Name: http://localhost/svn/
> Content-Length: 1232
> Document-Type: xml*
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="/xslt/svnindex.xsl"?>
> <!DOCTYPE svn [
>   <!ELEMENT svn   (index)>
>   <!ATTLIST svn   version CDATA #REQUIRED
>                   href    CDATA #REQUIRED>
>   <!ELEMENT index (updir?, (file | dir)*)>
>   <!ATTLIST index name    CDATA #IMPLIED
>                   path    CDATA #IMPLIED
>                   rev     CDATA #IMPLIED>
>   <!ELEMENT updir EMPTY>
>   <!ELEMENT file  EMPTY>
>   <!ATTLIST file  name    CDATA #REQUIRED
>                   href    CDATA #REQUIRED>
>   <!ELEMENT dir   EMPTY>
>   <!ATTLIST dir   name    CDATA #REQUIRED
>                   href    CDATA #REQUIRED>
> ]>
> <svn version="1.3.0 (r17949)"
>      href="http://subversion.tigris.org/">
>   <index rev="170" path="/">
>     <dir name="SubversionNotes" href="SubversionNotes/" />
>     <dir name="altirsCustomInventory" href="altirsCustomInventory/" />
>     <dir name="appsMan" href="appsMan/" />
>     <dir name="artwork" href="artwork/" />
>     <dir name="bootDVD-CD" href="bootDVD-CD/" />
>     <dir name="docs" href="docs/" />
>     <dir name="dtupdates" href="dtupdates/" />
>     <dir name="localMachine" href="localMachine/" />
>     <dir name="netlogon" href="netlogon/" />
>     <dir name="tools" href="tools/" />
>   </index>
> </svn>
>
> Summary for: http://localhost/svn/
> Connection: Close:     1  (1.0/sec)
>       Total Bytes: 1,232  (1232.0/sec)
>        Total Docs:     1  (1.0/sec)
>       Unique URLs:     1  (1.0/sec)
>
> Thanks for any pointer,
>
> Brian


-- 
Bookmark  http://kayakfun.info/salsagids/  for the best salsa parties!
Received on Tue Jan 16 06:54:10 2007