I am also wondering if there is a way to get Swish-e's spider to
automatically follow links to subdomains of the same domain, without having
it follow off-site links to other domains. Do you know what I mean? I
would rather not have to spider each subdomain separately in the code, and
I would like Swish-e and/or its spider to keep track of the links related
to each subdomain.
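
To make the behaviour I am after concrete, here is a rough sketch in Python
of the test I would like applied to each link before it is followed. It is
only an illustration, not Swish-e or spider code; the function name
same_site and the domain.com placeholder are made up for the example.

```python
from urllib.parse import urlsplit


def same_site(url, base_domain="domain.com"):
    """Return True if the URL's host is base_domain or one of its subdomains.

    base_domain is a placeholder; substitute the real registered domain.
    """
    # hostname is None for links without a host part (e.g. mailto:), so fall
    # back to an empty string, which never matches.
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    base = base_domain.lower()
    # Require either an exact match or a match on a dot boundary, so that
    # look-alike hosts such as notdomain.com are not treated as subdomains.
    return host == base or host.endswith("." + base)
```

With a check like this, links to domain.com and sub.domain.com would be
followed, while links to otherdomain.com (or a look-alike such as
notdomain.com) would be skipped.
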
By the way, it appears that my original post on this topic was the first on
the Swish-e discussion group for 2007! So, Happy New Year, everyone!
On 1/4/07, James <swish.enhanced@gmail.com> wrote:
>
> I have been trying to spider/crawl an off-site sub-domain several times
> and it doesn't seem to be working. I also seem to have a problem trying
> to spider/crawl a certain regular domain. I can't figure out the problem.
> I know there is a redirect from the www to the non-www. The spider picks
> up the robots.txt and nothing more. Are there things I need to be aware
> of about the spider that are not in the documentation? Also, when will
> the spider be updated next? And when will Swish-e be updated for UTF-8?
>
> Also, I am concerned about something I read in the documentation about
> spidering sub-domains, that the index may point the links to the pages
> without the sub-domain. In other words, sub.domain.com/mypage.html would
> be indexed as domain.com/mypage.html, unless some tweaking of the code is
> done. Is this true?
>
> I know that though the questions are specific, some of the details are
> vague. I apologize. I would rather not post the actual URLs I am trying
> to crawl.
Received on Thu Jan 4 10:07:30 2007