Hello all and Mr. Klatchko,
Let me clarify my statements to dispel any confusion. I used the word
"intranet" to mean a collection of sites (let me use the word 'sites' for
now) that are only visible/accessible from within the company premises and
are NOT visible to the external world. In other words, these sites are not
a part of the World Wide Web (for reasons of security, of course).
What I would therefore like SWISH-E to do is index these internal
sites. Most of the links on the main page of this company 'intranet' lead to
other sites/pages within the 'intranet', and VERY FEW of the links lead to
pages that are part of the World Wide Web (and that are not a part of the
company 'intranet'). I would like SWISH-E to access the 'intranet'
sites/pages and ignore the WWW sites.
I must mention at this point that, as a part of testing, when I ran the
HTTP spidering on my own web site (which IS a part of the WWW and NOT a part
of the company intranet), it worked like a charm. Within the company, we
would like SWISH-E to do exactly the opposite.
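For illustration only, the behaviour I am after can be sketched in Python.
This is not SWISH-E code; it is a rough model of a depth-limited spider that
follows links on the starting server (or servers declared equivalent) and
ignores everything else. The host names, the PAGES table and the fetch_links
helper are made up for the example and stand in for real HTTP fetches.

```python
from collections import deque
from urllib.parse import urljoin, urlsplit


def spider(start_url, fetch_links, equivalent_hosts=(), max_depth=3):
    """Return {url: depth} for every page reached on the allowed servers."""
    allowed = {urlsplit(start_url).netloc.lower()}
    allowed.update(h.lower() for h in equivalent_hosts)

    seen = {start_url: 1}            # the start page counts as depth 1
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        depth = seen[url]
        if depth >= max_depth:
            continue                 # page is kept, but its links are not followed
        for href in fetch_links(url):
            link = urljoin(url, href)    # resolve relative links
            if urlsplit(link).netloc.lower() not in allowed:
                continue             # external (WWW) link: ignore it
            if link not in seen:
                seen[link] = depth + 1
                queue.append(link)
    return seen


# Hypothetical intranet link graph (assumption, for illustration only).
PAGES = {
    "http://intranet.example/": [
        "/hr/",
        "http://docs.intranet.example/",
        "http://www.example.com/",       # a WWW link that should be skipped
    ],
    "http://intranet.example/hr/": ["policies.html"],
    "http://docs.intranet.example/": [],
    "http://intranet.example/hr/policies.html": [],
}


def fetch_links(url):
    """Stand-in for fetching a page and extracting its links."""
    return PAGES.get(url, [])


result = spider(
    "http://intranet.example/",
    fetch_links,
    equivalent_hosts=["docs.intranet.example"],
)
for url, depth in sorted(result.items()):
    print(depth, url)
```

In this model the WWW link is never visited because its host is neither the
starting server nor listed as equivalent, which is the filtering I would
like the real spider to apply to the intranet.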
Thanking one and all,
Sincerely,
Ashok Rajamani
On Tue, 30 May 2000 16:54:43 -0700 (PDT), rsk@corpmail.brightmail.com wrote:
>
> > I meant that the web pages I want indexed are part of a company
> > INTRANET, in that there definitely are links on the main page (the page
> > to start spidering from) only to web sites that are within the company
> > intranet
>
> Okay, in this sentence you talk about pages and sites. I need to know
> whether you are describing your environment exactly or whether you are
> being inexact in your phrasing. What is your definition of "web site"
> as used in the above question? For that matter, what is your definition
> of an "intranet"?
>
> moo
>
> arajamani@excite.com wrote:
> >
> > Hello everyone and Mr. Klatchko,
> > I agree with you, Mr. Klatchko, when you say that in the HTTP method we
> > don't look at the file system at all. Also, when I said "NOT visible to
> > the WWW" I meant that the web pages I want indexed are part of a company
> > INTRANET, in that there definitely are links on the main page (the page
> > to start spidering from) only to web sites that are within the company
> > intranet and not to any WWW sites. I want these internal links to be
> > spidered. I really appreciate your taking time out to answer my
> > questions.
> > Sincerely,
> > Ashok
> >
> > On Fri, 26 May 2000 15:56:32 -0700 (PDT), rsk@corpmail.brightmail.com wrote:
> >
> > > arajamani@excite.com wrote:
> > > > Thanks for pointing out the errors. I have gone ahead and changed
> > > > the config file and the HTTP indexing works just fine! (I have
> > > > enclosed the modified config file.) However, it is unable to spider
> > > > down the links and index them too. All the links are a part of the
> > > > intranet and are NOT visible to the WWW. Is this what's preventing
> > > > the spider from spidering down?
> > > > Thanks once again for your help.
> > >
> > > The spider works by indexing the first page (depth 1). It then finds
> > > all links on that page that are on the same (or equivalent, as defined
> > > in the config file) server. It then indexes each of those pages
> > > (depth 2) and follows those links. It does this until it reaches its
> > > max depth or all files on a server are indexed.
> > >
> > > The most important thing is that it can only find pages that you tell
> > > it to index or that it can find a URL for on one of the pages it
> > > indexes. If your comment that they are "NOT visible to the WWW" means
> > > there are no links to the pages, then no, they won't be indexed. How
> > > would the spider know they exist? (And don't suggest that it look at
> > > the file system; the HTTP method was built to index foreign sites
> > > where it has no access to the fs.)
> > >
> > > moo
> > > ------------------------------------------------------------
> > > Ron Samuel Klatchko - Senior Software Jester
> > > Brightmail Inc - rsk@brightmail.com
> >
>
> --
> ------------------------------------------------------------
> Ron Samuel Klatchko - Senior Software Jester
> Brightmail Inc - rsk@brightmail.com
Received on Wed May 31 12:36:52 2000