
Re: Web Page listing software necessary when

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Jun 13 2001 - 14:25:40 GMT
At 04:44 AM 06/13/01 -0700, AusAqua wrote:
>What is the address from which I could download the "lite" version please.
>A pointer to any associated information on this "lite" version could also be
>very helpful.

The scripts are part of the current development version of swish.  Daily
snapshots are currently at http://sunsite.berkeley.edu:4444/swish-daily/.
There's a warning about it being development code at the top of the page.
Individual files can be found at 
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/swishe/swish-e/


>OK.  The prospective ISP says I would have shell access so installation of
>the modules seems "do-able".  However I'm waiting for a response as to
>whether the ISP would add the 'use' lines to their perl program.

Their perl programs?  You would only need to add the use line to *your*
programs.
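
For example, if you install the modules somewhere under your own account
(the ~/perl-lib path below is just made up for illustration), each of your
scripts only needs a "use lib" line pointing at that directory before
loading the modules:

    #!/usr/local/bin/perl -w
    use strict;

    # Tell perl where the privately installed modules live
    # (~/perl-lib is an example location, not a requirement).
    use lib "$ENV{HOME}/perl-lib";

    # Modules installed there now load normally -- e.g. LWP,
    # which the spider script needs.
    use LWP::UserAgent;

The ISP doesn't have to touch their own perl installation at all.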


>OK.  So is it enough to have both my local computer (containing the Web Site
>copy and on which I would build swish-e) and the ISP's server on Linux
>platform, or do I further need to ensure that the same Version of Linux (say
>5.6.1) and the same server software is on both my local computer and the
>ISP's server ?

5.6.1 is the perl version.  I cannot answer about binary compatibility.  My
guess is that if your ISP is running Linux and so are you, you could copy the
binary.  But I'd build from source whenever possible.

The "pure" perl programs (and modules) will work on different version of
perl, within reason.  But some modules have C extensions that need to use
the perl header files when compiling, and that may be platform specific.
I'd build modules on the target system.
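
For what it's worth, building a module from source into your own directory
is usually just the standard steps, with a PREFIX so it installs under your
account (the path is only an example):

    perl Makefile.PL PREFIX=$HOME/perl-lib
    make
    make test
    make install

Then the "use lib" line mentioned above points your scripts at that same
directory.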

>Almost.  I think the issue might be: "Do I need to use spidering to achieve
>my aims".  Hopefully if I can answer this, then I might be able to determine
>whether what the prospective ISP is able to offer will be sufficient to
>enable me to work swish-e.  I'm still not quite clear as to whether the
>spidering function is needed for what I want to do

Spidering allows you to index the web pages people can get to from links
on, say, your home page.

Often a web site will map one-to-one with physical paths on the computer.
That is, every file in a directory tree is part of the web site.  So
/usr/local/apache/htdocs/some/page/here/index.html is
http://localhost/some/page/here/index.html.  In that case you can often use
the file system to index your web site.
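
In that setup the config can be as simple as pointing swish-e at the
document root.  A rough sketch of the relevant directives (check the docs
for the version you build, since the development code may differ):

    # Index every file under the Apache document root
    IndexDir /usr/local/apache/htdocs

    # Only index HTML files
    IndexOnly .html .htm

    # Store URLs rather than file paths in the results
    ReplaceRules replace "/usr/local/apache/htdocs" "http://localhost"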

But it's more likely that the "web space" doesn't map directly, or that you
have some content that is dynamically created (such as by a CGI script), so
that the only way to read that content is via the web server.  In that case
you must spider.

The only advantage of using the file system over the spider is *maybe* a
little speed.  But spidering a local web server should be rather fast.


>So the specific question I'm asking is "Is spidering required to create and
>search the indices and dictionaries created at the various directory levels
>shown above ?"

Again, it's just a matter of what kind of access you want to provide.

A simple example.  Say in directory "public_html" you have the "home" page
called index.html.  Also in that directory you have ten other pages, 1.html
through 10.html.  But, on the index.html page you only have links to 1.html
through 5.html.  So there's no way to get from index.html to 6.html through
10.html by clicking on a link.  But someone could type in the URL for, say,
7.html and view that page.

So, do you want to index 6.html through 10.html?  If so, then you need to
use the file system to index.  But if you only want to index what people
can find by following links on your web site, then use the spider.

Again, it gets more complicated with dynamically created content.




Bill Moseley
mailto:moseley@hank.org
Received on Wed Jun 13 14:44:50 2001