Zaheed, that's an interesting idea, but I don't see how it can be
implemented w/o a major rewrite of Swish. I'm using wget & maintaining
the archives - not for updates, but for my search results screen. As
Swish displays each result line, a perl script reads in text from the
wget archive and builds an AltaVista-style search-results page (with a
few lines of text from each returned document).
A 2GB SCSI drive costs about $300.00.
Brian Rankin Phone: 415-565-3096
Telecommunications Director Fax: 415-565-3012
730 Harrison Street http://www.WestEd.org
San Francisco, CA 94107
On Mon, 27 Jul 1998, Zaheed Haque wrote:
> I am new and still learning to operate Swish and Wget, so here we go.
> I use WGET to collect info from about 50 Web sites (these sites are
> universities), and then I use Swish to index them.
> 1. Due to limited disk space, WGET fills up my disk and I have no room
> for indexing and the index.
> 2. After the indexing process is done I delete my resource/collected
> files, so when I do an update I have to do the whole thing from the
> start again, which is a pain!
> Well, the solution is more disk space of course, but I don't have any
> money :-)
> What I wonder is ..
> 1. I want to run WGET and Swish in a sequence .. where..
> a. WGET gets a file from the external site and then saves it to a temp
> directory
> b. SWISH starts indexing from the temp directory
> c. WGET/Swish deletes the temp file
> d. Swish fixes up the relative linking
> e. Do a stamp/MD5/mark on the index so when I update the index it will
> not re-add old documents which I have already indexed last week.
> 2. Could Swish use some protocol and do crawling and indexing at the
> same time?
> What should I do? Thanks for your help!
> Zaheed Haque
> DO YOU YAHOO!?
> Get your free @yahoo.com address at http://mail.yahoo.com
Received on Mon Jul 27 22:26:53 1998