Swish/Wget

From: Zaheed Haque <zaheed(at)not-real.yahoo.com>
Date: Mon Jul 27 1998 - 20:19:40 GMT
Hi,

I am new and still learning to operate Swish and Wget, so here we go.

I use Wget to collect pages from about 50 Web sites (all of them
universities), and then I use Swish to index them.

Problems:

1. Disk space is limited: the files Wget downloads fill up my disk, and
I have no room left for indexing or for the index itself.

2. After indexing is done I delete the collected files, so when I
update I have to do the whole thing from the start again, which is a
pain!

Well, the solution is more disk space of course, but I don't have any
money :-)

What I wonder is ..

1. I want to run Wget and Swish in sequence, where:

a. Wget gets a file from the external site and saves it to a temp
directory.

b. Swish indexes the file from the temp directory.

c. Wget/Swish deletes the temp file.

d. Swish fixes up the relative links.

e. A stamp/MD5/mark is put on the index, so that when I update the
index it does not add old documents which I already indexed last week.
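
To make steps a, b, c and e concrete, here is a rough Python sketch of
the loop I have in mind. It is only an illustration under assumptions:
process_sites, load_manifest, wget_fetch and the JSON manifest file are
names I made up, and index_file is a hypothetical hook, since I do not
know whether Swish can add a single file to an existing index. The only
Wget options used are -q (quiet) and -O (output file). Step d (relative
links) is not covered.

```python
import hashlib
import json
import os
import subprocess
import tempfile
from typing import Callable, Dict, Iterable, List


def load_manifest(path: str) -> Dict[str, str]:
    """Return the URL -> MD5 map saved by the previous run, or {} if none."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def process_sites(
    urls: Iterable[str],
    fetch: Callable[[str, str], None],
    index_file: Callable[[str, str], None],
    manifest_path: str,
) -> List[str]:
    """Fetch, index, and delete one document at a time.

    fetch(url, dest_path) must write the document to dest_path.
    index_file(url, path) is a hypothetical hook that hands the file
    to the indexer. Returns the URLs that were (re)indexed in this run.
    """
    manifest = load_manifest(manifest_path)
    indexed: List[str] = []
    for url in urls:
        # Only one temp file exists on disk at any moment.
        fd, tmp_path = tempfile.mkstemp(suffix=".html")
        os.close(fd)
        try:
            fetch(url, tmp_path)                      # step a
            with open(tmp_path, "rb") as fh:
                digest = hashlib.md5(fh.read()).hexdigest()
            if manifest.get(url) == digest:           # step e: unchanged
                continue
            index_file(url, tmp_path)                 # step b
            manifest[url] = digest
            indexed.append(url)
        finally:
            os.remove(tmp_path)                       # step c
    # Remember the checksums for the next update run.
    with open(manifest_path, "w", encoding="utf-8") as fh:
        json.dump(manifest, fh, indent=2, sort_keys=True)
    return indexed


def wget_fetch(url: str, dest_path: str) -> None:
    """Download url to dest_path with wget (-q quiet, -O output file)."""
    subprocess.run(["wget", "-q", "-O", dest_path, url], check=True)
```

It would be called as process_sites(urls, wget_fetch, my_index_hook,
"manifest.json"), where my_index_hook is whatever command ends up
feeding one file to the indexer. Note that unchanged pages are still
downloaded on every run; the sketch only saves disk space and
re-indexing work, not bandwidth.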

or

2. Swish uses some protocol and does the crawling and indexing at the
same time.

What should I do? Any help is welcome. Thanks for your help!

Cheers
Zaheed
==
Regds
Zaheed Haque
zaheed@yahoo.com
Received on Mon Jul 27 13:26:58 1998