I am new and still learning to operate Swish and Wget, so here we go.
I use Wget to collect pages from about 50 Web sites (all of them
universities), and then I use Swish to index them. I have two problems:
1. Due to limited disk space, Wget fills up my disk and I have no room
left for indexing or for the index itself.
2. After the indexing process is done I delete the collected files, so
when I do an update I have to do the whole thing from the start
again, which is a pain!
Well, the solution is more disk space, of course, but I don't have any.
What I wonder is whether one of these is possible:
1. I run Wget and Swish in a sequence, where:
a. Wget gets a file from the external site and saves it to a temp directory
b. Swish starts indexing from the temp directory
c. Wget/Swish deletes the temp file
d. Swish fixes up the relative links
e. A stamp/MD5/mark is put on the index, so that when I update the index it
will not add old documents which I already indexed last week.
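
To make steps a, b, c and e more concrete, here is a rough Python sketch of what I have in mind. It is only an illustration under assumptions, not something Swish or Wget provides: the function names, the stamps file and the one-index-per-document layout are all made up, and the Swish command line (-i for the input file, -f for the index file) should be checked against the installed version. The separate indexes would still have to be merged afterwards, and step d is not covered.

```python
import hashlib
import json
import os
import shutil
import subprocess
import tempfile


def md5_of(path):
    """Return the MD5 hex digest of a file's contents."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_with_wget(url, dest):
    """Step a: download one URL to dest (-q quiet, -O output file)."""
    subprocess.run(["wget", "-q", "-O", dest, url], check=True)


def index_with_swish(path, index_file):
    """Step b: assumed Swish invocation; adjust to your Swish version."""
    subprocess.run(["swish", "-i", path, "-f", index_file], check=True)


def process(urls, stamp_file, index_dir,
            fetch=fetch_with_wget, index=index_with_swish):
    """Fetch, index and delete one document at a time.

    Returns the list of URLs that were new or changed and so got indexed.
    """
    stamps = {}
    if os.path.exists(stamp_file):
        with open(stamp_file) as handle:
            stamps = json.load(handle)

    indexed = []
    for url in urls:
        tmpdir = tempfile.mkdtemp()
        try:
            local = os.path.join(tmpdir, "page.html")
            fetch(url, local)
            checksum = md5_of(local)
            # Step e: skip documents whose MD5 matches last run's stamp.
            if stamps.get(url) != checksum:
                name = hashlib.md5(url.encode("utf-8")).hexdigest() + ".index"
                index(local, os.path.join(index_dir, name))
                stamps[url] = checksum
                indexed.append(url)
        finally:
            # Step c: the temp copy never outlives one loop iteration.
            shutil.rmtree(tmpdir)

    # Remember the checksums for the next update run.
    with open(stamp_file, "w") as handle:
        json.dump(stamps, handle)
    return indexed
```

With this layout only one downloaded document is on disk at any time, but every page is still downloaded on each update in order to compute its MD5, so it saves disk space and indexing work, not bandwidth.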
2. Swish uses some protocol and does the crawling and indexing at the same
time.
What do I do? Any help is welcome! Thanks for your help.
Received on Mon Jul 27 13:26:58 1998