
Re: [SWISH-E:389] Swish/Wget

From: Brian Rankin <brankin(at)not-real.wested.org>
Date: Tue Jul 28 1998 - 05:15:04 GMT
Zaheed, that's an interesting idea, but I don't see how it can be 
implemented without a major rewrite of Swish.  I'm using wget and 
maintaining the archives - not for updates, but for my search-results 
screen.  As Swish displays each result line, a perl script reads in 
text from the wget archive and builds a search-results page, AltaVista 
style (with a few lines of text from each returned document).

A 2GB SCSI drive costs about $300.00.

Brian Rankin                                   Phone: 415-565-3096
Telecommunications Director                      Fax: 415-565-3012
WestEd                                          brankin@WestEd.org
730 Harrison Street                          http://www.WestEd.org
San Francisco, CA  94107                     


On Mon, 27 Jul 1998, Zaheed Haque wrote:

> Hi,
> 
> I am new and still learning to operate Swish and Wget, so here we go...
> 
> I use WGET to collect pages from about 50 Web sites (all
> universities), and then I use Swish to index them.
> 
> Problems:
> 
> 1. Due to limited disk space, WGET fills up my disk and I have no
> room left for the indexing process or the index itself.
> 
> 2. After the indexing process is done I delete the collected files,
> so when I do an update I have to do the whole thing from scratch
> again, which is a pain!
> 
> Well, the solution is more disk space of course, but I don't have any
> money :-)
> 
> What I wonder is ..
> 
> 1. I want to run WGET and Swish in sequence, where:
> 
> a. WGET gets a file from the external site and saves it to a temp
> directory
> 
> b. Swish starts indexing from the temp directory
> 
> c. WGET/Swish deletes the temp file
> 
> d. Swish fixes up the relative links
> 
> e. Put a stamp/MD5/mark on the index, so when I update it will not
> re-add old documents that I already indexed last week.
> 
> or 
> 
> 2. Swish uses some protocol itself and does the crawling and indexing
> at the same time.
> 
> What should I do? Any help would be appreciated - thanks!
> 
> Cheers
> Zaheed
> ==
> Regds
> Zaheed Haque
> zaheed@yahoo.com
> 
> 
Received on Mon Jul 27 22:26:53 1998