Please excuse my ignorance about SWISH-E, but I have just joined the
list because I am looking for a specific type of tool and hope that
SWISH-E can do, or help do, what I need.
I need a system that will gather information from websites and then
categorize the sites into subject groups.
Currently I use a customized version of Squid to cache all my sites.
Users of the system have no direct access to the Internet; they can
reach only the information stored in the Squid cache. If someone tries
to access a site or page that is not in the cache, the system logs the
unfound request. At night, all unfound requests are passed to a process
that uses the Google API to find URLs related to them. Those URLs are
then handed to wget for downloading. wget is pointed at my Squid proxy
and runs as the only privileged user with Internet access through
Squid, so everything wget fetches is fed into the cache, and the
previously unfound request should be available to users the next day.
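The nightly step described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the actual system: the Squid address (127.0.0.1:3128, Squid's default port), the helper names, and the `search` callback that stands in for the Google API client are all hypothetical. The wget options are real: `--delete-after` discards each downloaded file, since only the cached copy matters, and `-e` passes wgetrc commands such as `use_proxy` and `http_proxy`.

```python
import subprocess
from typing import Callable, Iterable, List

# Assumed address of the local Squid proxy (3128 is Squid's default port).
SQUID_PROXY = "http://127.0.0.1:3128"


def read_unfound_requests(lines: Iterable[str]) -> List[str]:
    """Drop blank lines and duplicates from the unfound-request log, keeping order."""
    seen = set()
    requests = []
    for line in lines:
        req = line.strip()
        if req and req not in seen:
            seen.add(req)
            requests.append(req)
    return requests


def wget_command(url: str, proxy: str = SQUID_PROXY) -> List[str]:
    """Build a wget call that fetches `url` only through Squid.

    --delete-after removes the local file; the goal is just to populate the cache.
    """
    return ["wget", "-q", "--delete-after",
            "-e", "use_proxy=on", "-e", f"http_proxy={proxy}", url]


def nightly_job(log_lines: Iterable[str],
                search: Callable[[str], List[str]],
                run=subprocess.run) -> List[List[str]]:
    """Look up URLs for each unfound request and pre-fetch them via the proxy.

    `search` is a hypothetical stand-in for the search-API client;
    `run` is injectable so the job can be dry-run without calling wget.
    """
    commands = []
    for req in read_unfound_requests(log_lines):
        for url in search(req):
            cmd = wget_command(url)
            commands.append(cmd)
            run(cmd, check=False)
    return commands
```

Running the job as the single privileged user, and scheduling it nightly (for example from cron), is left outside this sketch.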
Now, what I would like to do is organize the information stored in the
cache (or before it reaches the cache) into subject groups.
Is SWISH-E capable of organizing documents by subject, or does anyone
have other leads that could point me in the right direction? Organizing
the data by subject is the most difficult task I have faced so far, and
I am out of my league with intelligent systems, but I am a quick learner
if I can see some examples.
Thanks, and I hope this isn't too off-topic for SWISH-E.
Received on Mon Mar 8 18:34:04 2004