[Moved to the swish-e list]
At 06:15 PM 10/22/02 +0200, Dominik Marti wrote:
>Some online-journals are protected by username and password which will
>also be saved in our database. So when indexing some protected urls
>using a recursion depth more than 1, username and password are always
>needed. Could you tell me if swish-e can handle sessions or cookies, so
>it wouldn't be necessary to always ask the database for the username and
>password. It would be a better solution working with either sessions or
>cookies. I wasn't able to find this information in the docs.
Swish-e indexes files; it doesn't really know anything about passwords or
Obviously, there are different ways to use "passwords" on the web, so your
question can't really be answered. A session is something else, too, but
Does the spider script that comes with swish-e (spider.pl) support cookies?
Does the spider script ask for a password when fetching a doc that is
Could the spider script be easily modified to look up usernames and passwords
in a database based on the URL? Sure, that would not be difficult.
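
As a rough illustration of a URL-based lookup, here is a minimal sketch in
Python (not the Perl that spider.pl is written in, and not part of swish-e).
The `credentials` table, its columns, and the helper names are all
hypothetical; it assumes one username/password pair per host name.

```python
import sqlite3
from urllib.parse import urlsplit


def open_credentials_db(path=":memory:"):
    """Open (or create) a database with a hypothetical credentials table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS credentials ("
        "host TEXT PRIMARY KEY, username TEXT NOT NULL, password TEXT NOT NULL)"
    )
    return conn


def lookup_credentials(conn, url):
    """Return (username, password) for the URL's host, or None if unknown."""
    host = urlsplit(url).hostname  # lower-cased host name, or None
    if host is None:
        return None
    return conn.execute(
        "SELECT username, password FROM credentials WHERE host = ?", (host,)
    ).fetchone()
```

A spider could call `lookup_credentials` once per host and keep the result in
memory, so the database is not queried again for every page on the same site.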
Could passwords looked up in a database be stored in a cookie? That all
depends on the web server.
>Some articles from several online journals could be the same. Can
>swish-e handle this situation so it filters similar articles?
Not swish-e itself, but the spider can use an MD5 checksum to filter out duplicate docs.
But the docs would have to be exactly the same for that to work.
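
To show why the docs have to be exactly the same, here is a minimal sketch of
checksum-based filtering in Python. It is an illustration of the general
technique only, not the spider's actual code, and the function names are
hypothetical.

```python
import hashlib


def make_duplicate_filter():
    """Return a function that reports whether content was seen before."""
    seen = set()  # MD5 hex digests of every document seen so far

    def is_duplicate(content: bytes) -> bool:
        digest = hashlib.md5(content).hexdigest()
        if digest in seen:
            return True
        seen.add(digest)
        return False

    return is_duplicate
```

Because the digest is computed over the raw bytes, two copies of an article
that differ by even one character (a different banner, date, or trailing
space) produce different checksums and are both kept.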
Received on Tue Oct 22 17:21:07 2002