Thanks for the detailed posting. Glad you found something that works.
FWIW, the Perl-centric nature of some Swish-e components reflects who took the
time to write them. Since the main swish-e program is written in C and its API
is well documented, we encourage advocates of other scripting languages to
contribute code in their favorite languages.
cheers,
pek
Gertjan Hofman scribbled on 8/6/06 8:42 PM:
> Dear Swish users,
>
> I posted a message a while ago asking how to get NTLM
> authentication to work in conjunction with swish-e. It
> was pointed out to me that there is a Perl module for
> this. It probably works, but unless someone tells me
> step by step how to use it, it's a time-consuming route
> for someone who doesn't speak Perl. I tried Perl a few
> years ago, decided there are many better scripting
> languages around, and stopped learning and using it.
> I'm not trying to start a flame war here, just observing
> that if Swish becomes very Perl-focused in its set-up,
> it might lose some users.
>
>
> Anyway, for anyone else who is Perl-ignorant and wants
> to do this: I got around the NTLM authentication
> required to reach our intranet by using the ntlmaps
> proxy server, which I downloaded from SourceForge. My
> edited config file is attached below. Test it by
> running:
>
> ./main.py &
>
> and then fetching something with wget through the proxy
> (csh syntax; in sh/bash use export http_proxy=...):
>
> setenv http_proxy http://localhost:5865/
> wget --proxy=on http://myintranet_serve/index.htm
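For readers who would rather not use wget, the same smoke test can be run from Python, the language ntlmaps itself is written in. This is a minimal sketch using only the standard library's `urllib.request`; it assumes ntlmaps is listening on localhost:5865 (the LISTEN_PORT in the quoted config), and the helper names `make_proxy_opener` and `fetch` are hypothetical.

```python
import urllib.request

# Assumption: ntlmaps is running locally with LISTEN_PORT:5865.
PROXY_URL = "http://localhost:5865"


def make_proxy_opener(proxy_url: str = PROXY_URL) -> urllib.request.OpenerDirector:
    """Build an opener that routes plain-HTTP requests through the local proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(url: str, timeout: float = 15.0) -> bytes:
    """Fetch one page via the proxy, roughly what `wget --proxy=on URL` does."""
    opener = make_proxy_opener()
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

After starting `main.py`, calling `fetch("http://myintranet_serve/index.htm")` (a placeholder host) should return the page body if the proxy is handling NTLM for you.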
>
>
> To use this with swish-e, switch on proxying in the
> spider config file (spider.conf.pl):
>
> # start of spider.conf.pl
>
> my ($filter_sub, $response_sub) = swish_filter();
>
> my %main_site = (
>     base_url       => 'http://hmx-bi35-s6/sitemap.html',
>     email          => 'root@bi35-sensorinfo.iac.honeywell.com',
>     keep_alive     => 1,            # Try to keep the connection open
>     filter_content => $filter_sub,  # use SWISH filter
>     test_url       => sub {
>         my ($uri, $server) = @_;
>
>         # enable proxy requests (only once, on the first URL tested)
>         unless ($::proxy_set++) {
>             my $ua = $server->{ua};
>             $ua->proxy('http', 'http://localhost:5865');
>         }
>
>         # return true if not an image, otherwise false
>         # (also matches .jpg and upper-case extensions)
>         return $uri->path !~ /\.(gif|jpe?g|png)$/i;
>     },
> );
>
> @servers = ( \%main_site );
>
> # end of spider.conf.pl
>
> And in the main config file I have my usual stuff
> plus:
>
> IndexDir spider.pl
> SwishProgParameters spider.conf.pl
> IndexFile "web.index"
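The `test_url` callback in spider.conf.pl above decides, per URL, whether the spider fetches it; here the rule is "skip images". For readers who don't speak Perl, this is a rough Python rendering of that same rule, for illustration only: spider.pl still expects the Perl callback, and the name `should_index` is hypothetical. Like the callback, it looks only at the URL path, and it also catches `.jpg` and upper-case extensions.

```python
import re
from urllib.parse import urlparse

# Same idea as the Perl test_url callback: reject URLs whose path ends in a
# common image extension, case-insensitively, including ".jpg".
IMAGE_EXT = re.compile(r"\.(?:gif|jpe?g|png)$", re.IGNORECASE)


def should_index(url: str) -> bool:
    """Return True if the spider should fetch and index this URL."""
    path = urlparse(url).path  # query string and fragment are ignored
    return IMAGE_EXT.search(path) is None
```

Matching on the path rather than the full URL means that a query string such as `?size=large` does not let an image slip through.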
>
>
> The formatting in the above text might be wonky
> because I am using Yahoo's poor mail composer.
>
> Cheers
>
> Gertjan
>
>
> (proxy server config file).
>
> [GENERAL]
> LISTEN_PORT:5865
> PARENT_PROXY:
> PARENT_PROXY_PORT:8080
> PARENT_PROXY_TIMEOUT:15
> ALLOW_EXTERNAL_CLIENTS:0
> FRIENDLY_IPS:
> URL_LOG:0
> MAX_CONNECTION_BACKLOG:5
> [CLIENT_HEADER]
> [NTLM_AUTH]
> NT_HOSTNAME:
> NT_DOMAIN:honeywell
> USER:e191564
> PASSWORD:
> LM_PART:1
> NT_PART:0
> NTLM_FLAGS: 06820000
> NTLM_TO_BASIC:0
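The port in this file and the proxy URL hard-coded in spider.conf.pl must agree. This small sketch derives that URL from the config so the two can be checked against each other. It assumes Python's `configparser` can read this `KEY:VALUE` layout, which holds for the keys shown but is not how ntlmaps itself parses the file; `proxy_url_from_config` and `SAMPLE_CFG` are hypothetical names, and the sample is trimmed to a few of the quoted settings.

```python
import configparser

# Trimmed copy of the ntlmaps settings quoted above.
SAMPLE_CFG = """\
[GENERAL]
LISTEN_PORT:5865
PARENT_PROXY:
PARENT_PROXY_PORT:8080
ALLOW_EXTERNAL_CLIENTS:0
[NTLM_AUTH]
NT_DOMAIN:honeywell
LM_PART:1
NT_PART:0
"""


def proxy_url_from_config(text: str) -> str:
    """Return the local proxy URL that spider.conf.pl should point at."""
    # ":" is the only key/value delimiter; no %-interpolation of values.
    parser = configparser.ConfigParser(delimiters=(":",), interpolation=None)
    parser.read_string(text)
    port = parser.getint("GENERAL", "LISTEN_PORT")
    return f"http://localhost:{port}"
```

With the sample above, this yields `http://localhost:5865`, the same value passed to `$ua->proxy` in the spider config.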
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Mon Aug 7 06:30:36 2006