Re: Output from indexing

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Apr 04 2005 - 06:34:51 GMT
On Sun, Apr 03, 2005 at 11:16:24PM -0700, S C wrote:
> 1. rose_1.log - Log file being generated

Unfortunately, it looks like your log file attachment was stripped by
the mailing list.

> my %ROSE_CONFIG;
> %ROSE_CONFIG = (
> 	'email'			=> 'test@testmail.com',
> 	'delay_min'		=> .005,
> 	'use_md5'			=> 1,
> 	'keep_alive'		=> 1,
> 
> 	'use_cookies'	=> 1,
> 
> 	'link_tags'		=> [qw/a frame imagemap/],
> 	'max_files'		=> 1000,
> 
> 	'test_url'		=> sub{
> 									my $uri = shift;
> 									my $server = shift;
> 
> 									# Skip requesting files that are probably not text
> 									return if $uri->path =~ m[\.(?:\.gif|\.jpeg|\.jpg|\.png|\.ppt|\.xls|\.au|\.mov|\.mpg|\.mpeg|\.css|\.js|\.class|\.zip|\.gz|\.tar)$]i;
> 									# TEMPORARILY SKIPPING PDFS AND DOCS
> 									#return if $uri->path =~ m[\.(?:\.pdf|\.doc|\.gif|\.jpeg|\.jpg|\.png|\.ppt|\.xls|\.au|\.mov|\.mpg|\.mpeg|\.css|\.js|\.class|\.zip|\.gz|\.tar)$]i;

That's sure a lot of tabs.

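My guess is that those regular expressions are not what you want.
The group is preceded by \. and every alternative inside it starts
with another \., so the pattern only matches names with two dots
before the extension.  You don't have "foo..pdf", right?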
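
To illustrate the double-dot problem, here is a minimal sketch.  The
quoted pattern is shortened to three extensions, and the helper
skips() and the sample paths are made up for the example; the "fixed"
pattern is only my guess at what was intended.

```perl
use strict;
use warnings;

# Pattern as quoted above, shortened to three extensions for brevity.
# It requires two consecutive dots before the extension.
my $quoted = qr/\.(?:\.gif|\.jpg|\.zip)$/i;

# Likely intent (an assumption): one dot, then one of the bare extensions.
my $fixed = qr/\.(?:gif|jpe?g|png|ppt|xls|au|mov|mpe?g|css|js|class|zip|gz|tar)$/i;

# Hypothetical helper: returns 1 when the path matches the pattern, else 0.
sub skips {
    my ( $pattern, $path ) = @_;
    return $path =~ $pattern ? 1 : 0;
}

# Made-up sample paths, printed with the verdict of each pattern.
for my $path (qw( /img/logo.gif /img/logo..gif /papers/dp123.pdf )) {
    printf "%-20s quoted=%d fixed=%d\n",
        $path, skips( $quoted, $path ), skips( $fixed, $path );
}
```

With the quoted form, an ordinary name such as logo.gif is never
skipped, so the spider would still request it.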

> 	@servers = (
> 	{
> 		'base_url'		=> 'http://fmg.lse.ac.uk/publications',
> 		'same_hosts'	=> [ qw!http://www.fmg.lse.ac.uk/publications! ],
> 
> 		'email'			=> $ROSE_CONFIG{'email'},

You can also just do this to save yourself a bit of typing:

    @servers = (
        {
            %ROSE_CONFIG,
            base_url => 'http://....',
        },
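
A fuller sketch of the same idea, in case it helps.  The defaults are
a subset of the quoted config; the max_files override is hypothetical
and only there to show that a key listed after %ROSE_CONFIG replaces
the default.

```perl
use strict;
use warnings;

# Shared defaults (a subset of the values from the quoted config).
my %ROSE_CONFIG = (
    email      => 'test@testmail.com',
    delay_min  => .005,
    keep_alive => 1,
    max_files  => 1000,
);

my @servers = (
    {
        %ROSE_CONFIG,    # the hash flattens into key/value pairs here
        base_url  => 'http://fmg.lse.ac.uk/publications',
        max_files => 50,    # hypothetical override: the later key wins
    },
);

# Show the merged settings for the first server.
print "$_ => $servers[0]{$_}\n" for sort keys %{ $servers[0] };
```

Note that this is a shallow copy: any references in the defaults (the
test_url sub or the link_tags array, say) are shared by every server
entry rather than duplicated.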



-- 
Bill Moseley
moseley@hank.org

Received on Sun Apr 3 23:34:51 2005