I am still having trouble getting the link right on the results page.
I'm now spidering the pages, and the link shown for each page looks like this:
http://www.domain.comhttp://www.domain.com/index.html
That is, the base URL is getting prepended to the actual page URL,
which is already a full URL. I'm sure there's a simple config fix for
this, but I can't figure it out.
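
To make the symptom concrete, here is a minimal Perl sketch of what I suspect is going on. It is only an illustration under my own assumptions, not code taken from swish-e or spider.pl: `result_link` is a hypothetical helper, and the `./docs/page.html` path is made up. Joining a prefix onto a path that is already a full URL gives exactly the doubled link above, while checking for a scheme first avoids it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of swish-e or spider.pl): build the link
# shown on a results page from a stored document path.
sub result_link {
    my ($prefix, $docpath) = @_;

    # A spidered path is already a full URL (scheme://...), so return it as-is.
    return $docpath if $docpath =~ m{^[a-z][a-z0-9+.-]*://}i;

    # A filesystem-style path still needs the site prefix.
    $docpath =~ s{^\.?/+}{};           # drop a leading "./" or "/"
    (my $base = $prefix) =~ s{/+$}{};  # drop trailing slashes from the prefix
    return "$base/$docpath";
}

# Naive concatenation reproduces the doubled link.
my $prefix  = 'http://www.domain.com';
my $docpath = 'http://www.domain.com/index.html';

print "naive:   ", $prefix . $docpath, "\n";
print "guarded: ", result_link($prefix, $docpath), "\n";
print "file:    ", result_link($prefix, './docs/page.html'), "\n";
```

If something like the naive line is happening, the fix is presumably to stop adding the prefix wherever it is being added, since the spider already reports full URLs.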

My swish.conf file:

IndexDir ./spider.pl ./MySQL.pl
SwishProgParameters spider.conf
DefaultContents HTML
StoreDescription HTML <body> 200000
IndexContents HTML .htm .html .phtml
MetaNames swishdocpath swishtitle

My spider.conf file:

@servers = (
    {
        base_url    => 'http://www.domainname.org/index23.phtml',
        same_hosts  => [ qw/domainname.org/ ],
        email       => 'jalmberg@identry.com',

        # limit to only .html-like files
        test_url    => sub { $_[0]->path =~ /\.(phtml|shtml|html|htm)$/ },

        delay_min   => .0001,   # Delay in minutes between requests
        max_time    => 10,      # Max time to spider in minutes
        max_files   => 1000,    # Max unique URLs to spider
        max_indexed => 1000,    # Max number of files to send to swish for indexing
        keep_alive  => 1,       # Enable keep-alive requests
        # debug     => DEBUG_URL,
    },
);

# Must return true...
1;
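
One way to check whether the spider itself is producing the doubled URLs would be to log each URL from the `test_url` callback. The fragment below is an untested sketch meant to replace the `test_url` entry in the hash above. It assumes the first argument is a URI object (as the existing `->path` call suggests) and that the object stringifies to the full URL; the log goes to STDERR because STDOUT carries the documents to swish-e.

```perl
# Sketch only: same filter as in my config, plus one log line per candidate URL.
test_url => sub {
    my $uri = $_[0];                     # URI object passed in by spider.pl
    print STDERR "test_url: $uri\n";     # assumes the object stringifies to the full URL
    return $uri->path =~ /\.(phtml|shtml|html|htm)$/;
},
```

If the logged URLs look correct, the extra prefix is presumably being added later, at search or display time, rather than by the spider.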
Received on Thu Jan 23 22:38:08 2003