I have two installations of Swish-e here at work.
One indexes PDF files read directly from the filesystem and works very well.
The second is attempting to index via the "prog" method, which we have to use because it is indexing a full PHP website containing code, passwords, and other material that we obviously don't want searchable.
I have been unable to get indexing working, so after going over everything more times than I care to remember, I'm resorting to the mailing list.
I run the following command:
Which uses the following spider.conf file:
my %main_site = (
    base_url           => 'http://someurl',
    email              => 'someemail',
    ignore_robots_file => 1,
    debug              => 'info',
);

@servers = ( \%main_site );
This produces the following output:
[root@wiki cgi-bin]# /usr/libexec/swish-e/spider.pl spider.conf
/usr/libexec/swish-e/spider.pl: Reading parameters from 'spider.conf'
-- Starting to spider: http://someurl/index.php --
Summary for: http://someurl/index.php
Connection: Close: 1 (1.0/sec)
Unique URLs: 1 (1.0/sec)
So the process seems to start OK but never gets past the first page.
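One thing I could try next is widening the debug output so the spider reports which links it extracts and which URLs it skips or fails on. This is only a sketch: it assumes the DEBUG_* constants that the spider.pl documentation shows being OR-ed together in the config file are available in this version, and it reuses the same placeholder URL and email as above.

```perl
# spider.conf (debugging variant of the config above)
my %main_site = (
    base_url           => 'http://someurl',
    email              => 'someemail',
    ignore_robots_file => 1,

    # Assumption: these constants are provided by spider.pl, as in its
    # documented examples. They should report each URL fetched, each
    # link extracted, and each URL skipped or failed.
    debug              => DEBUG_URL | DEBUG_LINKS | DEBUG_SKIPPED | DEBUG_FAILED,
);

@servers = ( \%main_site );
```

If the extracted links turn out to point at a different host name than base_url, that might explain why nothing beyond the first page is followed, but I have not confirmed this.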
Any suggestions as to what is (not) happening and what I've done wrong?
Received on Wed Jul 2 00:11:59 2008