Re: Indexing remote documents

From: Thomas Nyman <thomas(at)not-real.teg.pp.se>
Date: Sun Jun 05 2005 - 13:58:07 GMT
Thanks. I had just figured it out, but your answer is much appreciated.
Now I just need to make sure swish uses two different indexes.

Once again..many thanks!!

Thomas

On 5 Jun 2005, at 15:52, Peter Karman wrote:

>
>
> Thomas Nyman scribbled on 6/5/05 8:24 AM:
>
>
>> Sorry about the encrypted mail; my mistake. Which file contains
>> the necessary parameters used when spidering? I found the following
>> on the site:
>>
>> #    my %ccenter = (
>>
>> #            email       => 'Lance.Perry(at)not-real.ourdomain.com',
>> #            base_url    => 'http://our.domain.com/ccenter/',
>> #            delay_sec   => '0',
>> #            max_depth   => '1',
>> #            credentials => 'username:password'
>>
>> #   );
>>
>> #    @servers = ( \%ccenter );
>>
>> The question is, where should this go?
>>
>>
>
>
> It goes in a config file, by default SwishSpiderConfig.pl. You can name it
> anything you want (e.g., myconfig) if you call it by name from the command
> line. Be sure to take out the leading # signs -- those "comment out" the
> lines in a Perl script.
>
> It is called like this (for example):
>
>   $ spider.pl myconfig | swish-e -S prog -i stdin
>
> to test it, just do:
>
>   $ spider.pl myconfig
>
>
> which will print to stdout.
>
>>
>>
>>
>> On 5 Jun 2005, at 13:54, Thomas Nyman wrote:
>>
>>
>>
>>> Hi
>>>
>>> I have created a conf file that contains
>>>
>>> IndexDir http://192.168.1.2/archive/
>>>
>>> I wish to index all files found in the "archive" on the remote
>>> machine. The remote machine uses htpasswd to control access, so one
>>> needs a password to surf to the machine.
>>>
>>> When running swish I receive the following messages:
>>>
>>> Indexing Data Source: "HTTP-Crawler"
>>> Indexing "http://192.168.1.2/archive/"
>>> Removing very common words...
>>> no words removed.
>>> Writing main index...
>>> err: No unique words indexed!
>>>
>>> It seems that it's not indexing any documents.
>>>
>>> I have not made any particular changes to any file other than my
>>> conf file.
>>>
>>> I can successfully index on the same machine that swish is
>>> installed on.
>>>
>>> I'm guessing I'm missing something here but I'm not sure what. I
>>> would appreciate any pointers. If someone wants me to send
>>> additional info I will.
>>>
>>> Thanks
>>>
>>> Thomas
>>>
>>>
>>>
>
> -- 
> Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
>
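
A minimal sketch of the config file described in the reply above, assuming it is saved as myconfig (the example name used in that reply). It is simply the quoted fragment with the leading # signs removed; the values are the placeholders from the original and would need to be replaced with the real base URL and htpasswd credentials. The trailing "1;" is an assumption, added because Perl files loaded by another script conventionally end with a true value.

```perl
# myconfig -- hypothetical file name; the default is SwishSpiderConfig.pl

# One hash per site to spider. All values are placeholders copied from
# the commented-out fragment quoted in the thread.
my %ccenter = (
    email       => 'Lance.Perry(at)not-real.ourdomain.com',
    base_url    => 'http://our.domain.com/ccenter/',
    delay_sec   => '0',
    max_depth   => '1',
    credentials => 'username:password',   # user:password for the protected area
);

# spider.pl reads the list of sites from @servers.
@servers = ( \%ccenter );

# Assumption: end with a true value so the loading script sees success.
1;
```

It can then be checked with "spider.pl myconfig", as in the reply above, before piping the output to swish-e.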
Received on Sun Jun 5 06:58:08 2005