

HTTP spidering - zero results

From: Angel Parn <angel(at)>
Date: Mon Jun 12 2000 - 10:04:59 GMT
Hi Everyone!

I have a problem getting the HTTP indexing method to work, while
the FS method works great. The symptoms look similar to an old
case from the swish-e archive:

when running the following script:

    cd /home/web/search
    swish-e \
        -S http \
        -f /home/web/day.swe \
        -c /home/web/search/pp.cfg
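
For reference, with -S http the index target handed to swish-e (via -i
or IndexDir) must be a URL rather than a filesystem path, so a complete
invocation would look something like this (http://www.example.com/ is a
placeholder, not my real server):

    cd /home/web/search
    swish-e -S http \
        -i http://www.example.com/ \
        -f /home/web/day.swe \
        -c /home/web/search/pp.cfg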

the following response is generated:
Indexing Data Source: "HTTP-Crawler"
retrieving (0)...
Removing very common words... no words removed.
Writing main index... no unique words indexed.
Writing file index... no files indexed.
Running time: 21 seconds.
Indexing done!

The file /home/web/2000/06/10/day.swe will be created, but without
any keywords.

When I found that thread in the archives I thought that Perl needed
reconfiguring. I have to say that I am not the owner of the server, and
cannot reconfigure the server software. But, to my surprise, I found
that when running the helper script:

/home/web/search/ ./ss

I get the files ss.response, ss.links and so on, with status code 200.
So the helper works, but I can't understand why I cannot index through
swish-e (-S http). Maybe my config file is not correct
(I've double-triple-checked it but who knows):

Here are the config options for the HTTP method that are turned on:
MaxDepth 2
Delay 20
TmpDir /home/tmp
SpiderDirectory /home/web/search
Other parameters (IndexDir, IndexFile) are given on the command line.
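
Put together, the pp.cfg I mean looks something like this (only the
four directives above are real; swish-e treats lines starting with #
as comments):

    # /home/web/search/pp.cfg
    MaxDepth 2
    Delay 20
    TmpDir /home/tmp
    SpiderDirectory /home/web/search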

TmpDir has permissions 777; for debugging I set the /home/web/search
dir's permissions to 777 too.

Can this still be Perl's fault? The helper script works.

Uh, a long posting, but I hope someone with more experience than
me can help.

Desperately waiting for hints,
Angel Parn
Received on Mon Jun 12 06:07:41 2000