Problems with cron

From: Tajoli Zeno <tajoli(at)not-real.cilea.it>
Date: Thu Apr 17 2003 - 16:33:29 GMT
Hi to all,

I have a problem with indexing.
I want to launch everything from cron,
but the process starts and immediately dies.
It seems that the process wants to write its output to a terminal.
I have no problems if I run the command from the command line.
If I run this command from the command line: "swish-e -S prog -c swish.conf 
-v 2 >log.sw &", I don't get everything in the file "log.sw". Some of the 
output appears on the terminal. On the terminal there are messages like this:

bash-2.05a$ swish-e -S prog -c swish.conf -v 2 > log.l &
[1] 16703
bash-2.05a$ ./spider.pl: Reading parameters from 'spider.conf'


Summary for: http://meneghetti.univr.it/CATDOP99.doc
DOC transformed:     1  (0.1/sec)
     Total Bytes: 7,925  (660.4/sec)
      Total Docs:     1  (0.1/sec)
     Unique URLs:     1  (0.1/sec)

Summary for: http://www.cilea.it/Virtual_Library/bibliot/doppi/prova_pdf.pdf
PDF transformed:      1  (0.1/sec)
     Total Bytes: 16,072  (1236.3/sec)
      Total Docs:      1  (0.1/sec)
     Unique URLs:      1  (0.1/sec)

[1]+  Done                    swish-e -S prog -c swish.conf -v 2 >log.sw
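
A possible explanation for the split output, offered as a sketch rather than a confirmed diagnosis: ">" redirects only standard output, so anything a program writes to standard error still reaches the terminal. The snippet below illustrates the difference with a hypothetical function "emit" that stands in for a program writing to both streams; the log file names are made up as well.

```shell
#!/bin/sh
# Hypothetical stand-in for a program that writes to both output streams.
emit() {
    echo "index progress"        # written to stdout
    echo "spider summary" >&2    # written to stderr
}

# Scratch directory so the example does not touch real log files.
tmp=$(mktemp -d)

# Only stdout is redirected: "spider summary" still appears on the terminal.
emit > "$tmp/stdout_only.log"

# stdout and stderr are both redirected: nothing appears on the terminal.
emit > "$tmp/both.log" 2>&1
```

With the second form, both lines end up in the log file, which is usually what is wanted for an unattended job.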


INFO:


My crontab entry is:
20 17 * * *    swish-e -S prog -c swish.conf -v 2 >log.sw &
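
One thing that may be worth checking, stated as an assumption rather than a diagnosis: cron usually starts jobs from the user's home directory with a minimal environment, so relative names such as swish.conf and ./spider.pl may not resolve. A sketch of an entry that changes directory first and captures both output streams follows; /home/zeno/index and /usr/local/bin/swish-e are hypothetical paths to be replaced with the real ones.

```shell
# Hypothetical paths: adjust to the real index directory and swish-e location.
20 17 * * *    cd /home/zeno/index && /usr/local/bin/swish-e -S prog -c swish.conf -v 2 >log.sw 2>&1
```

The trailing "&" is left out here because cron already runs the job without a terminal.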



In swish.conf I have:
# Program to read documents
IndexDir ./spider.pl
# Define the config file for the spider to use
SwishProgParameters spider.conf
# Use libxml2 for parsing documents
DefaultContents HTML*
IndexContents TXT* .txt .text
# Cache document contents in the index for context display
StoreDescription HTML <body>
StoreDescription HTML2 <body>

I haven't modified anything in spider.pl.
My spider.conf is:

my %Server1 = (
base_url        => 'http://wwwbiblio.polito.it/it/documentazione/bcadoppi.html',
email           => 'tajoli@cilea.it',
delay_min       => .2,
max_size        => 1_000_000,
max_depth       => 0,
keep_alive      => 1,
);
my %Server2 = (
base_url        => 'http://www.cilea.it/Virtual_Library/bibliot/doppi/doppivallisneri.html',
email           => 'tajoli@cilea.it',
delay_min       => .2,
max_size        => 1_000_000,
max_depth       => 0,
keep_alive      => 1,
);
[...]
@server=( \%Server1, \%Server2, ....);

In fact, I want to index many single web pages on different sites.
I am working with swish-e 2.2.3 on Linux 2.4.18-19.7.xsmp (Red Hat).


Any ideas?

Thanks to all.



Zeno Tajoli
tajoli@cilea.it
CILEA - Segrate (MI)
02 / 26995321
Received on Thu Apr 17 16:33:34 2003