Bill Moseley wrote:
> Not sure what the issue is yet, but I'll read on. Let me know if any
> of this doesn't make sense. The short answer, I think, is to not use
> a spider.pl config file at first, and let it use the default config
OK - I've moved to the default spider.pl setup. Seems to be much better.
Still having issues with PDFs but these issues do not appear to be
related to swish.
> Also, you might find it less load on the web server to use keep_alive
> than using a one second delay. And faster indexing, too.
I'm not sure how to tweak the default setup to use keep_alive
instead of only the delay.
>>Common to both examples, the StoreDescription does not appear to be acted
>>on. I have no descriptions available via <swishdescription>, I get some
>>Date Time String (e.g " Local Time : 1:12:01 PM PT") instead.
> Oh, you were asking about storing the descriptions:
> $ cat c
> DefaultContents HTML*
> StoreDescription HTML* <body> 50
> $ swish-e -e -S prog -i stdin -c c -v0 < pport
> $ swish-e -w port -m1 -p swishdescription -H0
> 1000 http://test.portofoakland.com/pdf/boar_shee_040622.pdf "boar_shee_040622.pdf" 124467 "C JOHN PROTOPAPPAS President PATRICIA A. SCATES Fi"
> Not sure where that first "C" (before John) comes from, but that's a separate issue.
> But that's the 50 chars stored in the description.
OK - I see that now. It appears that the PDFs are getting descriptions but
my html/asp pages are not. I think this might be because my body tags
look like this:
<body leftmargin="0" topmargin="0" rightmargin="0"
Would this interfere with the
StoreDescription HTML* <body> 320
directive? I'm currently running a test with the directive like this:
StoreDescription HTML* '<body leftmargin="0" topmargin="0" rightmargin="0" marginwidth="0" marginheight="0">' 320
Will be interesting to see if that does anything or not.
Received on Thu Sep 23 10:44:55 2004