On Tue, Nov 25, 2003 at 01:45:05PM -0800, Kissman, Paul (BLC) wrote:
> I am using the Swish-E (version 2.4.0) prog method along with the
> supplied spider script and the Filters Option to index both html and pdf
> files (and Word files) on our web site. I've got my SwishSpiderConfig
> file working; everything is fine in all regards but one. Most of my
> website's html pages use server side includes, and I am not getting any
> last-modified date information for these shtml files.
http://httpd.apache.org/docs/howto/ssi.html.html
Look for XBitHack.
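Here is a minimal sketch of the XBitHack route. It assumes `XBitHack full` is in effect for the site's directories. With that setting, Apache sends a Last-Modified header for server-parsed files whose group-execute bit is set. The date is the mtime of the page file itself, not of anything it includes. The helper name `set_group_exec` and the `.shtml` suffix filter are hypothetical. The helper does the same job as running `chmod g+x` on each page, and is written in Python only for illustration.

```python
import os
import stat


def set_group_exec(root, suffixes=(".shtml",)):
    """Set the group-execute bit on SSI pages under root (like `chmod g+x`).

    Returns the list of files whose mode was changed.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            mode = os.stat(path).st_mode
            if not mode & stat.S_IXGRP:
                # Keep the existing permission bits and add group-execute.
                os.chmod(path, stat.S_IMODE(mode) | stat.S_IXGRP)
                changed.append(path)
    return changed
```

For example, `set_group_exec("/var/www")` would mark every .shtml page under that (assumed) DocumentRoot. Only do this for pages whose output does not change from hit to hit.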
>
> After some digging around I found out that the LWP package can't find the
> file modification date because the full page is generated dynamically
> through http and the filesystem modification timestamp for the main part
> of the page is not available to it.
Kind of. See above.
>
> I was thinking that one could insert a function in spider.pl that would
> quickly map the URL to the actual file, then go out and grab the actual
> file's timestamp if the web page were on the local server, and then
> stuff it in as the "Last-Mtime" value in the $headers string that gets
> returned to the indexer.
Sure, if your httpd.conf simply maps URLs one-to-one onto DocumentRoot.
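If the mapping really is one-to-one, the idea can be sketched as follows. This is Python rather than the Perl of spider.pl, so it shows the logic only and is not a patch to spider.pl. `DOCUMENT_ROOT`, `LOCAL_HOSTS`, the `index.shtml` default and all function names are hypothetical. The header block uses `Last-Mtime` (seconds since the epoch), as discussed above, alongside `Path-Name` and `Content-Length` as the prog method expects. If no local file is found, `Last-Mtime` is simply omitted.

```python
import os
from urllib.parse import unquote, urlsplit

# Hypothetical settings: assumes URL paths map one-to-one onto files under
# DOCUMENT_ROOT (no Alias, UserDir or mod_rewrite in httpd.conf).
DOCUMENT_ROOT = "/var/www"
LOCAL_HOSTS = {"localhost", "www.example.com"}
DIRECTORY_INDEX = "index.shtml"  # hypothetical DirectoryIndex file name


def url_to_local_path(url, doc_root=DOCUMENT_ROOT, hosts=LOCAL_HOSTS):
    """Map an http(s) URL on a local host to an existing file, else None."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or parts.hostname not in hosts:
        return None
    rel = unquote(parts.path).lstrip("/")
    if rel == "" or rel.endswith("/"):
        rel += DIRECTORY_INDEX
    root = os.path.realpath(doc_root)
    path = os.path.realpath(os.path.join(root, rel))
    # Refuse anything that resolves outside the document root (e.g. "..").
    if os.path.commonpath([root, path]) != root:
        return None
    return path if os.path.isfile(path) else None


def last_mtime(url, doc_root=DOCUMENT_ROOT, hosts=LOCAL_HOSTS):
    """File modification time in epoch seconds, or None if not local."""
    path = url_to_local_path(url, doc_root, hosts)
    return int(os.stat(path).st_mtime) if path else None


def swish_headers(url, content, doc_root=DOCUMENT_ROOT, hosts=LOCAL_HOSTS):
    """Header block written before each document's bytes (prog method)."""
    lines = [f"Path-Name: {url}", f"Content-Length: {len(content)}"]
    mtime = last_mtime(url, doc_root, hosts)
    if mtime is not None:
        lines.append(f"Last-Mtime: {mtime}")
    return "\n".join(lines) + "\n\n"
```

Like XBitHack, this reports the mtime of the page file itself. Edits to an included header or footer will not change it.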
>
> Is this a reasonable approach? Has anyone done this or solved this
> problem a different way?
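Well, just for fun you could try this, though I would not recommend it. Say your
DocumentRoot is /var/www; then run:

    spider.pl default file:///var/www/index.html

LWP then reads the files straight from disk, so it sees their modification
times, but the server-side includes are never expanded.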
--
Bill Moseley
moseley@hank.org
Received on Tue Nov 25 22:23:59 2003