At 05:28 AM 07/12/01 -0700, Chris Blackstone wrote:
>On Mac OS X 10.0.4, if I try to index a web page that has a link to
>another page whose name contains an apostrophe, swish-e crashes with a
>"Bus Error". Is this supposed to happen, or is it a bug? I realize web
>page names shouldn't have apostrophes in them, but I work for a school
>district and have many teachers posting web pages who don't know/care
>about proper web page naming.
The apostrophe is an ok character in a URL according to RFC 2396.
I don't see the "Bus Error" running under Linux, but I do see the problem.
(I had reported back to you that it "hangs" on my machine, but that was an
error as I forgot about the default delay of 60 seconds when spidering with
But there's a potential security issue here.
One of the reasons I added -S prog was I don't really like the current -S
http method -- I think it's slow, and there is really good LWP and
robots.txt support available in Perl. Hence, I have not looked at the
swish spidering code (swishspider, http.c and httpserver.c) much.
The way -S http works in swish is by running a perl helper script for every
URL spidered. It passes each URL to spider to the swishspider perl helper
script via the system() call. The system() call passes its parameters
through the shell, and that is a security risk.
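
To make the difference concrete, here is a minimal Python sketch of passing a URL to a helper process without a shell. It is not the swish-e code (which is C calling a Perl script); the URL and the stand-in helper below are assumptions made up for illustration.

```python
import subprocess
import sys

# Hypothetical URL, as it might be extracted from a page being spidered.
url = "http://example.com/teacher's page.html"

# Stand-in for the helper script: a child Python process that prints
# each argument it received on its own line.
helper = [sys.executable, "-c", "import sys; print('\\n'.join(sys.argv[1:]))"]

# What a shell-based call would hand to /bin/sh (built here only to
# show the string; it is never executed).
shell_command = "helper '%s'" % url


def run_without_shell(target_url):
    # The arguments are passed as a list, so no shell parses the URL.
    # Quotes, semicolons and backticks arrive in the child as plain data.
    result = subprocess.run(
        helper + [target_url], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


received = run_without_shell(url)
print(shell_command)
print(received)
```

The child sees the URL as a single argument, apostrophe and space included, because nothing in between tried to interpret it as shell syntax.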
Chris, your problem, I assume, was that the apostrophe in the URL was
ending the quoted parameter (the URL) and beginning a new parameter. That
just gives me a shell error:
sh: -c: line 1: unexpected EOF while looking for matching `''
sh: -c: line 2: syntax error: unexpected end of file
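
The same failure can be reproduced without running anything, using Python's shlex module to parse the command line with shell-like quoting rules. This is only a sketch: the command name "helper" and the URLs are hypothetical, and shlex is an approximation of what sh does, not the shell itself.

```python
import shlex


def build_command(url):
    # Naive quoting: wrap the URL in single quotes and hope for the best.
    return "helper '%s'" % url


def parse(command):
    # Parse the command line with shell-like quoting rules.
    try:
        return shlex.split(command)
    except ValueError as err:
        return "parse error: %s" % err


plain_url = "http://example.com/page.html"
quote_url = "http://example.com/teacher's.html"

ok = parse(build_command(plain_url))
broken = parse(build_command(quote_url))

# shlex.quote() escapes the apostrophe so the URL stays one argument.
fixed = parse("helper " + shlex.quote(quote_url))

print(ok)
print(broken)
print(fixed)
```

The apostrophe closes the quoted string early and the trailing quote opens a new one that is never closed, which is the "unexpected EOF" the shell complains about.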
The dangerous part is that someone might be able to construct a link on a
web page that could end up running code on the machine where swish is
running.
Luckily, the swishspider code uses Perl's LWP library for the actual link
extraction, so it should escape most dangerous characters. BUT, I would
always assume that someone could figure out a way to get around that and
end up running malicious code on your server.
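
As a rough illustration of why escaping helps but should not be relied on, the Python sketch below percent-encodes a URL with urllib.parse.quote. This is Python's encoder, not LWP, and the two may treat characters differently; the URL is made up. Note that the result depends entirely on which characters the encoder is told are "safe".

```python
from urllib.parse import quote

# Hypothetical link containing characters that are special to a shell.
url = "http://example.com/a'b;c`d$e f"

# Keep only ":" and "/" unescaped; everything else outside the
# always-safe set is percent-encoded.
strict = quote(url, safe=":/")

# An encoder that treats the apostrophe as safe (RFC 2396 allows it in
# a URL) leaves it in place, and the shell problem remains.
lenient = quote(url, safe=":/'")

print(strict)
print(lenient)
```

Escaping follows URL rules, not shell rules, so it is best treated as a second line of defense rather than a reason to keep passing data through the shell.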
What does this all mean?
- Never run indexing as root (don't run anything as root unless you know
why you are doing so!)
- If you don't have full control over the files you are indexing, then you
need to update swishspider. An updated version can be found at:
- Or upgrade to the development version of swish and use the -S prog with
spider.pl which doesn't pass data through the shell.
Merge is currently broken in the development version -- but indexing is so
much faster with much lower memory requirements that you may find merge is
not necessary in many cases.
Received on Mon Jul 16 17:57:24 2001