Re: unescaping of output.

From: Bill Moseley <moseley(at)>
Date: Fri Jan 17 2003 - 22:06:48 GMT

On Fri, 17 Jan 2003, Andrew Smith wrote:

> I've noticed in using Swish-e that the swishdescription output is 
> different depending on whether you get it with the "-x ..." extended 
> output format or the default output format: "-x ..." extended output 
> format HTML-unescapes, e.g. &quot; -> ", etc. and the default output 
> format does not. I know the "-x ..." extended output allows you to specify 
> printf-style format strings and put in C-style escapes, and so the string 
> is processed, and that is probably why it is HTML-unescaped too. So this 
> is an inconsistency, but is it a bug? Maybe it should be documented 
> better.

Ya, I guess I'd call it a bug.  More of a design problem, really.

> cat 1.html
Hello &#65;ndrew "hi"

> ./swish-e  -w andrew -p swishdescription -H0
1000 1.html "Howdy" 78 "Hello Andrew &quot;hi&quot;"

I almost never use -p any more, so I never see that.  -p was the
original way to display properties, and since it places values in quotes
by default, swish-e seems to want to escape any quotes inside them.

I don't know what the correct behavior should be, though.  That's the flaw
in the design.

a) 1000 1.html "Howdy" 78 "Hello Andrew &quot;hi&quot;"
b) 1000 1.html "Howdy" 78 "Hello Andrew \"hi\""
c) 1000 1.html "Howdy" 78 "Hello Andrew "hi""

Of those, "a" would seem to be the easiest to parse (which I assume is
why it was used).  There are scripts around that parse that output and
use quotes as the field delimiter.
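To make the trade-off concrete, here is a rough sketch of the kind of
quote-delimited parsing such scripts might do.  It is not taken from any
real script: parse_default_line and its field names are made up for
illustration, and it assumes the five-field layout shown in the examples
above with no spaces in the path.  It handles "a" but gives up on "c".

```perl
use strict;
use warnings;

# Hypothetical parser for the default -p layout shown above:
# rank, path, "title", size, "property".
# Assumes quoted fields contain no literal double quote.
sub parse_default_line {
    my ($text) = @_;
    my ($rank, $path, $title, $size, $description) =
        $text =~ /^(\d+) (\S+) "([^"]*)" (\d+) "([^"]*)"$/
        or return;    # no match: the line does not fit the layout
    return {
        rank        => $rank,
        path        => $path,
        title       => $title,
        size        => $size,
        description => $description,
    };
}

# Lines copied from options "a" and "c" above.
my $line_a = '1000 1.html "Howdy" 78 "Hello Andrew &quot;hi&quot;"';
my $line_c = '1000 1.html "Howdy" 78 "Hello Andrew "hi""';

for my $line ($line_a, $line_c) {
    my $fields = parse_default_line($line);
    print defined $fields
        ? "parsed: $fields->{description}\n"
        : "could not parse: $line\n";
}
```

Note that with "a" the caller still gets the entity text back and has to
decode it before using the value anywhere that is not HTML.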

I'd say that "c" is really the correct way, and that it's up to the user
to use a good delimiter, like a tab:

> ./swish-e  -w andrew -p swishdescription -H0 -d'\t'
1000    1.html  Howdy   78      Hello Andrew "hi"

which is easy to parse.

Or better, use -x.  -x is a lot easier to work with in a Perl program:

my $format = join "\t", map { "<$_>" } @display_properties;

   ... -x $format

Then when parsing results:

  my %properties;
  @properties{ @display_properties } = split /\t/;
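
Put together, the two fragments above might look something like this
sketch.  The property list, the parse_result helper and the sample
result line are assumptions for illustration; the line is built by hand
rather than captured from a real swish-e run.

```perl
use strict;
use warnings;

# Assumed property list; any set of property names works the same way.
my @display_properties =
    qw(swishrank swishdocpath swishtitle swishdocsize swishdescription);

# "<name>" placeholders joined by real tab characters, as above.
my $format = join "\t", map { "<$_>" } @display_properties;

# Hand-built stand-in for one line of output produced with that format.
my $result =
    join("\t", 1000, '1.html', 'Howdy', 78, 'Hello Andrew "hi"') . "\n";

# Hypothetical helper: split one result line into a hash keyed by
# property name, using a hash slice.
sub parse_result {
    my ($line, @names) = @_;
    chomp $line;
    my %properties;
    # A limit of -1 keeps trailing empty fields instead of dropping them.
    @properties{@names} = split /\t/, $line, -1;
    return \%properties;
}

my $props = parse_result($result, @display_properties);
print "$_ = $props->{$_}\n" for @display_properties;
```

The quotes in the description come through untouched because the tab is
the only character the parser treats as special.  This only holds as
long as no property value contains a tab itself.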

I agree that it's wrong for swish to encode into an entity because it's
assuming that the output will be HTML, which is not always the case, and
it also doesn't escape all possible chars.

Bill Moseley
Received on Fri Jan 17 22:07:04 2003