On Wed, Oct 13, 2004 at 11:53:14PM -0700, email@example.com wrote:
> Swish is counting the carriage return+linefeed line-endings in dos-textfiles
> as only one character.
No, not swish. Swish is opening the input (with a popen() call) as
F_READ_TEXT. Swish is reading HTML, XML or text, so that's the
correct way to read the file. It's the C library that is
converting the DOS line endings to \n.
If you have a C program that fetches data from a database and calls
printf("%s\n", first_name), then that \n is converted to a DOS line
ending (\r\n) when the output is written in text mode.
This makes the programs portable across different platforms. The
programs don't need to worry about the different line endings. A new
line is the same internally on any platform.
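
To make the write side concrete, here is a minimal sketch in C of the
size a string takes on disk after a DOS text-mode write. The function
name dos_text_mode_size is made up for illustration; it is not part of
swish-e or the C library, and it only models the \n to \r\n expansion
described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from swish-e or the C library): the number
   of bytes a string occupies on disk after a DOS/Windows text-mode
   write, where each '\n' in memory is stored as "\r\n". */
size_t dos_text_mode_size(const char *s)
{
    size_t size = strlen(s);

    for (const char *p = s; *p != '\0'; p++) {
        if (*p == '\n')
            size++;   /* one extra byte for the added '\r' */
    }
    return size;
}
```

So a program that holds "Bill\n" in memory (5 characters) would leave
6 bytes in a DOS text file, while the in-memory length stays the same
on every platform.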
> If I do (in Delphi)
> newtext:=StringReplace(oldtext,^M^J,^J,[rfreplaceall]); before
> calculating the length, the indexing works. No matter whether I
> write out the original content, or the one converted to
I don't know anything about Delphi. I would think it uses the same C
library functions, but maybe that's not true. How do you represent
line endings in Delphi? Maybe Delphi is reading your input in
binary mode and not converting DOS line endings to \n.
The problem is that if you are writing in binary mode and swish is
reading in text mode, then the content-length value you pass to swish
won't match what swish-e reads.
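
As a rough sketch of the mismatch, the following C function counts a
buffer the way a DOS text-mode reader would see it, with each \r\n
pair counted as a single character. The name text_mode_length is
hypothetical, and this is only an illustration of the counting, not
how swish-e itself does it.

```c
#include <stddef.h>

/* Hypothetical helper (not from swish-e): length of buf as a text-mode
   reader on DOS/Windows would count it, i.e. every "\r\n" pair counts
   as a single character. A '\r' not followed by '\n' is left alone. */
size_t text_mode_length(const char *buf, size_t len)
{
    size_t count = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
            continue;   /* skip the '\r'; the '\n' is counted next */
        count++;
    }
    return count;
}
```

For the 6 bytes "a\r\nb\r\n" this gives 4, which is the value a
content-length would need to carry if the reader is in text mode and
the writer sent the raw bytes.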
> I think that swish is behaving badly here. The stringreplace can be
> quite slow, and the behavior is not very logical. And if some of the
> files were macintosh-files, a more complicated conversion would be
No, same deal on Mac -- the files are read/written in text mode and
the line breaks are converted.
> Btw, I don't know how servers and browsers handle linebreaks. If the
> content comes from a web-server via http from normal static
> html-pages, isn't it delivered as-is? So the problem would be the
> same on unix-boxes if the content is spidered stuff originating from
> dos / mac.
No, not really. The spider puts the content (whatever it is) into a
Perl string and then asks for its length and uses that for the
Received on Thu Oct 14 07:09:58 2004