Re: Problem with foreign characters

From: Zambra - Michael <michael(at)not-real.zambra.com>
Date: Mon Dec 03 2001 - 13:55:18 GMT
Dear Bill,

Thanks for all your efforts. I have followed your instructions, bar one
thing. Oddly, I can no longer type extended characters, although I could
until yesterday. I have been playing with RC_LANG and RC_LC_ALL, but I
still cannot type accented vowels in my Telnet window.

I think the page is being indexed correctly. Instead of a command-line
search, I ran the search through your script:

http://www.zambra.com/cgi-bin/sw/swish.cgi

The problem shows up there.

===================log of session
bash$ cat swish-e.conf

IndexFile /opt2/zambra/httpd/cgi-bin/sw/idx/index.swish-e
IndexDir ../../../htdocs/camaron.html
ReplaceRules replace "../../../htdocs" "http://www.zambra.com"
FollowSymLinks yes

WordCharacters  .abcdefghijklmnopqrstuvwxyz
BeginCharacters abcdefghijklmnopqrstuvwxyz
EndCharacters   abcdefghijklmnopqrstuvwxyz
IgnoreFirstChar .
IgnoreLastChar  .


bash$ cat ../../../htdocs/camaron.html
<html>
<head><title>Camarn</title>
</head>
<body>

Test page for search term <b>Camarn</b>

</body>
</html>

bash$ ./swish-e -c swish-e.conf -T parsed_words indexed_words -v 0
Indexing Data Source: "File-System"
White-space found word 'Camarn'
    Adding:[swishdefault:1]   'camarn'   Pos:1  Stuct:0x7 ( HEAD TITLE FILE )
White-space found word 'Test'
    Adding:[swishdefault:1]   'test'   Pos:2  Stuct:0x9 ( BODY FILE )
White-space found word 'page'
    Adding:[swishdefault:1]   'page'   Pos:3  Stuct:0x9 ( BODY FILE )
White-space found word 'for'
    Adding:[swishdefault:1]   'for'   Pos:4  Stuct:0x9 ( BODY FILE )
White-space found word 'search'
    Adding:[swishdefault:1]   'search'   Pos:5  Stuct:0x9 ( BODY FILE )
White-space found word 'term'
    Adding:[swishdefault:1]   'term'   Pos:6  Stuct:0x9 ( BODY FILE )
White-space found word 'Camarn'
    Adding:[swishdefault:1]   'camarn'   Pos:7  Stuct:0x49 ( EM BODY FILE )
Indexing done!
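
As a quick cross-check of the WordCharacters setting in the config above, here is a minimal Python sketch that lists the letters in a piece of text that fall outside that set. It is not part of swish-e and makes no claim about how swish-e itself tokenizes. The helper name chars_outside_set is hypothetical, and the sample string with an accented "o" is only an assumed spelling of the test word, which appears without an accent in the session above.

```python
# Hypothetical diagnostic helper, not part of swish-e.
# The set below is copied from the WordCharacters line in swish-e.conf.
WORD_CHARACTERS = ".abcdefghijklmnopqrstuvwxyz"


def chars_outside_set(text, allowed=WORD_CHARACTERS):
    """Return a sorted list of the letters in `text` that are not in `allowed`.

    The text is lower-cased first, so the comparison is case-insensitive.
    Whitespace, digits and punctuation are ignored; only letters are reported.
    """
    missing = set()
    for ch in text.lower():
        if ch.isalpha() and ch not in allowed:
            missing.add(ch)
    return sorted(missing)


if __name__ == "__main__":
    # Assumed sample: the test sentence with an accented vowel (U+00F3).
    sample = "Test page for search term Camar\u00f3n"
    print(chars_outside_set(sample))
```

Any letter this reports is one that the WordCharacters, BeginCharacters and EndCharacters lines above do not list.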
Received on Mon Dec 3 13:55:45 2001