Character Sets - maemo.org

dmphzhopjrbffx	2008-02-05 , 03:12
Posts: 54 | Thanked: 2 times | Joined on Jun 2007	#1

Does anybody understand the device's character sets? I'm using a 770 but it probably applies to all.

I'm trying to use what the virtual keyboard calls "Accents a-m". They're (at least in a Windows context) Unicode C0 and up or something (somewhere in the range 80-FF). But when I type them (say in Notes) and view the subsequent file in hex format (say in Midnight Commander), I see that each one character is actually two characters. It looks like a control character, either C3 or C4, followed by another character.

How do I fix this or override it? I see in Control Panel's "Language & region" settings that I can choose English (USA) as my Device language but how do I find info on Unicode or an extended character set or anything like that? Thanks.


dmphzhopjrbffx	2008-02-06 , 00:34
Posts: 54 | Thanked: 2 times | Joined on Jun 2007	#2

Doing a little more research, I found Nokia 770 Internet Tablet: Input Methods. I don't really understand it, though. But it gives me other things to research.

I looked in /usr/share/keyboards. There are 18 .vkb files, which seem to correspond to the 17 choices in Control Panel's "Text input settings" under 1st language. The one extra .vkb file would be latin.special.vkb.

Interesting observation: if the Notes file I created on the 770 containing "Accents a-m" is opened in Windows, I see the warning...

WARNING: "ABC.txt" contains characters that do not exist in code page 1252 (ANSI - Latin I). They will be converted to the system default character, if you click OK.

Experts sought. Thanks.


jethro.itt	2008-02-06 , 08:22
Guest | Posts: n/a | Thanked: 0 times | Joined on	#3

Originally Posted by dmphzhopjrbffx

I'm trying to use what the virtual keyboard calls "Accents a-m". They're (at least in a Windows context) Unicode C0 and up or something (somewhere in the range 80-FF).

You seem to be confusing Unicode with its numerous encodings. If you use a one-byte-per-character encoding, such as ISO-8859-1 or its Microsoft variant Windows-1252, accented characters do indeed end up as single bytes in the range 0x80 to 0xFF. Such an encoding can represent at most 256 different characters; ISO-8859-1 covers exactly Unicode's first 256 code points and nothing beyond them.

Originally Posted by dmphzhopjrbffx

But when I type them (say in Notes) and view the subsequent file in hex format (say in Midnight Commander), I see that each one character is actually two characters. It looks like a control character, either C3 or C4, followed by another character.

This, on the other hand, is UTF-8, a variable-length encoding of the full Unicode range. It has become the de-facto standard of representing Unicode text on Unix-like systems.

Couple of examples:

"Ä": 0xC4 in ISO-8859-1, 0xC3 + 0x84 in UTF-8
"Ö": 0xD6 in ISO-8859-1, 0xC3 + 0x96 in UTF-8
"ä": 0xE4 in ISO-8859-1, 0xC3 + 0xA4 in UTF-8
"ö": 0xF6 in ISO-8859-1, 0xC3 + 0xB6 in UTF-8
"∞": not possible in ISO-8859-1, 0xE2 + 0x88 + 0x9E in UTF-8
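The byte values listed above can be reproduced with a few lines of Python. This is only a sketch for checking the table; the helper name hex_bytes is made up for this example and is not part of anything on the tablet.

```python
# Compare how the same characters are stored in ISO-8859-1 and in UTF-8.
# hex_bytes is a hypothetical helper written for this illustration.
def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of `text` as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


# U+00C4, U+00D6, U+00E4, U+00F6 are the four letters from the list above.
for ch in "\u00c4\u00d6\u00e4\u00f6":
    label = f"U+{ord(ch):04X}"
    print(label, "ISO-8859-1:", hex_bytes(ch, "iso-8859-1"),
          "UTF-8:", hex_bytes(ch, "utf-8"))

# U+221E (the infinity sign) has no ISO-8859-1 byte, so encoding fails.
try:
    "\u221e".encode("iso-8859-1")
except UnicodeEncodeError:
    print("U+221E cannot be encoded in ISO-8859-1")
print("U+221E UTF-8:", hex_bytes("\u221e", "utf-8"))
```

Every character above U+007F takes at least two bytes in UTF-8, which is why each accented letter shows up as a pair in the hex view.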

Originally Posted by dmphzhopjrbffx

How do I fix this or override it?

I don't think the Notes application can be forced to use any other character encoding besides UTF-8. Maybe someone will correct me?

None of this has anything to do with the virtual keyboard or its configuration files. The Internet Tablet uses Unicode (and UTF-8) everywhere and that's what you're seeing.


dmphzhopjrbffx	2008-02-06 , 20:24
Posts: 54 | Thanked: 2 times | Joined on Jun 2007	#4

Thanks, Jethro. Great expertise and generosity with the links!

Notes isn't really the issue per se. I'm working in an environment with many WLANs, and the administrators have used some 0x80 to 0xFF characters within the key. I've spoken to them about the problem but they'd prefer not changing the key for various reasons if a solution on my end is feasible. Probably foremost they just don't want to notify all the other users if they were to change it, LOL.

So I set up the 770's Connectivity in Control Panel.

First I used the virtual keyboard to fill in the proper dialog with the accented characters, but when I inspected the corresponding gconf file I saw the two-byte-per-character mess (after I tried and failed to connect).

Next I repeated the setup but used a 0 in the key where it needed an accented character, and then used a hex editor to change the 0 (0x30) to Ä (0xC4). That didn't work -- the connection no longer even appeared in the Connections list. I think whatever parses the gconf XML more or less crashed when it hit a character it didn't know what to do with.

The relevant XML looked like...
<stringvalue>12305</stringvalue>
...and I wanted...
<stringvalue>123Ä5</stringvalue>

I think it's going to take some brainstorming before I even know if my goal is achievable. Who has more ideas? Thanks.


dmphzhopjrbffx	2008-02-07 , 00:49
Posts: 54 | Thanked: 2 times | Joined on Jun 2007	#5

Oops, forgot to say the connectivity gconf files are in...
/var/lib/gconf/system/osso/connectivity/IAP/<name of IAP>

And as I'm learning, I see we have tools like...
gconftool
gconftool-2
...via xterm that I need to research, too.


jethro.itt	2008-02-07 , 11:31
Guest | Posts: n/a | Thanked: 0 times | Joined on	#6

Originally Posted by dmphzhopjrbffx

Notes isn't really the issue per se. I'm working in an environment with many WLANs, and the administrators have used some 0x80 to 0xFF characters within the key. I've spoken to them about the problem but they'd prefer not changing the key for various reasons if a solution on my end is feasible. Probably foremost they just don't want to notify all the other users if they were to change it, LOL.

I was under the impression that preshared keys cannot contain characters outside the ASCII range. For example, see this page (look at the JavaScript source):
http://www.xs4all.nl/~rjoris/wpapsk.html

It simply rejects all characters where the character code is larger than 126, for both the SSID and the passphrase.

Originally Posted by dmphzhopjrbffx

Next I repeated the setup but used a 0 in the key where it needed an accented character, and then used a hex editor to change the 0 (0x30) to Ä (0xC4). That didn't work -- the connection no longer even appeared in the Connections list. I think whatever parses the gconf XML more or less crashed when it hit a character it didn't know what to do with.

The relevant XML looked like...
<stringvalue>12305</stringvalue>
...and I wanted...
<stringvalue>123Ä5</stringvalue>
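A likely reason the hex-edited file stopped working: a lone 0xC4 byte followed by an ASCII digit is not valid UTF-8, so a parser that reads the file as UTF-8 would have to reject it. Whether the gconf parser on the tablet really reads these files as UTF-8 is an assumption here, based on UTF-8 being used everywhere else on the device. A small Python sketch of the check (is_valid_utf8 is a hypothetical helper for this example):

```python
# Check whether a byte string is well-formed UTF-8.
# is_valid_utf8 is a hypothetical helper written for this illustration.
def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The hex-edited value: ASCII "123", a lone 0xC4 byte, then ASCII "5".
# 0xC4 announces a two-byte sequence, but 0x35 ("5") is not a continuation byte.
edited = b"<stringvalue>123\xc45</stringvalue>"
print(is_valid_utf8(edited))  # False

# The same text with U+00C4 encoded as UTF-8 (0xC3 0x84) is well-formed.
typed = "<stringvalue>123\u00c45</stringvalue>".encode("utf-8")
print(is_valid_utf8(typed))  # True
```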

There's also a 256-bit preshared key in integer form at the beginning of the XML file (OS2008). Maybe that's what the device really uses, and the passphrase is only stored for editing purposes? If you can somehow determine the mapping between non-ASCII characters and the 256-bit key, it's perhaps possible to modify the key in the XML file and leave the passphrase intact.

EDIT: I looked at the relevant RFC, and it seems you can save the page that the link above points to and modify it to check for character codes up to 255 instead of 126. This way you can easily (but slowly...) generate the key from your SSID and passphrase, even if they contain ISO-8859-1 characters. The code uses JavaScript's String.charCodeAt(), which will do the right thing in this case.
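For anyone who would rather not patch the JavaScript page, the same derivation can be sketched in Python. WPA-PSK derives the 256-bit key with PBKDF2 using HMAC-SHA1, the SSID as salt, 4096 iterations and a 32-byte output. The function name wpa_psk and the default of one byte per character (ISO-8859-1) are assumptions for this example. Which byte encoding the access point actually applied to the non-ASCII characters is not known from this thread, so the result may need to be tried with "utf-8" as well.

```python
import hashlib


# Hypothetical helper: derive the 256-bit WPA preshared key from a passphrase.
# The ISO-8859-1 default maps each character up to U+00FF to a single byte;
# that choice is an assumption about how the access point was configured.
def wpa_psk(passphrase: str, ssid: str, encoding: str = "iso-8859-1") -> str:
    key = hashlib.pbkdf2_hmac(
        "sha1",                       # HMAC-SHA1 as the pseudorandom function
        passphrase.encode(encoding),  # passphrase bytes
        ssid.encode(encoding),        # the SSID is the salt
        4096,                         # iteration count
        32,                           # 32 bytes = 256 bits
    )
    return key.hex()


# "MyNetwork" is a placeholder SSID, not one taken from this thread.
print(wpa_psk("123\u00c45678", "MyNetwork"))
print(wpa_psk("123\u00c45678", "MyNetwork", encoding="utf-8"))
```

The two printed keys differ because the passphrase bytes differ, so a key generated with the wrong encoding will not match the one the access point computed.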

Last edited by jethro.itt; 2008-02-07 at 12:01.

