VASmalltalk – Code Pages

Users of VASmalltalk can ask the system which code page it is using. Running VA under a German Windows returns the answer: code page ibm-819.

The native operating system, as you might know, runs under code page windows-1252.

I’ve never understood why VA reports a different code page, but in everyday work I’ve seldom found this to be a problem.

When working with the parsing C APIs of ICU, I tried to parse “1234,34€” and ran into a lot of problems because the string contains the euro sign.

The ICU API is very often UTF-16 oriented. That means that single-byte (code page) oriented strings have to be converted to UTF-16 before calling the API.

Now one should know that code page 819 does not contain the euro sign. Windows delivers the “€” as byte 0x80, its windows-1252 code point. Converting the string to UTF-16 as if it were an 819 string does not translate that byte: the numeric value is still 0x80 after the conversion, which in Unicode is a control character, and the meaning of “€” is simply lost.
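The effect can be reproduced outside VASmalltalk. The following is a minimal Python sketch, not VA or ICU code. It assumes that Python's `latin-1` codec is an adequate stand-in for ibm-819 (ISO 8859-1), and the variable names are made up for illustration.

```python
# Bytes of "1234,34€" as a windows-1252 system delivers them:
# the euro sign is the single byte 0x80.
raw = b"1234,34\x80"

# Interpret the bytes as ISO 8859-1 (standing in for ibm-819),
# as the reported code page would suggest.
as_819 = raw.decode("latin-1")

# The last code point is still 0x80, a control character, not the euro sign.
last_code_point = ord(as_819[-1])
print(hex(last_code_point))      # 0x80
print(as_819[-1] == "\u20ac")    # False

# Encoding to UTF-16 (little endian, no BOM) only widens the value to 16 bits.
utf16_bytes = as_819.encode("utf-16-le")
print(utf16_bytes[-2:].hex())    # 8000
```

The last UTF-16 code unit is 0x0080, so a parser working on this string never sees a currency symbol.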

The right way is to convert the string from code page 1252 to UTF-16; then you also get the euro sign (U+20AC) in UTF-16.
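As an illustration only, here is the same conversion in a small Python sketch (again not VASmalltalk or ICU code; Python's `cp1252` codec is assumed to correspond to windows-1252, and the variable names are hypothetical):

```python
# Bytes of "1234,34€" as a windows-1252 system delivers them.
raw = b"1234,34\x80"

# Interpret the bytes as windows-1252, the code page Windows actually uses.
as_1252 = raw.decode("cp1252")

# Byte 0x80 is now mapped to U+20AC, the euro sign.
print(as_1252[-1] == "\u20ac")   # True
print(hex(ord(as_1252[-1])))     # 0x20ac

# Encoding to UTF-16 (little endian, no BOM) keeps the euro sign.
utf16_bytes = as_1252.encode("utf-16-le")
print(utf16_bytes[-2:].hex())    # ac20
```

With ICU the equivalent step would presumably be to use a converter for windows-1252 instead of one for ibm-819 when producing the UTF-16 input for the parse functions.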

I therefore believe that under Windows one should not rely on the code page information returning the correct value. VASmalltalk gets its code points from the operating system, which delivers 1252 code points (and VASmalltalk of course uses 1252-oriented fonts).

