VASmalltalk – Conversion to UTF8


While working on UTF8 character conversion for Seaside/VAST, I was looking for a fast method to do this job.

I took the code from Pharo and found that it was really fast. Then I implemented a variant using my ICU wrapper and took some measurements:

The test code looked roughly like this:

'aString' asUtf8String

Each run executed this statement 100,000 times. The table below lists the total time for each run:

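Total time for 100,000 conversions. "'ä' x 10" is a string of ten 'ä' characters; n/a = no value given.

Input string     Smalltalk-only   ICU (single byte)   ICU (UTF16)
'a' x 1          55 ms            167 ms              n/a
'a' x 10         125 ms           174 ms              n/a
'ä' x 1          70 ms            170 ms              80 ms
'ä' x 10         290 ms           179 ms              n/a
'ä' x 50         1200 ms          208 ms              107 ms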

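For short strings the Smalltalk-only code is faster; for longer strings the ICU solution is faster. Where the crossover lies depends on the content: with ten plain 'a' characters Smalltalk still wins (125 ms vs. 174 ms), while with ten 'ä' characters ICU is already ahead (179 ms vs. 290 ms).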

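The numbers also show that the external C calls are expensive: the ICU times barely change with string length (167 ms for one 'a', 208 ms for fifty 'ä'), so most of the cost is call overhead. The ICU single-byte path consists of two "C" calls: one to convert the string to UTF16 and another to convert that to UTF8. This could of course be done in ONE call, perhaps even within a primitive.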
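
A minimal sketch of such a single-call conversion, written in C as one might write it for a primitive. It assumes the "single byte" encoding is ISO-8859-1 (Latin-1), where every byte value equals its Unicode code point; if the image uses Windows code page 1252 instead, bytes 0x80 to 0x9F would need a lookup table. The name latin1_to_utf8 is hypothetical; this is neither ICU's code nor the author's wrapper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert a Latin-1 string to UTF8 in one pass,
   with no UTF16 intermediate step.
   ASSUMPTION: input is ISO-8859-1, so each byte is its own code point
   (U+0000..U+00FF) and becomes one or two UTF8 bytes.
   `out` must have room for 2 * len bytes (the worst case).
   Returns the number of bytes written to `out`. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    assert((in != NULL && out != NULL) || len == 0);
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            /* ASCII: copied unchanged, one byte */
            out[o++] = c;
        } else {
            /* U+0080..U+00FF: two bytes, 110xxxxx 10xxxxxx */
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return o;
}
```

For example, 'ä' (0xE4 in Latin-1) comes out as the two bytes C3 A4, while ASCII bytes pass through unchanged. Staying with ICU, ucnv_convert() also converts between two charsets in a single external call, although ICU still pivots through UTF16 internally.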

If the string is already available in UTF16, only one call is needed, and the numbers confirm it: 80 ms instead of 170 ms for a single 'ä', and 107 ms instead of 208 ms for fifty.
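
When the source is already UTF16, the remaining step is a single pass from UTF16 code units to UTF8 bytes. The C sketch below shows that pass; ICU offers u_strToUTF8() for the same job, but which ICU function the wrapper actually calls is not stated, and utf16_to_utf8 is a hypothetical name. Surrogate pairs are combined into one code point, and unpaired surrogates become U+FFFD (an assumption; a strict converter might report an error instead).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert UTF16 code units to UTF8 in one pass.
   `out` must hold at least 3 * len bytes: a single code unit needs at
   most three bytes, and a surrogate pair (two units) needs four.
   ASSUMPTION: unpaired surrogates are replaced with U+FFFD.
   Returns the number of bytes written to `out`. */
size_t utf16_to_utf8(const uint16_t *in, size_t len, unsigned char *out)
{
    assert((in != NULL && out != NULL) || len == 0);
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            /* high + low surrogate: combine into a supplementary code point */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(in[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* unpaired surrogate */
        }
        if (cp < 0x80) {                 /* 1 byte:  0xxxxxxx */
            out[o++] = (unsigned char)cp;
        } else if (cp < 0x800) {         /* 2 bytes: 110xxxxx 10xxxxxx */
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {       /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {                         /* 4 bytes: 11110xxx + three 10xxxxxx */
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

'ä' (U+00E4) again becomes C3 A4; a supplementary character such as U+1F600 arrives as two code units (D83D DE00) and leaves as four bytes (F0 9F 98 80).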

Schrievkrom
