VASmalltalk – Conversion to UTF8

While working on UTF-8 character conversion for Seaside on VAST, I was looking for a fast conversion method.

I took the code from Pharo and found that it has a really speedy method. I then implemented an alternative using my ICU wrapper and measured both.

The test code looked roughly like this:

'aString' asUtf8String

Within each run this statement was executed 100,000 times. The following table shows the time for each run:

'aString'       Smalltalk-only   ICU (single byte)   ICU (UTF16)
1x  'a'                 55 ms              167 ms
10x 'a'                125 ms              174 ms
1x  'ä'                 70 ms              170 ms         80 ms
10x 'ä'                290 ms              179 ms
50x 'ä'               1200 ms              208 ms        107 ms
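
A measurement loop of this kind can be sketched as follows. This is only an illustration, not the original test code: the variable names input and elapsed are made up, and it assumes that Time class>>millisecondsToRun: and String class>>new:withAll: are available in the image. asUtf8String is the selector shown above.

```smalltalk
"Hypothetical timing harness; evaluate in a workspace."
| input elapsed |
"Build the test string once, outside the timed block (here: 50x 'ä')."
input := String new: 50 withAll: $ä.
"Time 100,000 conversions of the same string."
elapsed := Time millisecondsToRun: [
    100000 timesRepeat: [input asUtf8String]].
Transcript show: elapsed printString, ' ms'; cr.
```

Building the string before the timed block keeps the allocation of the input out of the measurement, so only the conversion itself is counted.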

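For short strings the Smalltalk-only code is faster. For longer strings containing non-ASCII characters the ICU solution is faster: at ten 'ä' it already needs 179 ms against 290 ms, and at fifty 'ä' 208 ms against 1200 ms.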
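
To see where the Smalltalk-only time goes, here is a minimal sketch of the per-character work such a conversion has to do. It is not the Pharo or VAST code. It assumes a single-byte string whose character values (0..255) equal their Unicode code points (ISO-8859-1), and it uses Character>>value to get the integer; other dialects may call this codePoint or asInteger.

```smalltalk
"Sketch only. Assumption: character values 0..255 are Unicode code points."
| source out code result |
source := String with: $a with: (Character value: 228).   "a followed by a-umlaut"
out := WriteStream on: (ByteArray new: source size * 2).
source do: [:ch |
    code := ch value.
    code < 128
        ifTrue: [out nextPut: code]                       "ASCII: one byte"
        ifFalse: [
            out nextPut: 192 + (code bitShift: -6).       "lead byte 110xxxxx"
            out nextPut: 128 + (code bitAnd: 63)]].       "continuation byte 10xxxxxx"
result := out contents.   "bytes 97 195 164"
```

Every character above 127 takes the second branch and writes two bytes, which fits the table: a string of 'ä' costs the Smalltalk-only code noticeably more than a string of 'a' of the same length.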

The numbers also show that the external C call is expensive: the ICU single-byte time barely moves from 167 ms for one character to 208 ms for fifty, so most of it is fixed per-call overhead. The ICU single-byte path consists of two C calls, one to convert the string to UTF-16 and one to convert that to UTF-8. This could of course be done in one call, perhaps even within a primitive.

If the string is already available in UTF-16, only one call is needed, as the numbers show.
