While working on UTF-8 character conversion for Seaside/VAST, I was looking for a fast way to do the conversion.
I took the code from Pharo and found that it was really fast. Then I implemented the conversion with my ICU wrapper and took some measurements:
The test code looked roughly like this:
'aString' asUtf8String
and in each run this statement was executed 100,000 times, with 'aString' made of 1, 10 or 50 copies of 'a' or 'ä'. Here is a table with the time for each run:
Input ('aString')   Smalltalk-only   ICU (single byte)   ICU (UTF-16)
1x 'a'              55 ms            167 ms              n/a
10x 'a'             125 ms           174 ms              n/a
1x 'ä'              70 ms            170 ms              80 ms
10x 'ä'             290 ms           179 ms              n/a
50x 'ä'             1200 ms          208 ms              107 ms
(n/a: not measured)
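The measurement loop described above can be sketched as follows. This is a minimal sketch, not the original test code: it assumes only the standard `Time millisecondsToRun:` and `timesRepeat:` messages, `timeConversion` is a hypothetical name, and the conversion under test is passed in as a block so the same harness can time the Smalltalk-only and the ICU variants.

```smalltalk
| timeConversion |
"Hypothetical helper: answers the milliseconds needed to apply
 conversionBlock to input 100,000 times, as in each run above."
timeConversion := [:conversionBlock :input |
	Time millisecondsToRun: [
		100000 timesRepeat: [conversionBlock value: input]]].
```

For example, `timeConversion value: [:s | s asUtf8String] value: (String new: 50 withAll: (Character value: 228))` would time the 50x 'ä' case, where `asUtf8String` is whichever conversion method is being measured.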
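For short strings the Smalltalk-only code is faster; for longer strings the ICU solution is faster. Where the crossover lies depends on the content: ten 'a' characters are still faster in Smalltalk (125 ms vs. 174 ms), while ten 'ä' characters are already faster with ICU (179 ms vs. 290 ms).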
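The numbers also show that the external C call is expensive: even a one-character string costs about 170 ms per 100,000 conversions with ICU, versus 55 to 70 ms in Smalltalk. The ICU single-byte path consists of two "C" calls: one converting the string to UTF-16 and a second converting that result to UTF-8. Of course this could be done in ONE call, perhaps even within a primitive.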
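If the string is already available in UTF-16, only one call is needed, and the numbers confirm this: 80 ms instead of 170 ms for 1x 'ä', and 107 ms instead of 208 ms for 50x 'ä', roughly half the time.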
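For single-byte strings, the conversion to UTF-8 does not strictly need a UTF-16 intermediate. Below is a minimal sketch of a direct one-pass encoder, assuming the single-byte string is Latin-1 (ISO 8859-1) encoded, so that each character value equals its Unicode code point. `latin1ToUtf8` is a hypothetical name, and this is neither the Pharo code nor the ICU path measured above. It also suggests why non-ASCII text is more work for a Smalltalk-side encoder: every 'ä' takes the two-byte branch.

```smalltalk
| latin1ToUtf8 |
"Hypothetical helper: encodes a single-byte string, ASSUMED to be
 Latin-1 (ISO 8859-1), directly to UTF-8 bytes in a single pass."
latin1ToUtf8 := [:aString |
	| out |
	"At most two UTF-8 bytes per Latin-1 character"
	out := WriteStream on: (ByteArray new: aString size * 2).
	aString do: [:ch |
		| code |
		code := ch value.
		code > 255 ifTrue: [self error: 'not a single-byte character'].
		code < 128
			ifTrue: [
				"ASCII (0-127): one byte, copied unchanged"
				out nextPut: code]
			ifFalse: [
				"128-255: two bytes, 110xxxxx 10xxxxxx"
				out nextPut: (16rC0 bitOr: (code bitShift: -6)).
				out nextPut: (16r80 bitOr: (code bitAnd: 16r3F))]].
	out contents].
```

For example, `latin1ToUtf8 value: (String with: (Character value: 228))` answers `#[195 164]`, the UTF-8 encoding of 'ä'.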