06-14-2019, 09:18 PM
(06-13-2019, 03:12 PM)undeath Wrote: since utf-16 uses a fixed number of two bytes per character.
I am sorry, I may have misunderstood you.
UTF-16 stores one Unicode code point in two or four bytes. For example, "𝓮𝔁𝓪𝓶𝓹𝓵𝓮" is built out of seven Unicode code points (characters), and in UTF-16 it takes 28 bytes. It takes 28 bytes in UTF-8 and UTF-32 as well, since each of these characters needs four bytes in all three encodings.
Code:
$ cat example_.txt
𝓮𝔁𝓪𝓶𝓹𝓵𝓮
$ cat example_.txt | hexdump -C
00000000 f0 9d 93 ae f0 9d 94 81 f0 9d 93 aa f0 9d 93 b6 |................|
00000010 f0 9d 93 b9 f0 9d 93 b5 f0 9d 93 ae 0d 0a |..............|
$ cat example_.txt | iconv -f utf8 -t utf16le > example_u16.txt
$ cat example_u16.txt | hexdump -C
00000000 35 d8 ee dc 35 d8 01 dd 35 d8 ea dc 35 d8 f6 dc |5...5...5...5...|
00000010 35 d8 f9 dc 35 d8 f5 dc 35 d8 ee dc 0d 00 0a 00 |5...5...5.......|
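The byte counts above can be cross-checked without iconv. This is only a minimal sketch in Python: the string is written with explicit code point escapes (decoded from the hexdump above), it leaves out the trailing CR LF that the files contain, and the helper name `encoded_sizes` is my own.

```python
# The seven code points of the example, decoded from the hexdump above.
# All of them lie outside the Basic Multilingual Plane (above U+FFFF).
text = "\U0001D4EE\U0001D501\U0001D4EA\U0001D4F6\U0001D4F9\U0001D4F5\U0001D4EE"


def encoded_sizes(s):
    """Return the code point count and the byte length per encoding."""
    return {
        "code points": len(s),
        "utf-8": len(s.encode("utf-8")),
        "utf-16-le": len(s.encode("utf-16-le")),
        "utf-32-le": len(s.encode("utf-32-le")),
    }


sizes = encoded_sizes(text)
for name, value in sizes.items():
    print(f"{name}: {value}")
# code points: 7
# utf-8: 28
# utf-16-le: 28
# utf-32-le: 28

# The first character as UTF-16LE: one surrogate pair, i.e. four bytes.
print(text.encode("utf-16-le")[:4].hex(" "))
# 35 d8 ee dc
```

An ASCII string such as "example" would give 7, 14 and 28 bytes for the three encodings instead, so UTF-16 is only "two bytes per character" as long as everything stays inside the BMP.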
Of course you can combine multiple Unicode code points into a single visible character, e.g. "🧛🏽♀️". But that is not what I mean.
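To make the difference between the two cases concrete, here is a small Python sketch of such a combined sequence. The exact code points are an assumption on my side (vampire, skin tone modifier, zero width joiner, female sign, variation selector), because the forum may not have preserved every invisible code point of the emoji above.

```python
# Assumed sequence (not verified against the post's bytes):
# U+1F9DB vampire, U+1F3FD medium skin tone, U+200D zero width joiner,
# U+2640 female sign, U+FE0F variation selector-16.
sequence = "\U0001F9DB\U0001F3FD\u200D\u2640\uFE0F"

# One visible glyph on a system that supports it, but five code points.
code_points = [f"U+{ord(ch):04X}" for ch in sequence]
print(code_points)
# ['U+1F9DB', 'U+1F3FD', 'U+200D', 'U+2640', 'U+FE0F']

# Two of the code points need a surrogate pair, three fit in one unit.
utf16_units = len(sequence.encode("utf-16-le")) // 2
print(utf16_units)
# 7

utf8_bytes = len(sequence.encode("utf-8"))
print(utf8_bytes)
# 17
```

So here the variable length comes from two layers: several code points per visible character, and one or two 16-bit units per code point. My example with "𝓮𝔁𝓪𝓶𝓹𝓵𝓮" only involves the second layer.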
Side note: I am aware that the correct display of my examples depends on several factors. The first example should be fine on most systems, while the last one seems to be broken on most systems.