
Are we talking about putting a sequence of 8-bit bytes (octets) into Unicode characters by mapping each of the 256 possible byte values to the first 256 Unicode code points (U+0000 through U+00FF)?

If so, and if I'm not mistaken, this is actually less efficient than plain Base64 encoding.

When you encode those characters as UTF-8, the first 128 code points each take one byte (with a leading 0 bit), while the other 128 each take two bytes. Assuming the blob's byte values are uniformly distributed, that averages 1.5 bytes per input byte, i.e. 50% overhead: half of the values have 0% overhead and the other half have 100%.
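Here's a quick Python sketch to check that arithmetic. It assumes the byte-to-code-point mapping is exactly Latin-1 (ISO-8859-1), which is what decoding with Python's `latin-1` codec does; `latin1_to_utf8_size` is just a name I made up for illustration.

```python
import os

def latin1_to_utf8_size(blob: bytes) -> int:
    # Decoding as Latin-1 maps byte value N to code point U+00NN,
    # i.e. the "first 256 code points" scheme described above.
    text = blob.decode("latin-1")
    # Measure how many bytes that string occupies once encoded as UTF-8.
    return len(text.encode("utf-8"))

# Every byte value exactly once: 128 values take 1 byte, 128 take 2 bytes.
all_bytes = bytes(range(256))
print(latin1_to_utf8_size(all_bytes))  # 384, i.e. 1.5x the 256 input bytes

# Random data should land close to a 1.5x ratio on average.
random_blob = os.urandom(30000)
print(latin1_to_utf8_size(random_blob) / len(random_blob))
```

Note that the exact ratio depends on the data: a blob that is mostly ASCII-range bytes will come out close to 1.0x, and one full of high bytes close to 2.0x.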

Meanwhile, Base64 packs 6 bits into each encoded character, so 4 characters carry 24 bits, or 3 bytes of raw data. That's only 33% overhead, and since every Base64 character is ASCII, each one is still a single byte in UTF-8.
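A side-by-side comparison using the standard library's `base64` module, again assuming the Latin-1 mapping for the Unicode approach (the `overhead` helper is just for illustration):

```python
import base64
import os

def overhead(encoded_len: int, raw_len: int) -> float:
    """Fractional size increase of the encoded form over the raw bytes."""
    return encoded_len / raw_len - 1

raw = os.urandom(3000)  # a multiple of 3, so Base64 needs no '=' padding

# Base64 output is pure ASCII, so its UTF-8 size equals its character count.
b64_len = len(base64.b64encode(raw))
# The "first 256 code points" approach, measured in UTF-8 bytes.
utf8_len = len(raw.decode("latin-1").encode("utf-8"))

print(f"Base64:        {overhead(b64_len, len(raw)):.1%}")   # 33.3%
print(f"Latin-1/UTF-8: {overhead(utf8_len, len(raw)):.1%}")  # roughly 50% for random data
```

For inputs whose length isn't a multiple of 3, Base64 pads the last group to 4 characters, so the overhead is slightly above 33% for small blobs.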


