
To be pedantic, the Unicode standard recommends against the use of a BOM in UTF-8-encoded documents rather than declaring it invalid.
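For concreteness, the UTF-8 "BOM" is just U+FEFF encoded in UTF-8, the three-byte sequence EF BB BF; a minimal Python sketch, using the stdlib's `codecs` constants:

```python
import codecs

# U+FEFF (ZERO WIDTH NO-BREAK SPACE) encoded in UTF-8 is EF BB BF
assert "\ufeff".encode("utf-8") == b"\xef\xbb\xbf"
assert codecs.BOM_UTF8 == b"\xef\xbb\xbf"

# The "utf-8-sig" codec strips one leading BOM if present
assert b"\xef\xbb\xbfhello".decode("utf-8-sig") == "hello"
# Plain "utf-8" keeps it as a character
assert b"\xef\xbb\xbfhello".decode("utf-8") == "\ufeffhello"
```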


It makes zero sense to specify a byte order for an encoding in which it is irrelevant. It only persists because of a lazy vendor that can't encode Unicode correctly.


It would have been nice if every well-encoded Unicode document started with a BOM and every legacy doc did not, instead of having to guess whether a doc is more likely UTF-8 or Latin-1.


Then concatenating to valid Unicode documents would no longer be valid Unicode. That is bad. And ASCII text would no longer be a valid UTF-8 encoded Unicode document. That is bad. And even when everything has finally switched to UTF-8 every tool ever will still need to handle the BOM. That is bad.
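The concatenation problem can be demonstrated in a few lines of Python: joining two BOM-prefixed files leaves a stray U+FEFF in the middle of the result, which no decoder will strip for you.

```python
import codecs

a = codecs.BOM_UTF8 + "one".encode("utf-8")
b = codecs.BOM_UTF8 + "two".encode("utf-8")

# "utf-8-sig" strips only the *leading* BOM; the second survives
# as an invisible U+FEFF character in the middle of the text.
joined = (a + b).decode("utf-8-sig")
assert joined == "one\ufefftwo"
```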

Guessing between valid UTF-8 and Latin-1 is only ever ambiguous when there are multiple non-ASCII characters in a row and all those sequences are made up of a lead byte with the correct number of trailing bytes. How often is that a problem for you in practice?
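The guessing heuristic described above can be sketched in Python: bytes that happen to form valid UTF-8 multi-byte sequences by accident are rare, so "decodes as UTF-8" is a strong signal, and Latin-1 is the fallback since it accepts any byte sequence. (The function name is mine, not from the thread.)

```python
def guess_encoding(data: bytes) -> str:
    """Heuristic: prefer UTF-8 if the bytes decode cleanly, else Latin-1.

    Ambiguity only arises when non-ASCII bytes coincidentally form
    well-formed UTF-8 lead/trail sequences; Latin-1 never fails, so
    it serves as the catch-all.
    """
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"

# "é" in Latin-1 is the lone byte 0xE9 — an invalid UTF-8 lead byte
assert guess_encoding("café".encode("latin-1")) == "latin-1"
# The same text in UTF-8 (0xC3 0xA9) decodes cleanly
assert guess_encoding("café".encode("utf-8")) == "utf-8"
```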



