Some background not covered in an otherwise pretty good article:

"In general, don’t save a Byte Order Mark (BOM) — it’s not needed for UTF-8, and historically could cause problems."

This attitude comes from the agony of processing UTF-16 files. I interface with a group that finds it hilarious to send me textual data in UTF-16, and the first hard-won lesson you learn with UTF-16 is that, superficially, a randomly guessed byte order should be correct 50% of the time, yet somehow it's always wrong. So say you read one line of a UTF-16 text file and process it accordingly after passing it thru a UTF-16 decoder. OK, no problemo: it had a BOM as the first glyph/byte/character/whatever and was converted and interpreted correctly. Then you read another line, just like you'd read-a-line-process-a-line with ASCII or UTF-8. However, they only give me a BOM at the start of the file, not at the start of each line, so invariably I translate the rest to garbage because the bytes are swapped.

Now, there are programmatic methods to analyze the BOM and memorize it. Or you can read the whole blasted multi-gig file into memory at once, de-UTF-16 it all at once, and then go line by line thru the result (a sketch of the memorize-the-BOM route is below). But fundamentally it's a simple one-liner sysadmin-type job to just shove the file thru a UTF-16-to-UTF-8 translator program before it hits my processing system. I already had to decrypt it, unzip it, and verify its hash so I know they sent me the whole file (and correctly), so adding a conversion stage is no big deal.

And this kind of UTF-16 experience is what leads people to say "oh, it's Unicode? That means I should squirt out BOMs as often as possible," even though that technically only applies to UTF-16 and is not helpful for UTF-8.
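For the curious, here's a minimal sketch in Python of both halves of this: the per-line decode that bites you, and the memorize-the-BOM fix that still streams the file instead of slurping a multi-gig blob into memory. The file name is hypothetical; Python's "utf-16" codec happens to do the BOM bookkeeping for you when used through a text-mode stream.

    def broken_lines(path):
        # The trap: decoding each raw line independently. Only the first
        # line carries the BOM, so every later line is decoded with a
        # guessed byte order -- swapped-byte garbage when the guess is
        # wrong. (For little-endian input the b"\n" split even lands in
        # the middle of a two-byte code unit, since the newline itself is
        # two bytes, so decoding can fail outright.)
        with open(path, "rb") as f:
            for raw in f:
                yield raw.decode("utf-16")

    def streamed_lines(path):
        # The fix: open in text mode with the "utf-16" codec. The
        # incremental decoder reads the BOM once, remembers the byte
        # order as decoder state, and yields correctly decoded lines
        # without loading the whole file.
        with open(path, encoding="utf-16") as f:
            yield from f

    for line in streamed_lines("feed.txt"):  # hypothetical file name
        print(line.rstrip("\n"))

And the convert-it-first route really is a one-liner at the shell, something like `iconv -f UTF-16 -t UTF-8 feed.txt > feed.utf8.txt`, after which the rest of the pipeline never has to think about byte order again.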