Should UTF-8 CSV files contain a BOM (byte order mark)?

Tags: csv, file-formats, standards, unicode

Our line-of-business software allows the user to save certain data as CSV. Since many different formats (all called "CSV") are in use in the wild, we are trying to decide what the "default format" should look like.

  • Regarding line/field separators and escaping, there is a standard we can use: RFC 4180.

  • Regarding text encoding, UTF-8 seems to have emerged in the last decade as the "default text file format", so we will use that.
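A minimal sketch of such a default export in Python, assuming the standard-library `csv` module is acceptable; `to_rfc4180_utf8` is a hypothetical helper name. It writes comma-separated fields with CRLF line breaks and doubled quotes inside quoted fields, as RFC 4180 describes, and encodes the result as UTF-8 without a BOM.

```python
import csv
import io


def to_rfc4180_utf8(rows):
    """Serialize rows as RFC 4180-style CSV and encode as UTF-8 without a BOM."""
    # newline="" disables newline translation so "\r\n" is kept verbatim.
    buf = io.StringIO(newline="")
    # QUOTE_MINIMAL quotes only fields containing the delimiter, the quote
    # character, or a line-break character; embedded quotes are doubled ("").
    writer = csv.writer(buf, delimiter=",", quotechar='"',
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
    writer.writerows(rows)
    # The plain "utf-8" codec never emits a BOM.
    return buf.getvalue().encode("utf-8")


data = to_rfc4180_utf8([["name", "note"], ["Müller", 'says "hi", then leaves']])
print(data)  # b'name,note\r\nM\xc3\xbcller,"says ""hi"", then leaves"\r\n'
```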

The one question left open is: Should we add a BOM at the start or not? I have read multiple opinions and pros/cons on the use of BOMs in general, but is there an "official" recommendation or at least some kind of community consensus on the use of BOMs in CSV files?

Best Answer

Not for UTF-8, though there are caveats.

It's unnecessary: unlike UTF-16 and UTF-32, UTF-8 has no byte order, and the Unicode Standard states that using a BOM with UTF-8 is neither required nor recommended. It's also quite rare to see UTF-8 with a BOM "in the wild", so unless you have a valid reason (e.g. you'll be working with software that expects the BOM), I'd recommend the BOM-less approach.
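To make the "no byte order" point concrete: in UTF-8 the BOM is just U+FEFF encoded as the fixed bytes EF BB BF, so it can only act as an encoding signature. The Python sketch below (the helper name `read_csv_bytes` is hypothetical) reads CSV bytes with or without a BOM using the standard `utf-8-sig` codec, and also shows what a BOM-unaware reader sees: the BOM ends up glued to the first header name.

```python
import codecs
import csv
import io

# U+FEFF encoded as UTF-8 is always the same three bytes; it carries no
# byte-order information because UTF-8 is a byte-oriented encoding.
assert codecs.BOM_UTF8 == "\ufeff".encode("utf-8") == b"\xef\xbb\xbf"


def read_csv_bytes(raw):
    """Parse UTF-8 CSV bytes, accepting input with or without a BOM."""
    # "utf-8-sig" strips a leading BOM if present and otherwise decodes
    # exactly like "utf-8".
    text = raw.decode("utf-8-sig")
    return list(csv.reader(io.StringIO(text, newline="")))


without_bom = b"id,city\r\n1,K\xc3\xb6ln\r\n"
with_bom = codecs.BOM_UTF8 + without_bom

print(read_csv_bytes(with_bom) == read_csv_bytes(without_bom))  # True

# A reader that decodes with plain "utf-8" keeps the BOM as part of the data.
naive = list(csv.reader(io.StringIO(with_bom.decode("utf-8"), newline="")))
print(repr(naive[0][0]))  # '\ufeffid'
```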

Wikipedia mentions some software, mainly from Microsoft, that writes a BOM and expects one when reading; unless you need to work with such software, don't use it.
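If some of your users do need to feed the file to software that expects a BOM, one option is to keep BOM-less UTF-8 as the default and make the BOM an explicit opt-in. A sketch in Python, where `export_csv` and its `add_bom` parameter are hypothetical names rather than an existing API:

```python
import csv
import io


def export_csv(rows, add_bom=False):
    """Return CSV bytes in UTF-8; prepend a BOM only when explicitly requested."""
    buf = io.StringIO(newline="")
    csv.writer(buf, lineterminator="\r\n").writerows(rows)
    # Encoding with "utf-8-sig" writes EF BB BF before the data;
    # plain "utf-8" writes no BOM.
    encoding = "utf-8-sig" if add_bom else "utf-8"
    return buf.getvalue().encode(encoding)


default_output = export_csv([["a", "b"]])               # b'a,b\r\n'
compat_output = export_csv([["a", "b"]], add_bom=True)  # b'\xef\xbb\xbfa,b\r\n'
```

Keeping the flag off by default follows the answer's recommendation while leaving room for the exception it describes.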
