
Omar Dulaimi


How to properly handle UTF-8 BOM files in Node.js?

Did you know that `JSON.parse` throws a `SyntaxError` when you try to parse a string that starts with a UTF-8 BOM?
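A minimal sketch of the failure, assuming the file contents have already been read into a string. JSON only treats space, tab, LF and CR as whitespace, so the leading `\uFEFF` character is rejected.

```javascript
// What a BOM-prefixed JSON file looks like once it is read into a string:
// the first character is U+FEFF (the BOM), followed by ordinary JSON text.
const withBom = '\uFEFF{"name":"demo"}';

let parseError = null;
try {
  JSON.parse(withBom);
} catch (err) {
  // U+FEFF is not JSON whitespace, so parsing fails at position 0.
  parseError = err;
}
console.log(parseError instanceof SyntaxError); // true

// Without the leading BOM, the same text parses fine.
const parsed = JSON.parse(withBom.slice(1));
console.log(parsed.name); // demo
```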

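This happens because Node.js does not strip the BOM when reading files with the `fs` methods, even when you pass `'utf8'` as the encoding, so the string you get back starts with the character U+FEFF. We have to strip it in our own code.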
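To check that the BOM survives a normal `fs` read, the sketch below writes a small BOM-prefixed file to a temporary directory and reads it back with `'utf8'`. The `bom-demo-` prefix and the `config.json` name are placeholders for this demo only.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write a JSON file that starts with the UTF-8 BOM bytes EF BB BF.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'bom-demo-'));
const file = path.join(dir, 'config.json');
fs.writeFileSync(file, Buffer.concat([
  Buffer.from([0xef, 0xbb, 0xbf]),
  Buffer.from('{"port":8080}', 'utf8'),
]));

// Reading with an encoding returns a string, but the BOM is still there
// as the character U+FEFF at index 0.
const text = fs.readFileSync(file, 'utf8');
console.log(text.charCodeAt(0).toString(16)); // feff
console.log(text.length); // 14: 13 JSON characters plus the BOM

// Clean up the temporary directory.
fs.rmSync(dir, { recursive: true, force: true });
```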

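So instead of passing the file contents straight to `JSON.parse`, first decode them with the `decode` method of the `TextDecoder` class. Its constructor accepts an `ignoreBOM` option that defaults to `false`, which means a leading BOM is stripped from the output by default. The `decode` method takes binary data as input (an `ArrayBuffer`, a `TypedArray` or a `DataView`; a Node.js `Buffer` qualifies), so read the file without specifying an encoding, and `fs` will return a `Buffer` instead of a string.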
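Here is a minimal sketch of the `TextDecoder` approach. The helper name `readJsonStrippingBom` is hypothetical, and the demo reads a throwaway temp file. It imports `TextDecoder` from `util`, which also works on older Node.js versions; on Node.js 11 and later the class is available as a global too.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { TextDecoder } = require('util');

// Hypothetical helper: read a file as raw bytes, decode it, then parse it.
function readJsonStrippingBom(filePath) {
  // No encoding argument, so readFileSync returns a Buffer, not a string.
  const bytes = fs.readFileSync(filePath);
  // ignoreBOM defaults to false: a leading BOM is removed from the output.
  const text = new TextDecoder('utf-8').decode(bytes);
  return JSON.parse(text);
}

// Demo: a throwaway BOM-prefixed file in a temp directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'bom-demo-'));
const file = path.join(dir, 'config.json');
fs.writeFileSync(file, '\uFEFF{"port":8080}', 'utf8');

const config = readJsonStrippingBom(file);
console.log(config.port); // 8080

// With ignoreBOM: true the BOM is kept, so the text starts with U+FEFF again.
const kept = new TextDecoder('utf-8', { ignoreBOM: true }).decode(fs.readFileSync(file));
console.log(kept.charCodeAt(0) === 0xfeff); // true

// Clean up the temporary directory.
fs.rmSync(dir, { recursive: true, force: true });
```

Note the naming: `ignoreBOM: true` is the setting that keeps the BOM in the output, so for this use case you simply leave the option at its default.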

BOM stands for byte order mark: a sequence of bytes at the start of a text file that indicates its Unicode encoding. In UTF-8 it is the three bytes `EF BB BF`.
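If you would rather not use `TextDecoder`, an alternative is to check for those three bytes yourself and slice them off before decoding. This is a sketch of a swapped-in technique, not the approach described above, and `stripUtf8Bom` is a hypothetical helper name.

```javascript
// In UTF-8 the byte order mark is the character U+FEFF encoded as three bytes.
const UTF8_BOM = Buffer.from([0xef, 0xbb, 0xbf]);
console.log(Buffer.from('\uFEFF', 'utf8').equals(UTF8_BOM)); // true

// Hypothetical helper: drop a leading UTF-8 BOM from a Buffer, if present.
function stripUtf8Bom(buf) {
  const hasBom = buf.length >= 3 && buf.subarray(0, 3).equals(UTF8_BOM);
  return hasBom ? buf.subarray(3) : buf;
}

// A BOM-prefixed JSON payload, as it might come from fs.readFileSync(path).
const raw = Buffer.concat([UTF8_BOM, Buffer.from('{"ok":true}', 'utf8')]);
const parsed = JSON.parse(stripUtf8Bom(raw).toString('utf8'));
console.log(parsed.ok); // true
```

`subarray` returns a view over the same memory, so stripping the BOM this way does not copy the file contents.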

Did you learn something new today?

Like and share this post, and follow me for more!


