
Omar Dulaimi


How to properly handle UTF-8 BOM files in Node.js?

Did you know that JSON.parse throws a SyntaxError when you try to parse a string that starts with a UTF-8 BOM?

This happens because Node.js does not strip the BOM when reading files with the fs methods. We have to do this in our code.
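To make the failure concrete, here is a minimal sketch. The byte values and the JSON payload are illustrative assumptions, not taken from a real file. It decodes the bytes the way a `"utf8"` read would and shows that the BOM survives as the character U+FEFF, which JSON.parse rejects.

```javascript
// Bytes of a hypothetical JSON file saved as "UTF-8 with BOM":
// EF BB BF followed by the JSON text.
const fileBytes = Buffer.concat([
  Buffer.from([0xef, 0xbb, 0xbf]),
  Buffer.from('{"ok":true}', "utf8"),
]);

// Decoding as "utf8" keeps the BOM as a leading U+FEFF character.
const fileString = fileBytes.toString("utf8");
const startsWithBom = fileString.charCodeAt(0) === 0xfeff;

// U+FEFF is not valid JSON whitespace, so parsing fails.
let errorName = null;
try {
  JSON.parse(fileString);
} catch (err) {
  errorName = err.name;
}

console.log(startsWithBom, errorName);
```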

So instead of parsing the file string directly, decode the file contents with the decode method of the TextDecoder class. Its constructor accepts an ignoreBOM option that defaults to false, so the BOM is stripped by default. The decode method takes a buffer as input, so when reading the file, don't pass an encoding to the read call; that way the result stays a Buffer.
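The approach above could look roughly like this. It is a sketch, not a definitive implementation: `readJsonFile` and the temporary demo file are hypothetical names introduced only for illustration.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: read a JSON file that may start with a UTF-8 BOM.
function readJsonFile(filePath) {
  // No encoding argument, so readFileSync returns a Buffer.
  const buffer = fs.readFileSync(filePath);
  // ignoreBOM defaults to false, so a leading BOM is left out of the result.
  const text = new TextDecoder("utf-8").decode(buffer);
  return JSON.parse(text);
}

// Demo: write a temporary file that begins with the BOM bytes, then read it.
const demoPath = path.join(os.tmpdir(), `bom-demo-${process.pid}.json`);
fs.writeFileSync(
  demoPath,
  Buffer.concat([
    Buffer.from([0xef, 0xbb, 0xbf]),
    Buffer.from('{"ok":true}', "utf8"),
  ])
);

const parsed = readJsonFile(demoPath);
fs.unlinkSync(demoPath);

console.log(parsed);
```

Files without a BOM go through the same path unchanged, so the helper does not need to check for the BOM first.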

BOM stands for byte order mark: a sequence of bytes used to indicate the Unicode encoding of a text file.
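For UTF-8, that sequence is the three bytes EF BB BF, which is the character U+FEFF encoded as UTF-8. A small sketch for inspecting it follows; `hasUtf8Bom` is a hypothetical helper name, not a Node.js API.

```javascript
// The UTF-8 byte order mark as raw bytes.
const UTF8_BOM = Buffer.from([0xef, 0xbb, 0xbf]);

// Hypothetical helper: true when the buffer starts with the UTF-8 BOM bytes.
function hasUtf8Bom(buffer) {
  return buffer.length >= 3 && buffer.subarray(0, 3).equals(UTF8_BOM);
}

// Encoding U+FEFF as UTF-8 yields exactly those three bytes.
const encodedBom = Buffer.from("\uFEFF", "utf8");

console.log(encodedBom.toString("hex")); // efbbbf
console.log(hasUtf8Bom(Buffer.from("plain text", "utf8"))); // false
```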


Did you learn something new today?

Like and share this post, and follow me for more!

