DEV Community

James Moberg


Convert Unicode strings to ASCII with ColdFusion & JUnidecode

I’ve struggled for years to find the best way to convert accented and other Unicode characters to ASCII in ColdFusion. I’ve tried regex, java.text.Normalizer, the ICU4J Transliterator and Apache Commons Lang3’s StringUtils.stripAccents, and recently scrapped them all in favor of JUnidecode. JUnidecode is a Java port of the Text::Unidecode Perl module. The library has only one method: it takes a string and transliterates it to a valid 7-bit ASCII string, which also means it strips diacritical marks.

Examples:

  • Москвa becomes Moskva.
  • čeština becomes cestina.
  • Հայաստան becomes Hayastan.
  • Ελληνικά becomes Ellenika.
  • 北亰 becomes Bei Jing.
  • Häuser Bäume Höfe Gärten becomes Hauser Baume Hofe Garten.
  • daß becomes dass.
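
For comparison, here is a minimal Java sketch of the java.text.Normalizer approach mentioned earlier. It is not JUnidecode, and the class and method names (AccentStripper, stripAccents) are hypothetical, chosen only for illustration. It decomposes each character and drops the combining marks, which handles accented Latin letters but does no transliteration: characters without a decomposition, such as ß, pass through unchanged.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration; not part of JUnidecode or ColdFusion.
class AccentStripper {

    // Decompose to NFD (base letter + combining marks), then remove the marks.
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "čeština": the carons are combining marks after NFD, so they are removed.
        System.out.println(stripAccents("\u010De\u0161tina"));

        // "Häuser Bäume Höfe Gärten": the umlauts are removed the same way.
        System.out.println(stripAccents("H\u00E4user B\u00E4ume H\u00F6fe G\u00E4rten"));

        // "daß": ß has no decomposition, so the result is not ASCII.
        System.out.println(stripAccents("da\u00DF"));
    }
}
```

With this sketch, the accented Latin examples come out as plain ASCII, while daß is returned as-is rather than as dass.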

WARNING: Please be aware that JUnidecode doesn't like emojis. You may need to sanitize them (or convert them to aliases) using cf-emoji-java before converting to 7-bit ASCII.
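
If you only need to drop emojis rather than convert them to aliases, the sanitizing step could be roughly sketched in plain Java as below. This is not the cf-emoji-java API; the class name, method name and code point ranges are my own assumptions, and the ranges are a crude, non-exhaustive approximation that also removes some non-emoji symbols and dingbats.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the cf-emoji-java API.
class EmojiSanitizer {

    // Assumed, non-exhaustive ranges: U+1F000..U+1FAFF (most pictographic emoji),
    // U+2600..U+27BF (miscellaneous symbols and dingbats), plus the
    // variation selector U+FE0F and the zero-width joiner U+200D.
    private static final Pattern EMOJI_LIKE = Pattern.compile(
        "[\\x{1F000}-\\x{1FAFF}\\x{2600}-\\x{27BF}\\x{FE0F}\\x{200D}]");

    // Remove every code point matched by the rough pattern above.
    static String stripEmoji(String input) {
        return EMOJI_LIKE.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // "ok" followed by U+1F600 (grinning face), written as a surrogate pair.
        System.out.println(stripEmoji("ok\uD83D\uDE00"));
    }
}
```

The cleaned string can then be passed on to the ASCII conversion step.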

Here's a demo script I've written that has some generic test cases:
https://gist.github.com/JamoCA/6565bd4e2526b7c177a5f0cde3980d1c
