
Convert Unicode strings to ASCII with ColdFusion & JUnidecode

James Moberg

I’ve struggled for years to find the best way to convert accented and other non-ASCII Unicode characters using ColdFusion. I’ve used regex, java.text.Normalizer, ICU4J's Transliterator and Apache Commons Lang3's StringUtils.stripAccents, and recently scrapped them all in favor of JUnidecode. JUnidecode is a Java port of the Text::Unidecode Perl module. The library has only one method: it takes a string and transliterates it to a valid 7-bit ASCII string, which also strips diacritical marks along the way.

Examples:

  • Москвa becomes Moskva.
  • čeština becomes cestina.
  • Հայաստան becomes Hayastan.
  • Ελληνικά becomes Ellenika.
  • 北亰 becomes Bei Jing.
  • Häuser Bäume Höfe Gärten becomes Hauser Baume Hofe Garten.
  • daß becomes dass.
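
For comparison, here is a minimal sketch of one of the earlier approaches mentioned above, java.text.Normalizer, written in plain Java. It is not JUnidecode and not the code I used in production; the class and method names (AccentStripper, stripAccents) are made up for illustration. It decomposes each character and deletes the combining marks, which handles čeština but leaves ß and Cyrillic untouched, so the result is not guaranteed to be ASCII.

```java
import java.text.Normalizer;

// Hypothetical helper illustrating the Normalizer-only approach.
class AccentStripper {

    // Decompose to NFD, then delete combining marks (Unicode category M).
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file pure ASCII.

        // Czech "cestina" with carons on c and s: the carons are removed.
        System.out.println(stripAccents("\u010De\u0161tina"));

        // German "dass" spelled with sharp s: the sharp s has no
        // decomposition, so it is returned unchanged (still non-ASCII).
        System.out.println(stripAccents("da\u00DF"));

        // Russian "Moskva" in Cyrillic: nothing is transliterated.
        System.out.println(stripAccents("\u041C\u043E\u0441\u043A\u0432\u0430"));
    }
}
```

Stripping marks is only half the job: anything without a decomposition still needs a transliteration table, and that table is what a library like JUnidecode supplies.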

WARNING: Please be aware that JUnidecode doesn't like emojis. You may need to sanitize them (or convert them to aliases) using cf-emoji-java before converting to 7-bit ASCII.
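
If you only need to drop emojis rather than convert them to aliases, a rough pre-filter can be sketched in plain Java. This is an assumption-heavy illustration, not what cf-emoji-java does: the EmojiSanitizer class and stripSupplementary method are hypothetical names, and the filter simply removes every code point above U+FFFF plus the zero-width joiner and variation selectors that glue emoji sequences together. It will miss emoji that live in the Basic Multilingual Plane (such as U+263A) and will also remove rare non-emoji supplementary characters.

```java
// Hypothetical helper: a blunt emoji pre-filter, not a complete solution.
class EmojiSanitizer {

    static String stripSupplementary(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints()
             // Drop everything above U+FFFF (most emoji live there).
             .filter(cp -> !Character.isSupplementaryCodePoint(cp))
             // Drop the zero-width joiner used in emoji sequences.
             .filter(cp -> cp != 0x200D)
             // Drop variation selectors U+FE00..U+FE0F.
             .filter(cp -> cp < 0xFE00 || cp > 0xFE0F)
             .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // "Hi " + U+1F600 (grinning face) + "!" prints: Hi !
        System.out.println(stripSupplementary("Hi \uD83D\uDE00!"));
    }
}
```

The filtered string can then be passed on to the ASCII conversion step. Converting emojis to aliases instead of deleting them preserves more meaning, which is why a dedicated library is the better choice when the emoji content matters.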

Here's a demo script I've written that has some generic test cases:
https://gist.github.com/JamoCA/6565bd4e2526b7c177a5f0cde3980d1c

Posted by:

James Moberg

@gamesover

I’m a ColdFusion web application developer at SunStar Media located in Monterey, CA. I am a fan of technology, music and web development.
