This is a follow-up to a January 11, 2019 article I wrote on my old Tumblr blog.
The Hacker News posted an article about bypassing Microsoft Office 365's "Safe Links" security feature by inserting zero-width spaces (ZWSPs) into URLs. I generated some sample URLs to test against my ColdFusion-based URL blacklist script and discovered that it was also vulnerable. isValid("url") returns TRUE for strings that contain ZWSPs, and clicked links still reach the correct destination because the browser (or DNS) appears to ignore the ZWSP characters automatically. This makes it possible for cybercriminals and email scammers to send malware and phishing links through our servers. A blacklist cannot reliably match these URLs unless the invisible characters are removed first.
I've developed a ColdFusion CFC with various methods to trim, sanitize (replace/remove) and identify these invisible characters, whether they appear as literal characters or as HTML entities. I'm still evaluating the library by testing it in a couple of smaller projects and am hoping to post more in the very near future. (I've actually been meaning to write this for a while in order to trim non-breaking spaces from data imported from user-generated Excel files.)
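To make the blacklist problem concrete, here is a minimal sketch of the "clean first, then check" idea. It is written in Python purely for illustration and is not the CFC itself; the character set, the `strip_zero_width` and `is_blacklisted` names, and the `evil.example` host are all assumptions made up for this example.

```python
import html
import re

# Assumed set of invisible characters (ZWSP, ZWNJ, ZWJ, word joiner, BOM).
# The actual rule list used by the CFC is not reproduced here.
ZERO_WIDTH = "\u200b\u200c\u200d\u2060\ufeff"
_ZERO_WIDTH_RE = re.compile("[" + ZERO_WIDTH + "]")

# Hypothetical blacklist, for illustration only.
BLACKLISTED_HOSTS = {"evil.example"}


def strip_zero_width(text: str) -> str:
    """Decode HTML entities (e.g. &#8203;), then remove zero-width characters."""
    decoded = html.unescape(text)
    return _ZERO_WIDTH_RE.sub("", decoded)


def is_blacklisted(url: str) -> bool:
    """Substring-match the cleaned, lowercased URL against the blacklist."""
    cleaned = strip_zero_width(url).lower()
    return any(host in cleaned for host in BLACKLISTED_HOSTS)


if __name__ == "__main__":
    # One literal ZWSP and one entity-encoded ZWSP hidden inside the host name.
    raw = "http://evil\u200b.exa&#8203;mple/login"
    print("evil.example" in raw)   # a naive check misses the host
    print(is_blacklisted(raw))     # the cleaned check catches it
```

Decoding entities before stripping matters because a ZWSP can arrive either as the raw character or as `&#8203;`, and a filter that only handles one form can still be bypassed with the other.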
I just realized that I never shared the whitespace.cfc that I developed back in 2019. Public methods include:
- dumpRegex (performs a CFDump of regex rules)
- getConfig (lists all rules used when a tag is specified)
- getRegex (Generates pipe-delimited REGEX list of whitespace/ZWSP characters. Ex. 'chr(32)|A_Space')
- hasWhiteSpace (Checks if string contains any whitespace)
- hasUnsafeSpace (Checks if string contains unsafe-ish whitespace)
- identifyUnsafeSpace (Provides an array of shortcodes, names, decimal or hex values of identified whitespace and their regex positions)
- leftTrim (Performs a left trim and strips all whitespace)
- rightTrim (Performs a right trim and strips all whitespace)
- fullTrim (Performs a left/right trim and strips all whitespace)
- sanitize (Performs a left/right trim and strips control characters)
- compressText (Sanitizes, then reduces multiple space characters to a single character)
- compressHtml (Removes a large amount of unnecessary whitespace from your HTML code)
- singleLine (Modifies content to output on a single line (for logging))
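For readers who want a feel for what a few of these methods do, here is a rough Python sketch of the behavior I would expect from fullTrim, compressText and identifyUnsafeSpace. It is an approximation under stated assumptions, not a port of the CFC: the function names are Python-style stand-ins, the character set is assumed, and positions are 0-based here.

```python
import re
import unicodedata

# Assumed invisible characters; the CFC's actual rule list is not reproduced here.
INVISIBLE = "\u200b\u200c\u200d\u2060\ufeff"
_EDGE_RE = re.compile(r"^[\s" + INVISIBLE + r"]+|[\s" + INVISIBLE + r"]+$")
_INVISIBLE_RE = re.compile("[" + INVISIBLE + "]")


def full_trim(text: str) -> str:
    """Trim whitespace and invisible characters from both ends only."""
    return _EDGE_RE.sub("", text)


def compress_text(text: str) -> str:
    """Drop invisible characters, collapse whitespace runs to one space, trim."""
    without_invisible = _INVISIBLE_RE.sub("", text)
    return re.sub(r"\s+", " ", without_invisible).strip()


def identify_unsafe_space(text: str) -> list:
    """Report invisible or non-ASCII whitespace characters with 0-based positions."""
    found = []
    for index, char in enumerate(text):
        if char in INVISIBLE or (char.isspace() and char not in " \t\r\n"):
            found.append({
                "position": index,
                "decimal": ord(char),
                "hex": f"U+{ord(char):04X}",
                "name": unicodedata.name(char, "UNKNOWN"),
            })
    return found


if __name__ == "__main__":
    # Leading non-breaking space, an embedded ZWSP, and a trailing BOM.
    sample = "\u00a0 Hello\u200b   World \ufeff"
    print(repr(full_trim(sample)))
    print(repr(compress_text(sample)))
    for hit in identify_unsafe_space(sample):
        print(hit)
```

Note that Python's `\s` already matches the non-breaking space (the Excel import case) but not the zero-width characters, which is why the two groups are handled with separate patterns in this sketch.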