Adam Nathaniel Davis

Posted on Mar 11, 2023 • Edited on Mar 12, 2023

Stahhp Screening for TLDs in Your Email Fields

#webdev #ux #javascript #programming

I've had it. I've been quiet on this subject for far too long. And now I feel compelled to finally crank out this angry diatribe. If you're a web developer (and most of the people reading this article are web developers), then for the love of all that is holy, please STAHHP screening email fields against an "approved" set of Top Level Domains (TLDs).

You may think that you're really clever with your super-advanced email validation. You may think that you're forcing users to enter a "valid" email address. But I've seen this done incorrectly sooooo many times that, by this point, I'd bet there's a good chance that you're screwing it up - and pissing off some subset of your users.

History

Almost as soon as you start learning web development, you also learn that you should be validating user inputs. Ideally, you're validating those inputs on the backend and the frontend. (Because it's clunky as hell to submit a form to the server, only to have it spit everything back at you because something didn't look "right".)

And even though frontend validation is only half of the equation, there is a ton of value that can be provided to the user by giving them immediate feedback, in the browser, about fields that don't pass muster. It's elementary to warn a user that a required field is empty or that a given input is too short/long. But for as long as I've been doing this (a quarter-century), it's always been something of a challenge to properly validate email fields.

Originally, email validation was fairly straightforward. Sometimes it was done with regular expressions. Sometimes it was done with more "manual" checks. But the basic validation went something like this:

1. Ensure that there are no invalid characters in the email address. (For example: a copyright mark - ©. There is no valid email address that contains ©.)
2. Ensure that even the allowed "special" characters do not repeat. (For example: a single . is acceptable - and commonly used - in email addresses. But there is no valid email address that contains two dots in a row.)
3. Ensure that there's one - and only one - @ character in the email address - and that there are non-empty strings on both sides of the @ character.
4. Ensure that the portion to the right of the @ character contains at least one . character. (The portion to the right of the last . character - after the @ character - is assumed to be the TLD.)
5. Ensure that the email address's TLD is "valid".

But it's that last point that causes all sorts of problems...

I remember the early regular expressions that I'd see for email validation. They usually took everything after the last . and checked it against a list of "known" TLDs. And, for a little while at least, this was... workable. Because there was a finite - and fairly static - list of valid TLDs.

In the "early days", nearly all valid emails ended in .com or .net or .edu or .gov or .org or any of the country-specific TLDs (e.g., .uk). So most email validation scripts tried to check the last portion of the email address against these "known-good" TLDs.
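The five checks listed earlier in this section can be sketched roughly as follows. This is a reconstruction for illustration, not code from any real library: the function name `legacyIsValidEmail` and the short, frozen `KNOWN_TLDS` list are assumptions, and real scripts of that era varied a lot.

```javascript
// Hypothetical reconstruction of an old-style validator; not taken from any real library.
// KNOWN_TLDS is a deliberately short, frozen-in-time list (an assumption for illustration).
const KNOWN_TLDS = ["com", "net", "org", "edu", "gov", "uk"];

function legacyIsValidEmail(email) {
  // Check 1: only letters, digits, and a few "special" characters are allowed.
  if (!/^[A-Za-z0-9@._+-]+$/.test(email)) return false;
  // Check 2: no repeated dots.
  if (email.includes("..")) return false;
  // Check 3: exactly one @, with something on both sides.
  const parts = email.split("@");
  if (parts.length !== 2 || parts[0] === "" || parts[1] === "") return false;
  // Check 4: the domain needs at least one dot.
  const lastDot = parts[1].lastIndexOf(".");
  if (lastDot === -1) return false;
  // Check 5: the text after the last dot must be on the "known" list.
  const tld = parts[1].slice(lastDot + 1).toLowerCase();
  return KNOWN_TLDS.includes(tld);
}

console.log(legacyIsValidEmail("someone@example.com"));   // true
console.log(legacyIsValidEmail("someone@example.codes")); // false, although .codes is a real TLD
```

The first four checks are cheap and rarely hurt anyone. The fifth one is only as good as the list behind it, and that list starts aging the day it's written.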

The TLD boom

But nowadays, there's a huge proliferation of valid, working TLDs that have nothing to do with the old stalwarts like .com or .net. Your website can have a perfectly valid/functional TLD like .pizza or .health or .voyage. And of course, if your web presence can use those TLDs, then it's entirely possible that your email address may also use those TLDs.

Granted, the vast majority of all websites (and hence, all email addresses) still end in a "common" TLD like .com or .net or .org. But every single day there are new websites - and new email addresses - coming online that do NOT use those common TLDs.

There are still sooooo many sites out there that try to do a strict validation of your email address - and they attempt to do this by checking the TLD against a list of "known-good" TLDs. The problem arises because almost none of these sites are fastidious about ensuring that their list of "known-good" TLDs is truly up-to-date with the actual list of real, live TLDs that are available.

My own private hell

My CV site is at https://adamdavis.codes. My email address is also hosted under adamdavis.codes. Obviously, it doesn't have a "common" TLD. I've done that for two specific reasons:

When I first set up my site, adamdavis.com simply wasn't available.
Even if adamdavis.com were available, I'm extremely happy with adamdavis.codes. I'm a coder. The .codes TLD is a perfect choice for my CV. And as such, it only makes sense that I'd have an email address under the same TLD.

This isn't the only time I've delved into "uncommon" TLDs. My latest project is https://paintmap.studio. I also previously had an email address with a .voyage TLD.

When I first started using these "uncommon" TLDs, I'd find that my email address would frequently get rejected from all sorts of online forms. The form would give me a validation error, stating that my email address isn't "valid". But... it absolutely IS valid!

To be fair, I have found that email addresses like my personal adamdavis.codes address are indeed "passing" many more form validations nowadays. But far too often, I'm still trying to submit an online form when the website stops me and tells me that my perfectly-valid email address is... "invalid".

You know what happens when a website rejects my perfectly-valid email address? Well, if the activity I'm trying to complete is in any way optional, I simply QUIT the process. I've abandoned shopping carts that had hundreds of dollars of items merely because the jank-ass website claimed that my email address was invalid. I've abandoned job applications for the same reason.

Yes, I do have a Gmail account. And in those scenarios where I feel compelled to complete the process, I switch out my preferred .codes email address with my Gmail address. But I don't do this unless I feel that I simply must complete the process. And whether I abandon the process or switch to my Gmail address, the whole failed-validation process simply infuriates me.

When the "new" TLDs first started rolling out, I found this process to be annoying - but understandable. It was easy to see how the web teams supporting these features simply weren't keeping up-to-date with the latest TLD specs. But today? In 2023?? I'm sorry, but it's downright unacceptable.

What do you think you're accomplishing?

Frontend (i.e., JavaScript) form validation is, for the most part, a good thing. The last thing you want to do is give the user a form that allows nearly any completely-illogical value to be submitted. But there's a point where strict validation undermines the user experience. And in some cases, it can downright alienate your users.

Take email validation for example. When I'm implementing email address validation in my forms, I tend to use this NPM package: https://www.npmjs.com/package/@toolz/looks-like-email. (HINT: I wrote this package.) It does exactly what the title implies: It tells me if a given value looks like an email address.

No, it's not an acid test designed to strictly filter out any potential string that could possibly be a bogus email. It doesn't try to match against all known-good TLDs. When I use this package, it's entirely possible that someone may still enter an invalid email address. And you know what? In most cases, I couldn't care less.

Because, if someone manages to sneak an invalid email address past my @toolz/looks-like-email package, they're usually just hurting themselves. For most systems that I build/maintain, an "invalid" email address will simply mean that they don't get the notices they might otherwise expect to receive. But those edge cases would only occur if someone's trying, very hard, to find an invalid address that will pass my filter. And if they're trying that hard to subvert the filter - I don't care. Let them.
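To make the "looks like" idea concrete, here's a minimal sketch of a deliberately loose check. To be clear, this is not the source code or the API of @toolz/looks-like-email; `roughlyLooksLikeEmail` is a hypothetical helper written only for this example, and the exact rules it applies are assumptions.

```javascript
// Hypothetical helper for illustration only. It is NOT the implementation
// (or the API) of the @toolz/looks-like-email package.
function roughlyLooksLikeEmail(value) {
  if (typeof value !== "string") return false;
  const candidate = value.trim();
  // No whitespace anywhere inside the address.
  if (/\s/.test(candidate)) return false;
  // Split on the LAST @ so the domain is whatever follows it.
  const at = candidate.lastIndexOf("@");
  if (at < 1 || at === candidate.length - 1) return false;
  const domain = candidate.slice(at + 1);
  // The domain needs text on both sides of its last dot, but the TLD itself is never checked.
  const dot = domain.lastIndexOf(".");
  return dot > 0 && dot < domain.length - 1;
}

console.log(roughlyLooksLikeEmail("someone@example.codes")); // true
console.log(roughlyLooksLikeEmail("someone@example.pizza")); // true
console.log(roughlyLooksLikeEmail("not an email"));          // false
```

The point of the sketch is what it leaves out: there's no TLD list anywhere in it, so there's nothing to go stale.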

There are also some times when you may not need to validate an email field at all. (Or, at a maximum, simply validate that some value's been entered.) We've all seen (or worked on) sites where you must verify your email address before the app will allow you to do anything meaningful. In those kinda scenarios, is it really a tragedy if someone puts BS data in the email address field? The only result will be that they won't be able to properly verify their account (and begin using the app) until they do enter a valid email.
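Here's a rough, in-memory sketch of that verify-before-use flow. Everything in it is hypothetical: the `startSignup` and `confirmSignup` names, the `Map` standing in for a database, and the token, which a real app would generate from a cryptographically secure source and deliver by actually sending an email.

```javascript
// Hypothetical in-memory sketch; a real app would persist this and actually send the email.
// The token is passed in by the caller and should come from a cryptographically secure source.
function startSignup(pending, email, token) {
  // Only require that *something* was entered; the confirmation step does the real checking.
  if (typeof email !== "string" || email.trim() === "") return false;
  pending.set(token, { email: email.trim(), verified: false });
  return true;
}

function confirmSignup(pending, token) {
  // The user can only present the token if the email actually reached them.
  const account = pending.get(token);
  if (!account) return null;
  account.verified = true;
  return account;
}

const pending = new Map();
startSignup(pending, "someone@example.codes", "token-from-secure-generator");
console.log(confirmSignup(pending, "token-from-secure-generator").verified); // true
console.log(confirmSignup(pending, "wrong-token"));                          // null
```

In a setup like this, a bogus address never needs to be "caught" by the form. It simply never gets confirmed.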

Of course, there are many other form elements that can suffer from being overly strict. For example, I once worked at a company where the user was expected to enter first name and last name values. In the interest of trying to provide "complete" frontend validation, someone set those fields to be invalid if either one contained fewer than three characters. You probably know where this is going...

Although it's fairly uncommon in the US to have a first-or-last name that consists of fewer than three characters, those names do exist. In particular, there are many people, especially those of Asian descent, who have first-or-last names that consist of only two characters. Once the app went live, we immediately started receiving complaints that some people could not complete the online form.

I understand that we commonly set first-and-last name fields as being required - meaning that they must contain some type of value. But if someone only puts, say, their first initial in the first-name field, is that really hurting anyone? Is it really gonna crash the system? Or is it just an overly-fastidious frontend developer deciding - on their own - that every user must enter first/last name values that are over a certain length?
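As a small sketch of the difference, compare a length-based rule with a plain "required" rule. Both function names are hypothetical and exist only for this example; the three-character minimum mirrors the rule described above.

```javascript
// Hypothetical helpers for illustration only.

// The overly strict rule: anything shorter than three characters is rejected.
function tooStrictName(value) {
  return typeof value === "string" && value.trim().length >= 3;
}

// The relaxed rule: the field is required, and that's all.
function isAcceptableName(value) {
  return typeof value === "string" && value.trim().length > 0;
}

console.log(tooStrictName("Li"));     // false, even though it's a real name
console.log(isAcceptableName("Li"));  // true
console.log(isAcceptableName("A"));   // true, a lone initial is allowed
console.log(isAcceptableName("   ")); // false, blank input still fails
```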

Another good example is phone fields. I've seen an increasing number of phone fields that tightly restrict you to purely-numeric values. But what happens if you don't have a direct phone line? What happens if your phone number looks like this:

+1 904-555-1234 ext. 42

Yes, I have seen some online forms that provide a separate field for Extension. But most don't. So if the only way to reach this person is by entering an extension, and you don't let them enter an extension, you're forcing them to only enter the "main" number - which might connect the caller to the company's main line - and the person who answers may not even know how to forward the call to the user who completed the online form.

Here's another example of user input for a phone field that many online forms will try to block:

+1 904-555-1234 (ONLY BEFORE 5PM)

In this example, the user is trying to tell you, in no uncertain terms, that you should not try to call this number after 5PM. But if you've already decided, in your all-knowing form-developer mode, that no one should ever be allowed to enter letters in the Phone field, you're denying the user the ability to provide these sorts of valuable instructions.
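If you want some check on a phone field at all, one lenient option is to keep whatever the user typed and only confirm that it contains enough digits to plausibly be a phone number. The sketch below assumes exactly that; `acceptPhoneInput` is a hypothetical helper, and `MIN_DIGITS` is an arbitrary threshold picked for the example, not any kind of standard.

```javascript
// Hypothetical helper: keep whatever the user typed, and only check that it
// contains enough digits to plausibly be a phone number.
// MIN_DIGITS is an arbitrary assumption for this sketch, not a standard.
const MIN_DIGITS = 7;

function acceptPhoneInput(raw) {
  const text = String(raw).trim();
  // Count digits only; letters, spaces, and punctuation are ignored, not rejected.
  const digitCount = text.replace(/\D/g, "").length;
  // Store the original text so notes like "ext. 42" survive.
  return digitCount >= MIN_DIGITS
    ? { ok: true, stored: text }
    : { ok: false, stored: null };
}

console.log(acceptPhoneInput("+1 904-555-1234 ext. 42").ok);           // true
console.log(acceptPhoneInput("+1 904-555-1234 (ONLY BEFORE 5PM)").ok); // true
console.log(acceptPhoneInput("call me").ok);                           // false
```

Both of the example inputs above pass, and the extension and the calling instructions are stored exactly as the user entered them.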

Don't be cute

The main lesson here is: Don't be cute. Yes, you should strive to provide useful form validation. But if you're patting yourself on the back because you're certain that you've blocked every conceivable edge case... there's a good chance that you've also blocked some valid input. And when you block valid input, you run a severe risk of alienating your users.

Also, try to be realistic about just what risks exist if someone enters "bad" data in a given field. Sure, you may believe that a phone number should only ever be numeric. But is it really hurting anything if you allow letters in that field and store it on the backend as an alphanumeric string?

Top comments (7)

Peter Ellis • Mar 14 '23 • Edited

My favourite bogus validation is checking that names only contain [a-z][A-Z] characters because it immediately locks out anyone whose name is:

Too posh (-)
Too Irish (')
Too educated (dr.)
Or simply too foreign (literally any non-ASCII letter)

Bonus points are awarded for unnecessarily demanding the name is split into first name and last name, confusing the hell out of everyone using Eastern name order.

Adam Nathaniel Davis • Mar 15 '23

Yes! This is a great example, and one of my pet peeves as well. In fact, I wrote this NPM package: npmjs.com/package/@toolz/string-co... because I see so many scenarios where I'm supposed to validate that something contains, say, only letters, or only numbers. But every single time I see this implemented elsewhere, they rely on the basic English/ASCII character set.

Jason Elkin • Mar 15 '23

Ah email address validation... everyone gets this wrong.

Fun facts:

An email address can have multiple '@' symbols in it. me@my@domain.tld is technically valid.
An email address doesn't need a dot after the @. jason@tld is technically valid.

Granted these are rare, but not impossible.

The relevant question isn't really "is this address valid?", it's "can I send to it?". Chances are there are perfectly valid email addresses that your email sending infrastructure doesn't actually support - the only way to know is to try sending to the address.

Red Ochsenbein (he/him) • Mar 12 '23

Also. Don't try to be clever and require a certain list of character types in my password. I'm pretty sure my string of 4 words is more secure than your 8 characters with an @ uppercase P and 0 in it...😆

Adam Nathaniel Davis • Mar 12 '23

This is a great point. And it also begs the question: If someone uses a crappy password, should you really care? I mean, I get it. If you're working on a banking application, or any other app that maintains sensitive data, then maybe you do want to force the user to create a "strong" password. But I've seen so many sites where I need to register to use the app - but the app is some janky low-security utility. In scenarios like that, I really couldn't care less if someone hacks their database and finds out that my password is... "password".

Barry Melton • Mar 13 '23

Agreed completely. People forget that while you can (and should!) validate the smithereens out of an email address, the surest way to execute on Step 5 (Validate the TLD) is to send it an email and see if someone can receive it.

This is generally simpler than any other strategy, and definitely more effective.

Angel Umeh • Mar 13 '23

The numbers thing… I was just talking to a friend today about this. I constantly worry about edge cases and would have my nerves on the fritz because I’d try to prepare for every single possible entry… and end up introducing completely avoidable bugs 💀