How to reduce incorrect email addresses
Last week I wrote about the 100% correct way to validate email addresses which, in short, pointed out that detecting ‘invalid’ email addresses misses the majority of mistakes.
I suggested checking that the user attempted to enter an email ( /.+@.+/) and then requiring a response to a confirmation email. In hindsight that post could have been a lot shorter.
Since then, two themes have occupied my daydream machine:
a) pithy retorts to slights gone by (unrelated to this post)
b) how to do a better job of detecting typos in email addresses
So enough of that opinion rubbish, today we’re going to get down and dirty and write some code to help our users get their email addresses right.
You can take a peek at the end result (click the ‘check email’ button) if you’re not into artful storytelling/rambling.
Here’s how a typical email validation flow might look.
That red line will form the backbone of this post.
I’ve referred to my book of statistics and can tell you that exactly 1 in 1000 people will mis-type their email address, and you will never hear from them again. That’s over one million dollars per year. More than forty football courts and a blue whale. (Aw, what’s the matter, whale? Why the long face?)
Good news!
We can do better, and it’s all about detecting typos, which is all about inferring intent. That is, working out if what the user typed is close to what the user meant to type.
I picture it like so: a human is standing in front of me filling out a form on a piece of paper. They hand it to me, I put my glasses right down on the end of my nose for some reason and I read through the fields. I see corey.baxter@gmaiol.com and the following takes place:
Me: “Hey Corey, did you mean to write gmail?” (I would lean my head to the right for gmail because that’s how you say italics.)
Corey: “Oh yeah, what did I put?”
Me: “That’s not important, Corey. What’s important is that you try harder next time.”
I scrunch up the form and throw it away.
Corey: “Hey, couldn’t I have just…”
I silence Corey with a stare over the top of my glasses and hand him a new form. He fills it in and hands it back.
cory.baxter@gmail.com
Me: “Corey, buddy? Do you still spell your name with an e?”
Corey: “Where?”
Me: “Before the y, Corey. Before the y.”
Corey: “Oh! Sorry, I leave that out on accident sometimes. You must think I’m a real nincompoop!”
Me: “I do, Corey, I do.”
Corey: “Well, I’ve just been to the dentist, I’m still a little flustered.”
Me: “Corey, we’re on a boat. We’ve been at sea for some months now. And I’m the only dentist.”
This goes on for quite some time and becomes less and less relevant to the topic at hand.
The point is, we can get a computer to do this. We can help Corey with code.
The anatomy of a tpyo
Before we get to the code, let’s come up with a definition for ‘typo’. We will imagine that the user has attempted to type corey.baxter@gmail.com and we want to check if they mistyped ‘gmail’ — we could look for any of these four scenarios:
- gmoil (wrong letter)
- gmaoil (extra letter)
- gmil (missing letter)
- gmial (swapped letters)
Simple enough. Obviously we don’t want to search for things like ‘gmoil’ exactly, we want some sort of expression. Not a wacky expression, just a regular one. (And now we’ve got two problems.)
Let’s look at each of the four typo types.
Finding a wrong letter in, say, position 3 is easy, we just turn the string gmail into a regex like /gm.il/ since the ‘.’ will match any character. We will loop over the word, one character at a time, trying a new regex for each position, so we will be looking for /g.ail/ and /gm.il/ and /gma.l/ and /gmai./.
Finding an extra letter is much the same. We just use /gma.il/. Boom.
Finding a missing letter is /gma{0}il/. Which is a bit ass-about, but basically says “I want a g, then an m, then exactly zero a’s, then an i, then an l.” The only string that matches this description is ‘gmil’. Yes, I could just remove the letter, but then I wouldn’t be using a regex.
To find switched letters I could not think of a regex! So I will just manually re-create a string using charAt() and substring(). Insult+suggestion combos in the comments are most welcome.
OK so we have the low level logic that we need, let’s write a code inventory:
- a function that takes the user input (the email address) and the domain (gmail)
- for each letter in the domain it will look for a wrong/extra/missing/swapped letter
- if the function finds something that looks like a typo, it will return a suggestion
- if the function doesn’t find a typo, it returns an empty string
I got halfway through writing this post and thought I should probably check if this has been done before. And it has! This one seems nicely done; I’m sure there are others. If you like your code pre-packaged, you may now proceed to get angry that I didn’t put this at the top of the post :)
You put your right part in…
We want to take the email address that a user has typed in and check it against a bunch of common domains. To reduce the number of false positives, we will only pass in the right part of the email address.
If the function finds a typo for any one domain, it returns immediately with the suggested fix. So it matters that gmail is before mail.
You put your left part in…
Checking common domain names is one thing. But why not look for a typo in the left part of the email? If we know the person’s first name and last name, there’s nothing stopping us giving it a crack.
If we pass in cory.baxter@gmail.com and pass in the name Corey, it’s going to suggest that Corey don’t know how to type his own name.
If your name is yazoo or iClaudia (I wouldn’t be surprised), you don’t want an incorrect typo warning, I suspect your life is difficult enough already. So we only pass in the left part of the email, then join it up again if a suggestion is offered.
And you shake it all about
This is where we throw in any random, ad-hoc checks for common mistakes.
If you’re serious about this stuff, go find your company’s email administrator (they will look like the sort of person that would win the French scrabble championships without speaking French). Ask them — gently now — for a list of all the email addresses that have bounced back. Look for common problems, work out how to detect them and fix them. You can then do something like the following. (I’ve put in three example mistakes.)
As with the other functions, if there’s a suspected typo, this returns a suggestion for a fix, otherwise it returns an empty string.
Side-note: a personal first happened today! I got a capturing regex right on the first try. After typing str.replace(/,(co\.\w{2}$)/, ‘.$1’) and seeing it work, I felt like I could just fly to the shops and get some cake. That’s not bragging because I can also tell you that I got most of the other ones wrong on the first try.
Unit tests
Did you know your browser has a built-in unit testing framework?
console.assert() is pretty simple, if the first parameter is falsy, it prints the second parameter. Else it keeps its trap shut.
All together now
Here’s all the code in one place. Click the result tab to try it out.
npm install
I considered making this a package. But I can guarantee that by the time webpack 2, browserify 14, node 7 and whatever the hell else comes out I will have lost interest. So really, I’m doing you a favor by not adding to the pile of un-maintained npm packages.
Counting the wins
So with a little bit of work we can save some users from accidentally entering incorrect, but valid, email addresses before they submit the form. This is good because:
- You will keep the users that you would have lost
- You will save the users who would have come back the hassle of filling your form in again
- As a super happy bonus, a user that types in their email incorrectly might think you’re pretty cool for showing a thoughtful, helpful message
Wanna come backstage?
We’re done with the email validation stuff, thanks for reading!
If you’re keen, I thought I’d share a bit about my dev setup when doing this sort of thing, since I like it when others share theirs. None of this has anything to do with email validation.
- Start a Chrome dev tools snippet (Sources tab > left pane > Snippets).
- Put a console.clear() at the top of the snippet.
- Put on some Puscifer.
- For complex stuff, I’ll write some tests that will run each time I run the snippet.
- Ctrl+Enter/Cmd+Enter runs the script. So making changes and checking I didn’t break something is easy.
- I use Chrome Canary which supports all the new JS goodies I’m likely to need (except for async functions. Quick quiz: which is the only browser that supports async functions?)
- If I’m using ‘const’ or ‘class’ I wrap the whole snippet in an IIFE so Chrome doesn’t complain about an already declared identifier.
- If I need some DOM I usually use jsbin, but for this post I used jsfiddle because it embeds better.
Bye.
Hacker Noon is how hackers start their afternoons. We’re a part of the @AMI family. We are now accepting submissions and happy to discuss advertising & sponsorship opportunities.
If you enjoyed this story, we recommend reading our latest tech stories and trending tech stories. Until next time, don’t take the realities of the world for granted!