DEV Community

Mark Railton

Posted on • Originally published at markrailton.com

Comparing strings that may or may not contain diacritics in PHP

Today I ran into something that really had me scratching my head. I had to compare a string from a form against a string from the database. That part wasn't the issue, since comparing strings is simple in PHP. What had me stumped was that I needed to account for diacritics possibly being present in one string but not in the other.

I spent quite some time looking online, but eventually took to Twitter and asked the wondrous PHP community for help:

Ok, taking a complete blank and need some #php help. Need to compare 2 strings that may or may not contain diacritics. Example, Seán matches Sean. Don't know why I can't figure this one out, anyone any ideas?

— Mark Railton (@railto) August 6, 2021

Within minutes I had a couple of people offering suggestions, and a healthy conversation ensued. I settled on a solution by Derick Rethans that uses the Collator class from PHP's intl extension. I took Derick's example and tweaked it slightly to suit my needs; a snippet is below:

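```php
// Collator comes from the intl extension (ext-intl).
$c = new Collator('en');
// PRIMARY strength compares base letters only, so accents (and case) are
// ignored: "Seán" and "Sean" produce the same sort key.
$c->setStrength(Collator::PRIMARY);

// Bail out if the submitted first name doesn't match the stored one.
if ($c->getSortKey($newUser['firstname']) !== $c->getSortKey($existingUser->firstname)) {
    return null;
}
```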
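If the same check is needed in more than one place, it could be wrapped in a small helper. The sketch below is illustrative rather than the code from the post: the function name namesMatch and its $locale parameter are hypothetical, it assumes the intl extension is installed, and it uses Collator::compare() (which returns 0 for equal strings) instead of comparing sort keys. Both approaches should give the same answer at PRIMARY strength.

```php
<?php

/**
 * Hypothetical helper: returns true when two names are equal once accents
 * and case are ignored. Assumes ext-intl is installed and that an English
 * ('en') collation is suitable for the names being compared.
 */
function namesMatch(string $a, string $b, string $locale = 'en'): bool
{
    $collator = new Collator($locale);
    // PRIMARY strength: only base letters count, so "á" == "a" and "S" == "s".
    $collator->setStrength(Collator::PRIMARY);

    // compare() returns 0 when the strings are equal at the chosen strength
    // (and false on failure), so check strictly against 0.
    return $collator->compare($a, $b) === 0;
}

var_dump(namesMatch('Seán', 'Sean'));  // bool(true)
var_dump(namesMatch('Seán', 'SEAN'));  // bool(true)
var_dump(namesMatch('Seán', 'Shane')); // bool(false)
```

Note that PRIMARY strength also ignores case. If case should still matter, Collator::SECONDARY is not the answer (it distinguishes accents); one option is to keep PRIMARY and add a separate case check.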

To give a bit more context, let's say we have a user called Sean. Some people called Sean spell it Sean, while others spell it Seán, with the Irish diacritic known as the fada. Both spellings are correct and refer to the same name, but a direct comparison in PHP (or really any other language) using the equality operators (== or ===) will report a mismatch. For the task I've been working on, it was important to allow for the same person having the fada in their name in the database but leaving it out when filling in a different form.
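A quick illustration of the mismatch, assuming ext-intl is available (the variable names are made up for the example): strict equality and case-insensitive comparison both treat "á" as a different character from "a", while a Collator at PRIMARY strength considers the two spellings equal.

```php
<?php

// Hypothetical values: one stored with the fada, one typed without it.
$fromDatabase = 'Seán';
$fromForm     = 'Sean';

// Byte-for-byte comparison: "á" is not "a", so this is false.
var_dump($fromDatabase === $fromForm); // bool(false)

// strcasecmp() ignores case, not accents, so this is also false.
var_dump(strcasecmp($fromDatabase, $fromForm) === 0); // bool(false)

// Collator (ext-intl) at PRIMARY strength treats accented and plain letters as equal.
$collator = new Collator('en');
$collator->setStrength(Collator::PRIMARY);
var_dump($collator->compare($fromDatabase, $fromForm) === 0); // bool(true)
```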

Thanks to Derick, Ben and the others who posted possible solutions on the Twitter thread. It really helped, and thankfully I was able to move on with the task.
