
William L'Archeveque

Posted on • Originally published at Medium

Sorting multidimensional PHP arrays by object values with accented characters

When working on a multilingual website, we often need to deal with special and accented characters. In Québec (Canada), websites are usually bilingual because we speak both French and English. Sorting arrays alphabetically can then become a headache for developers because of “caractères spéciaux” (French for special characters).


I developed a few methods that can help overcome the difficulties of multidimensional array sorting.

The problem with sorting special characters

The best way to explain the problem is with an example. Let’s say an API returns an array of category objects, each holding its names in a nested array keyed by language, and we want the categories ordered alphabetically by name for a specific language:

<?php

require 'StringHelper.php';

$helper = new StringHelper();
$categories = [];

$news = new stdClass();
$news->names = ["fr" => "Actualités", "en" => "News"];
$categories[] = $news;

$sports = new stdClass();
$sports->names = ["fr" => "Sports", "en" => "Sports"];
$categories[] = $sports;

$home = new stdClass();
$home->names = ["fr" => "Accueil", "en" => "Home"];
$categories[] = $home;

$events = new stdClass();
$events->names = ["fr" => "Événements", "en" => "Events"];
$categories[] = $events;

$special = new stdClass();
$special->names = ["fr" => "Spécial", "en" => "Special"];
$categories[] = $special;

// Alphabetically sorted result in French : Accueil, Actualités, Événements, Spécial, Sports
// Alphabetically sorted result in English : Events, Home, News, Special, Sports

// Original order
var_dump($categories);

$helper->alphabeticalCompareArrayByKey($categories, 'names', 'fr');

// Alphabetical order (French) by names
var_dump($categories);

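Ordering this array correctly is a challenge for two reasons. First, PHP’s built-in string comparisons work on raw bytes, so a name that starts with an accented character ends up after every name that starts with a plain letter. Second, the value to sort by is nested inside each object rather than stored directly in the array.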
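To make the problem concrete, here is a minimal sketch that sorts only the French names with a plain strcasecmp() comparison. The flat $names array is an assumption made for illustration; the real data is nested inside objects as shown above.

```php
<?php

// Illustrative flat list of the French names from the sample data.
$names = ['Actualités', 'Sports', 'Accueil', 'Événements', 'Spécial'];

// strcasecmp() compares byte by byte. In UTF-8, "É" and "é" start with
// byte 0xC3, which is greater than any ASCII letter.
usort($names, 'strcasecmp');

echo implode(', ', $names), PHP_EOL;
// Accueil, Actualités, Sports, Spécial, Événements
```

“Événements” lands at the end and “Spécial” comes after “Sports”, which is not the order a French reader expects.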

The solution for sorting an array of objects

I solved this problem with two static methods in a StringHelper class (StringHelper.php) that can be reused across the application:

<?php

class StringHelper
{

 /**
  * Sort an array of objects alphabetically by a nested value, ignoring accents
  *
  * @param array  &$array    Reference to the array to sort
  * @param string $element   Object property that holds the values (e.g. 'names')
  * @param string $key       Key inside that property to sort by (e.g. 'fr')
  * @return void
  */
  public static function alphabeticalCompareArrayByKey(array &$array, string $element, string $key)
  {
      usort($array, function ($a, $b) use ($element, $key) {
          // $a->{$element} is an array, so it is indexed with [$key] (not ->[$key])
          return strcasecmp(self::transliterateString($a->{$element}[$key]), self::transliterateString($b->{$element}[$key]));
      });
  }

  /**
   * Replace accented characters in a string and lowercase the result
   *
   * Example :
   * echo StringHelper::transliterateString('Événement'); // evenement
   *
   * @param string $string  String with accented characters
   * @return string  Transliterated, lowercased and trimmed string
   */
  public static function transliterateString($string)
  {
      $transliterationTable = ['á' => 'a', 'Á' => 'A', 'à' => 'a', 'À' => 'A', 'ă' => 'a', 'Ă' => 'A', 'â' => 'a', 'Â' => 'A', 'å' => 'a', 'Å' => 'A', 'ã' => 'a', 'Ã' => 'A', 'ą' => 'a', 'Ą' => 'A', 'ā' => 'a', 'Ā' => 'A', 'ä' => 'ae', 'Ä' => 'AE', 'æ' => 'ae', 'Æ' => 'AE', 'ḃ' => 'b', 'Ḃ' => 'B', 'ć' => 'c', 'Ć' => 'C', 'ĉ' => 'c', 'Ĉ' => 'C', 'č' => 'c', 'Č' => 'C', 'ċ' => 'c', 'Ċ' => 'C', 'ç' => 'c', 'Ç' => 'C', 'ď' => 'd', 'Ď' => 'D', 'ḋ' => 'd', 'Ḋ' => 'D', 'đ' => 'd', 'Đ' => 'D', 'ð' => 'dh', 'Ð' => 'Dh', 'é' => 'e', 'É' => 'E', 'è' => 'e', 'È' => 'E', 'ĕ' => 'e', 'Ĕ' => 'E', 'ê' => 'e', 'Ê' => 'E', 'ě' => 'e', 'Ě' => 'E', 'ë' => 'e', 'Ë' => 'E', 'ė' => 'e', 'Ė' => 'E', 'ę' => 'e', 'Ę' => 'E', 'ē' => 'e', 'Ē' => 'E', 'ḟ' => 'f', 'Ḟ' => 'F', 'ƒ' => 'f', 'Ƒ' => 'F', 'ğ' => 'g', 'Ğ' => 'G', 'ĝ' => 'g', 'Ĝ' => 'G', 'ġ' => 'g', 'Ġ' => 'G', 'ģ' => 'g', 'Ģ' => 'G', 'ĥ' => 'h', 'Ĥ' => 'H', 'ħ' => 'h', 'Ħ' => 'H', 'í' => 'i', 'Í' => 'I', 'ì' => 'i', 'Ì' => 'I', 'î' => 'i', 'Î' => 'I', 'ï' => 'i', 'Ï' => 'I', 'ĩ' => 'i', 'Ĩ' => 'I', 'į' => 'i', 'Į' => 'I', 'ī' => 'i', 'Ī' => 'I', 'ĵ' => 'j', 'Ĵ' => 'J', 'ķ' => 'k', 'Ķ' => 'K', 'ĺ' => 'l', 'Ĺ' => 'L', 'ľ' => 'l', 'Ľ' => 'L', 'ļ' => 'l', 'Ļ' => 'L', 'ł' => 'l', 'Ł' => 'L', 'ṁ' => 'm', 'Ṁ' => 'M', 'ń' => 'n', 'Ń' => 'N', 'ň' => 'n', 'Ň' => 'N', 'ñ' => 'n', 'Ñ' => 'N', 'ņ' => 'n', 'Ņ' => 'N', 'ó' => 'o', 'Ó' => 'O', 'ò' => 'o', 'Ò' => 'O', 'ô' => 'o', 'Ô' => 'O', 'ő' => 'o', 'Ő' => 'O', 'õ' => 'o', 'Õ' => 'O', 'ø' => 'oe', 'Ø' => 'OE', 'ō' => 'o', 'Ō' => 'O', 'ơ' => 'o', 'Ơ' => 'O', 'ö' => 'oe', 'Ö' => 'OE', 'ṗ' => 'p', 'Ṗ' => 'P', 'ŕ' => 'r', 'Ŕ' => 'R', 'ř' => 'r', 'Ř' => 'R', 'ŗ' => 'r', 'Ŗ' => 'R', 'ś' => 's', 'Ś' => 'S', 'ŝ' => 's', 'Ŝ' => 'S', 'š' => 's', 'Š' => 'S', 'ṡ' => 's', 'Ṡ' => 'S', 'ş' => 's', 'Ş' => 'S', 'ș' => 's', 'Ș' => 'S', 'ß' => 'SS', 'ť' => 't', 'Ť' => 'T', 'ṫ' => 't', 'Ṫ' => 'T', 'ţ' => 't', 'Ţ' => 'T', 'ț' => 't', 'Ț' => 'T', 'ŧ' => 't', 'Ŧ' => 'T', 'ú' => 'u', 'Ú' => 'U', 'ù' => 'u', 'Ù' => 'U', 
'ŭ' => 'u', 'Ŭ' => 'U', 'û' => 'u', 'Û' => 'U', 'ů' => 'u', 'Ů' => 'U', 'ű' => 'u', 'Ű' => 'U', 'ũ' => 'u', 'Ũ' => 'U', 'ų' => 'u', 'Ų' => 'U', 'ū' => 'u', 'Ū' => 'U', 'ư' => 'u', 'Ư' => 'U', 'ü' => 'ue', 'Ü' => 'UE', 'ẃ' => 'w', 'Ẃ' => 'W', 'ẁ' => 'w', 'Ẁ' => 'W', 'ŵ' => 'w', 'Ŵ' => 'W', 'ẅ' => 'w', 'Ẅ' => 'W', 'ý' => 'y', 'Ý' => 'Y', 'ỳ' => 'y', 'Ỳ' => 'Y', 'ŷ' => 'y', 'Ŷ' => 'Y', 'ÿ' => 'y', 'Ÿ' => 'Y', 'ź' => 'z', 'Ź' => 'Z', 'ž' => 'z', 'Ž' => 'Z', 'ż' => 'z', 'Ż' => 'Z', 'þ' => 'th', 'Þ' => 'Th', 'µ' => 'u', 'а' => 'a', 'А' => 'a', 'б' => 'b', 'Б' => 'b', 'в' => 'v', 'В' => 'v', 'г' => 'g', 'Г' => 'g', 'д' => 'd', 'Д' => 'd', 'е' => 'e', 'Е' => 'e', 'ё' => 'e', 'Ё' => 'e', 'ж' => 'zh', 'Ж' => 'zh', 'з' => 'z', 'З' => 'z', 'и' => 'i', 'И' => 'i', 'й' => 'j', 'Й' => 'j', 'к' => 'k', 'К' => 'k', 'л' => 'l', 'Л' => 'l', 'м' => 'm', 'М' => 'm', 'н' => 'n', 'Н' => 'n', 'о' => 'o', 'О' => 'o', 'п' => 'p', 'П' => 'p', 'р' => 'r', 'Р' => 'r', 'с' => 's', 'С' => 's', 'т' => 't', 'Т' => 't', 'у' => 'u', 'У' => 'u', 'ф' => 'f', 'Ф' => 'f', 'х' => 'h', 'Х' => 'h', 'ц' => 'c', 'Ц' => 'c', 'ч' => 'ch', 'Ч' => 'ch', 'ш' => 'sh', 'Ш' => 'sh', 'щ' => 'sch', 'Щ' => 'sch', 'ъ' => '', 'Ъ' => '', 'ы' => 'y', 'Ы' => 'y', 'ь' => '', 'Ь' => '', 'э' => 'e', 'Э' => 'e', 'ю' => 'ju', 'Ю' => 'ju', 'я' => 'ja', 'Я' => 'ja'];

      $transliteratedString = str_replace(array_keys($transliterationTable), array_values($transliterationTable), $string);

      return trim(strtolower($transliteratedString));
  }
}

The two functions may look complicated at first sight, but they are fairly simple.

Explanation and result (sorting the multidimensional array) 🧙‍♂️

The first function takes three parameters: a reference to the array of objects, the element (the object property that holds the values), and the key within that element to sort by.

Let’s say we want to order the previous array by French names. The call would be:

StringHelper::alphabeticalCompareArrayByKey($categories, 'names', 'fr');

Because the array is passed by reference, it is sorted in place and there is no need to reassign the result to a variable. The usort() function sorts an array by its values using a user-defined comparison function.

Our comparison function wraps strcasecmp(), a binary-safe, case-insensitive string comparison.

Before the two values are compared, each accented character is replaced with its unaccented equivalent (e.g. é becomes e, â becomes a).

Our comparison will then be successful. 💪
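The effect of that replacement can be seen on a single pair of names. The sketch below is only an illustration: the reduced $table and the $fold closure are hypothetical stand-ins for the full transliteration table and transliterateString().

```php
<?php

// Hypothetical reduced table: only the accents needed for this example.
$table = ['é' => 'e', 'É' => 'E'];

// Illustrative stand-in for StringHelper::transliterateString().
$fold = function (string $value) use ($table): string {
    return trim(strtolower(strtr($value, $table)));
};

// Raw comparison: the first byte of "É" (0xC3) is greater than "s".
$raw = strcasecmp('Événements', 'Sports');

// Folded comparison: "evenements" against "sports".
$folded = strcasecmp($fold('Événements'), $fold('Sports'));

var_dump($raw > 0);    // bool(true): "Événements" would sort after "Sports"
var_dump($folded < 0); // bool(true): "evenements" sorts before "sports"
```

usort() only looks at the sign of the value returned by the comparison function, so flipping that sign is what moves “Événements” ahead of “Sports”.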

The previous example only works with an array of objects, but you can easily adapt it to an array of arrays by changing the strcasecmp() line to:

return strcasecmp(self::transliterateString($a[$element][$key]), self::transliterateString($b[$element][$key]));
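For completeness, here is a self-contained sketch of that array-of-arrays variant. The names foldAccents(), sortRowsByKey() and ACCENT_TABLE are hypothetical and used only for this illustration; the table is reduced to the accents present in the sample data, so a real application would plug in the full table from StringHelper.

```php
<?php

// Hypothetical reduced table covering only the sample data below.
const ACCENT_TABLE = ['é' => 'e', 'É' => 'E'];

// Illustrative stand-in for StringHelper::transliterateString().
function foldAccents(string $value): string
{
    return trim(strtolower(strtr($value, ACCENT_TABLE)));
}

// Same idea as alphabeticalCompareArrayByKey(), but each row is an array.
function sortRowsByKey(array &$rows, string $element, string $key)
{
    usort($rows, function (array $a, array $b) use ($element, $key) {
        return strcasecmp(foldAccents($a[$element][$key]), foldAccents($b[$element][$key]));
    });
}

$categories = [
    ['names' => ['fr' => 'Actualités', 'en' => 'News']],
    ['names' => ['fr' => 'Sports', 'en' => 'Sports']],
    ['names' => ['fr' => 'Accueil', 'en' => 'Home']],
    ['names' => ['fr' => 'Événements', 'en' => 'Events']],
    ['names' => ['fr' => 'Spécial', 'en' => 'Special']],
];

sortRowsByKey($categories, 'names', 'fr');

// Collect the French names in their new order.
$sorted = array_column(array_column($categories, 'names'), 'fr');

echo implode(', ', $sorted), PHP_EOL;
// Accueil, Actualités, Événements, Spécial, Sports
```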

Let me know if this article helped you to sort your sorting problems!


Cover Image : Edu Grande (@edgr) from Unsplash

Top comments (4)

James Robb

Nice, one thing I’d change is to get the characters by char code instead of manually writing them out because then you can’t miss any by accident if you use char code ranges for each language.

William L'Archeveque

Thanks, that is a good idea. I found the characters array somewhere on the Internet and it fit my needs for French accented characters, although it also contains characters for several other languages. Getting those by char code would be optimal!

Jon Randy 🎖️

I think you mean 'accented'

William L'Archeveque

Thanks! I was convinced I was writing it the correct way. In French, accented is written "accentué".