DEV Community

Akhil
Akhil

Posted on

Group Anagram, Exploring canonical forms

Question : Given an array of strings, group anagrams together.

Eg: ["eat", "tea", "tan", "ate", "nat", "bat"]
Output :
[
["ate","eat","tea"],
["nat","tan"],
["bat"]
]

Let's start by understanding what are anagrams?
An anagram is a word or phrase that's formed by rearranging the letters of another word or phrase.
For eg: Let's considerr the word eat
It's anagrams are : ate, eat, tea.
Each letter occurs exactly at the same frequency as in the original string. Boils down to Do two string have same of each character

Question is asking us to group similar anagrams together, ie if two words are anagrams then they must be grouped together.

Let's go through this step by step.

Basic intuition: The first thing which might pop up in your mind might be to

1> create a frequency array of size 26 (because of a->z = 26) for each string.
2> parse each word and store occurrence of each character, something like :
   string "ate": c[0] = 1     // since a = 0, 
                 c[4] = 1     // since e = 4, and so on.
3> create a dictionary that will map this unique array to a corresponding string.
4> loop through each word's corresponding frequency array and group together 
those words who's frequency array's match.
Enter fullscreen mode Exit fullscreen mode

The idea isn't terrible but that's too much work and as Bill Gates once said:

Bill gates quote

Let's find an easy way to solve this problem.

Here we're trying to group the anagrams in one container, in the previous approach the metric we used to determine if two anagrams were equal or not was to create a frequency array and match frequency array as a metric to determine if two strings were anagram.

So we need a better way of comparing two string, this leads us to the idea of using sorting as that comparison metric.

Idea: If two string are anagrams of each other then their sorted version of string would match

Eg: Let's consider "eat" and "ate"

Sorting "ate" = "aet".
Sorting "eat" = "aet".

Since the sorted version is exactly the same this means that both are anagrams of each other.
Enter fullscreen mode Exit fullscreen mode

This approach has made our lives much better since now our work is reduced just to sort each string and match them.

Code:

var groupAnagrams = function(strs) {
    let map = new Map()

    for (let current of strs){
        let chars = current.split('')
        chars.sort()
        let sorted = chars.join('')
        if (!map.has(sorted)) {
            map.set(sorted, [])
        }
        map.get(sorted).push(current)
    }

    return Array.from(map.values())
};
Enter fullscreen mode Exit fullscreen mode

That's it!

If you stuck around till here, let's understand what's canonical form and how it's applied in real life.

"In mathematics and computer science, a canonical, normal, or standard form of a mathematical object is a standard way of presenting that object as a mathematical expression. Often, it is one which provides the simplest representation of an object and which allows it to be identified in a unique way"

In this question we brought down a string to their most basic form ie their sorted form and based on their sorted forms we grouped them together.

This technique is often used in image recognition and search where an image is converted into a mathematical form and images which match or are a close match to this mathematical form are grouped together and are grouped together as output.

Alt Text

In the above figure, two Starbucks images are being matched based to similarities in their vectors, similarly, on the right, detergents of various brands are bucketed together in to "detergent" category"

The same technique is applied for linear algebra where a mathematical statement is converted to it's a most basic form.
eg: 15x + 12y = 21 can be written as 5x + 4y = 7, this makes searching for solution much faster.

Source: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/33030.pdf

Hope you like my explanation and learnt a bit about canonical forms and it's uses.

github:https://github.com/AKHILP96/Data-Structures-and-Algorithms/blob/master/problems/groupAnagram.js

Discussion (0)