Sebastian Nozzi

Posted on

# Counting word lengths (map approach)

In a previous post we saw already how to solve the problem by using an array.

Specifically we had a 12-length array, and each position was used to increase a counter according to the current word's length. We had to convert from word-length to array-index by subtracting 1 from the first number.

This solution worked perfectly fine and given the constraint of only considering word-lengths up to 12, I would go this route.

But it is worth looking at a map-based approach (in other languages "map" is known as "dictionary"). Some other problems are better solved with this approach.

## Recap

The complete array-based solution looked like this:

import "listUtil"

word_length_counters = [0] * 12

for word in words
word_length = len(word)
if word_length > 12 then continue
counter_index = word_length - 1
current_count = word_length_counters[counter_index]
word_length_counters[counter_index] = current_count + 1
end for

max_count = word_length_counters.max
max_count_index = word_length_counters.indexOf(max_count)
max_word_length = max_count_index + 1

print "The most common word length is " + max_word_length

Indeed, most of the code can remain the same. But some key changes need to be made.

## First changes

First, instead of storing the counters in an array we will use a map. It will be a generic empty map, like this:

word_length_counters = {}

One advantage of using maps is that we can use "human" numbers as keys. That is: we can use directly the word-lenghts (which go from 1 to 12) without having to subtract 1 as we did for array indexes.

In particular MiniScript allows you to use pretty much anything as map-indexes, which is very convenient. It even allows you to use numbers as keys - which we will be doing here.

The first part of the for-loop remains the same:

for word in words
word_length = len(word)
if word_length > 12 then continue

But then it gets interesting. We don't need this:

counter_index = word_length - 1

We can access the current_count by word_length as mentioned:

current_count = word_length_counters[word_length]

And store it back, also using word_length as the key of the map:

word_length_counters[counter_index] = current_count + 1

## Problem found

Putting it all together, this is how the for-loop would look like:

for word in words
word_length = len(word)
if word_length > 12 then continue
current_count = word_length_counters[word_length]
word_length_counters[word_length] = current_count + 1
end for

Looks ok, right? But what happens when you run it?

Why is that? Problem is, the map is initially empty! So, the first time the "current_count" wants to be retrieved, nothing is found. We need to handle the case where the map does not yet have a counter for a given word-length.

We need to ask: does this map have the key "word_length"? If not, add an entry for it with a 0. This will have the effect that retrieving the "current_count" in the next line will find something (a 0 value) to add 1 to.

## First-time access

How do we ask if a map has a certain "key"? In MiniScript this is solved at the "collection" level. That is: there is a method that is called the same and acts similarly for strings, lists and maps. That method is hasIndex.

With that in mind, this is the line we need:

if not word_length_counters.hasIndex(word_length) then word_length_counters[word_length] = 0

Putting that before the first access gives us then this (fixed) for-loop:

for word in words
word_length = len(word)
if word_length > 12 then continue
if not word_length_counters.hasIndex(word_length) then word_length_counters[word_length] = 0
current_count = word_length_counters[word_length]
word_length_counters[word_length] = current_count + 1
end for

## The result-calculation

Now is the time to adapt the result-calculation.

In the array-based approach word_length_counters already contained the word-lengths. But because we are using a map (which maps from word-length to word-counts) we need to look at the values of our map.

The first line would thus change from:

max_count = word_length_counters.max

to:

max_count = word_length_counters.values.max

Because the keys of the map are already word-lengths, we don't need to convert from "index" to "word-length". Even if MiniScript is talking about "index", it really means "key" in the context of a map.

Therefor the next lines can be simplified from:

max_count_index = word_length_counters.indexOf(max_count)
max_word_length = max_count_index + 1

to:

max_word_length = word_length_counters.indexOf(max_count)

The final print remains the same.

## Complete solution

Finally, the complete map-based solution looks like this:

import "listUtil"

word_length_counters = {}

for word in words
word_length = len(word)
if word_length > 12 then continue
if not word_length_counters.hasIndex(word_length) then word_length_counters[word_length] = 0
current_count = word_length_counters[word_length]
word_length_counters[word_length] = current_count + 1
end for

max_count = word_length_counters.values.max
max_word_length = word_length_counters.indexOf(max_count)

print "The most common word length is " + max_word_length

Hopefully it was clear how to use maps in general, and in particular in this case (interestingly, using numbers as keys).

JoeStrout

Nice post! I just wanted to point out another approach to the initial-value problem. Actually, two other approaches. :) One is to initialize your map to zeros right away:

word_length_counters = {}
for i in range(1,12); word_length_counters[i] = 0; end for

The other is to import "mapUtil" and then use the map.get method that module adds. get takes an index, and a default value to use if that index is not found. So then you could safely get the current_count as

current_count = word_length_counters.get(word_length, 0)