## DEV Community is a community of 865,621 amazing developers

We're a place where coders share, stay up-to-date and grow their careers.

Rex

Posted on

# A Naïve Construction of a Suffix Array for an auto-complete function with considerable input size via TDD workflow

### What is a suffix array

Wikipedia definition:

In computer science, a suffix array is a sorted array of all suffixes of a string. It is a data structure used in, among others, full text indices, data compression algorithms, and the field of bibliometrics.

Suffix Array is a space efficient alternative to Suffix Tree. The construction of a suffix array can be O(n) with an advanced algorithm. It can handle extremely large input, and it makes searching substrings very efficient.

#### Example

The steps to build a suffix array for the word Banana are:

1. find a list of all the suffixes. They are:
Suffix index
banana\$ 1
anana\$ 2
nana\$ 3
ana\$ 4
na\$ 5
a\$ 6
\$ 7
1. sort the suffixes. The array of the new sorted indexes is the suffix array for Banana: [7, 6, 4, 2, 1, 5, 3].
Suffix index
\$ 7
a\$ 6
ana\$ 4
anana\$ 2
banana\$ 1
na\$ 5
nana\$ 3

Because we can reconstruct all suffixes with the original string and its suffix array, there is no need to store any suffixes; therefore, it is very space-efficient compared to the suffix tree.

Typically, we add a `\$` sign to the end of the word to separate words i.e. to prevent a suffix of the previous word from being treated as prefixes to the following word. For example, if we are tasked to build a suffix array for Love Banana, we would concatenate the two words into a single string `Love\$Banana\$` and then build the suffix array. The `\$` sign prevents the program to treat `ove` as a prefix of `banana`. Note that we can choose any non-alphabet characters as long as it is smaller than any alphabet.

### Auto-complete

1. Input: An array of terms and a search text
2. Output: a list of terms containing the given search text
##### Approach

build a suffix array(naive approach: O(n^2logn), there are ways to build a suffix array at O(n))

1. concatenate all terms, with \$ at the end of each term, to prevent suffix being treated as a prefix to other terms
2. get all suffixes from the concatenated string
3. sort the suffixes, remove suffixes starting with \$
4. build a suffix array from the sorted suffixes

binary search the suffix array to find suffixes matching the search text

1. start in the middle, compare search text with the middle suffix
2. if smaller, search in the left half
3. if larger, search in the right half
4. if found a match to the search text
• expand and get all adjacent matches
5. return all matches

### Code

#### Tests

Tests result from the above planning in the approach section and TDD workflow. The tests can also serve as documentation for the implementations.

#### Implementations

My ultimate goal is to write simple and easy to understand code i.e. require very low cognitive load to read. If you are interested, you can read this awesome blog post by David Whitney Writing good code by understanding cognitive load.

If you feel you need to work out mentally on how something works in the code, I have failed.