loading...
Cover image for Learn Data Structure and Algorithm in JavaScript | Part 18

Learn Data Structure and Algorithm in JavaScript | Part 18

edisonnpebojot profile image Edison Pebojot(👨‍💻) Updated on ・26 min read


Prerequisite (✋😐)

If you're reading this article right now, please considered to read our Part 01: Big-O Notation, Part 02: JavaScript Unique Part, Part 03: JavaScript Numbers to Part 17: Graphs

Part Table of Contents Description
01 Big-O Notation By the end of this chapter, you will understand how to analyze an implementation of an algorithm with respect to both time (execution time) and space (memory consumed).
02 JavaScript Unique Parts Big-O is important for analyzing and comparing the efficiencies of algorithms. The analysis of Big-O starts by looking at the code, and, applying the rules, applying the rules is because to simplify the Big-O notation linear or quadratic rule is not enough.
03 JavaScript Numbers This part 3 will focus on JavaScript number operations, number representation, Number objects, common number algorithms, and random number generation.
04 JavaScript Strings This part 4 will focus on strings, JavaScript String object, and the String object’s built-in functions. You will learn how to access, compare, decompose, and search strings for commonly used real-life purposes. In addition, the chapter will explore string encoding, decoding, encryption, and decryption.
05 JavaScript Arrays As a JavaScript developer, you will use the array often; it is the most commonly used data structure. Arrays in JavaScript come with a lot of built-in methods. By the end of this part, you will understand arrays and choose the right method
06 JavaScript Object This part will focus on what JavaScript objects are, how they are declared, and how their properties can be changed. In addition, this part will cover how JavaScript classes are implemented using prototypal inheritance. Also this part will be short.
07 JavaScript Memory Management A variable takes up some memory. In C, the programmer allocate and deallocate memory manually. In contrast, modern JavaScript engines have garbage collectors that delete unused variables. However, there are pitfalls(unexpected) that developers can fall into(❗) This part will show these unexpected and present techniques to help the garbage collector minimize the memory problems.
08 Recursion This part 8 introduces the concept of recursion and recursive algorithms(Remember they are different, we will discuss them later(😉)). We will also discuss the definition of recursion and fundamental rules for recursive algorithms.
09 Sets This part focuses on the concepts of sets from both a mathematical definition and on the implementation. Also, Common set operations, as well as their implementations, are covered in great detail (💡).
10 Searching and Sorting This part 10 focuses on searching and sorting for arrays. By the end of this part 10, you will understand how to use sorting and searching algorithms for arrays. Also, this article is a bit complicated for beginners, so as much as possible the visual aids is your friend (👀). (😃)
11 Hash Tables A hash table is a fixed-sized data structure in which the size is defined at the start. This part 11 explains how hash tables work, and the method of generating a unique key. By the end of this part 11, you will understand various hashing techniques and know how to implement a hash table. (😃)
12 Stacks and Queues This part 12 covers stacks and queues(pronounce as kyooz (🔈)) not (kwewe) okay? hehehe (😅); both are data structures used in the implementation of complex data structures. You'll learn what the stacks and queues are, how they're used, when they're used, and how to implement them (😃) Let's go! (🔥🔥🔥)
13 Linked Lists A linked list is a data structure in which each node (or element) points to another node. Unlike arrays, which have a fixed size, a linked list is a dynamic data structure. By the end of this part 13, you will understand how to implement and work with linked lists. And oh! (😮) There are two types of linked lists: singly (➡️) and doubly (↔️). Let’s examine the singly linked list first.(😃) Let's go! (🔥🔥🔥)
14 Caching Caching is the process of storing data into temporary memory so that it can be easily retrieved for later use if it is required again. As an example, a database keeps data cached to avoid re-reading the hard drive, and a web browser caches web images to avoid re-downloading. In this part 14, two caching techniques will discussed: LFU and LRU caching.
15 Trees A general tree data structure is composed of nodes with children nodes. The top node is called the root node. This part 15 will explore different types of trees such as binary trees, binary search trees, and self-balancing binary search trees. First, this part 15 will cover what trees are and how they are structured. Then, it will cover methods of traversing(crossing or taking a zigzag path) the tree data structure in detail. Finally, you will learn about binary search trees and self-balancing binary search trees to understand how to store searchable data. (😃)
16 Heaps A heap is an important data structure that returns the highest or lowest element in O(1) time. This part 16 will focus on explaining how heaps are implemented as well as how to work with them. One example is heap sort, which is a sorting algorithm based on heaps.
17 Graphs In this Part 17, you will learn graph basics, including fundamental terminology and graph types. The Part 17 will also cover working with these different graph types and methods of representing graphs in data structures. Finally, algorithms for traversing, searching, and sorting graphs are explored to solve problems such as finding the shortest path between two graph nodes. (👍)
18 Advance Strings Part 18 will cover more complex string algorithms than covered in the previous section. Now that you have heard of certain other data models or structures, they should be easier to comprehend. Specifically, Part 18 will focus on string searching algorithms. (😉)
19 Dynamic Programming Dynamic programming involves breaking down problems into their subproblems. Solving the subproblems and saving those results into memory to access them whenever a repeated problem needs to be solved, the algorithmic complexity decreases significantly (⬇️). To explain dynamic programming, let’s re examine the Fibonacci sequence that was discussed in Part 8. Then Part 19 will cover the rules of dynamic programming and walk you through some examples to make the concepts more concrete. (😉)
20 Bit Manipulation Bit manipulation is an advanced topic that JavaScript developers typically do not need to know. However, you should learn a bit about bit manipulation if you want to implement high-performance server-side code. Understanding bit manipulation requires some knowledge of digital logic. Any introductory course in discrete math or circuits would be helpful to understand these concepts.

Part 18: Advance Strings (😱 🔥 🔠)

Alt Text

Part 18 will cover more complex string algorithms than covered in the previous section. Now that you have heard of certain other data models or structures, they should be easier to comprehend. Specifically, Part 18 will focus on string searching algorithms. (😉)

Trie (Prefix Tree) (1️⃣🌱)

Alt text

trie is special type of tree used for searching strings and matching strings. At each level, nodes can branch off (separate) to form complete words. For example:


Alt Text

Figure 18-1. Trie of Edison, Edisen, Edi

Figure 18-1 (above) shows a trie of the words: Edison, Edisen, Edi. Each ending node has a boolean flag. This indicates that the word ends in this path. For example:

function FunctionName () {
    this.endOfWord = false;
}
Enter fullscreen mode Exit fullscreen mode

i in Ed_i_ has endOfWord set to true. Nodes with endOfWord set to true are shaded in Figure 18-1 (above). A trie node can be formed by using an object to store the children. The trie has a root node that is instantiated in the constructor (See on Wikipedia) of the Trie class, as shown in the following code block:

// Trie Node
function TrieNode () {
    this.children = {} // Table
    this.endOfWord = false;
}

// Trie Class
function Trie () {
    this.root = new TrieNode();
}
Enter fullscreen mode Exit fullscreen mode

To insert into the trie, the child trie node is created on the root if it does not exist because... for each character in the word being inserted, it creates a child node if the character does not exist, as shown in the following code block:

// Insert
Trie.prototype.insert = function (word) {
    var current = this.root;
    for (var i = 0, l = word.length; i < l; i++) {
        var ch = word.charAt(i);
        var node = current.children[ch];
        if (node == null) {
            node = new TrieNode();
            current.children[ch] = node;
        }
        current = node;
    }
     // Mark the current nodes endOfWord as "true"
    current.endOfWord = true;
}
Enter fullscreen mode Exit fullscreen mode

To search inside a trie, each character of the word must be checked. This is done by setting a temporary variable (current). The current variable is updated as each character in the word is checked. For example:

// Search
Trie.prototype.search = function (word) {
    var current = this.root;
    for (var i = 0, l = word.length; i < l; i++) {
        var ch = word.charAt(i);
        var node = current.children[ch];
        if (node == null) {
            // Node doesn't exist
            return false;
        }
        current = node;
    }
    return current.endOfWord;
}
Enter fullscreen mode Exit fullscreen mode

To delete an element from a trie, the algorithm should traverse the root node until it reaches the last character of the word. Then, for each node that does not have any other children, the node should be deleted. For example, in a trie with Edi, when Edi is deleted, the E node in the root stays intact, but d and i are removed. The recursive implementation in the following code block implements this algorithm:

// Delete
Trie.prototype.delete = function (word) {
    this.deleteRecursively (this.root, word, 0);
}

// Delete: Helper Function
Trie.prototype.deleteRecursively = function (current, word, index) {
    if (index == word.length) {
        // When end of word is reached only delete if current.end of...
        // ... Word is "true"
        if (!current.endOfWord) {
            return false;
        }
        current.endOfWord = false;
        // If current has no other mapping then return "true"
        return Object.keys (current.children).length == 0;
    }
    var ch = word.charAt (index),
        node = current.children[ch];
    if (node == null) {
        return false;
    }
    var shouldDeleteCurrentNode = this.deleteRecursively (node, word, index + 1);
    // If "true" is returned then delete the mapping...
    // ...of character and trienode reference from map
    if (shouldDeleteCurrentNode) {
        delete current.children[ch];
        // Return "true" if no mappings are left in the map
        return Object.keys (current.children).length == 0;
    }
    return false;
}
Enter fullscreen mode Exit fullscreen mode

Trie(Full Example):

// Trie Node
function TrieNode () {
    this.children = {} // Table
    this.endOfWord = false;
}

// Trie Class
function Trie () {
    this.root = new TrieNode();
}

// Insert
Trie.prototype.insert = function (word) {
    var current = this.root;
    for (var i = 0, l = word.length; i < l; i++) {
        var ch = word.charAt(i);
        var node = current.children[ch];
        if (node == null) {
            node = new TrieNode();
            current.children[ch] = node;
        }
        current = node;
    }
     // Mark the current nodes endOfWord as "true"
    current.endOfWord = true;
}

// Search
Trie.prototype.search = function (word) {
    var current = this.root;
    for (var i = 0, l = word.length; i < l; i++) {
        var ch = word.charAt(i);
        var node = current.children[ch];
        if (node == null) {
            // Node doesn't exist
            return false;
        }
        current = node;
    }
    return current.endOfWord;
}

// Delete
Trie.prototype.delete = function (word) {
    this.deleteRecursively (this.root, word, 0);
}

// Delete: Helper Function
Trie.prototype.deleteRecursively = function (current, word, index) {
    if (index == word.length) {
        // When end of word is reached only delete if current.end of...
        // ... Word is "true"
        if (!current.endOfWord) {
            return false;
        }
        current.endOfWord = false;
        // If current has no other mapping then return "true"
        return Object.keys (current.children).length == 0;
    }
    var ch = word.charAt (index),
        node = current.children[ch];
    if (node == null) {
        return false;
    }
    var shouldDeleteCurrentNode = this.deleteRecursively (node, word, index + 1);
    // If "true" is returned then delete the mapping...
    // ...of character and trienode reference from map
    if (shouldDeleteCurrentNode) {
        delete current.children[ch];
        // Return "true" if no mappings are left in the map
        return Object.keys (current.children).length == 0;
    }
    return false;
}

// Instance of the Trie Class
var trie = new Trie ();

// 1st: Insert
trie.insert ("Edison");
trie.insert ("Edis");
trie.insert ("Edi");

// 2nd: Search
console.log(trie.search ("Edison")); // Print "true"
console.log(trie.search ("Edis")); // Print "true"
console.log(trie.search ("Edi")); // Print "true"
trie.search ("Ed"); // Print "false"

// 3rd: Delete
trie.delete ("Edison");
trie.delete ("Edis");
trie.delete ("Edi");

// 4th: Search (Again)
console.log(trie.search ("Edison")); // Print "false"
console.log(trie.search ("Edis")); // Print "false"
console.log(trie.search ("Edi")); // Print "false"
trie.search ("Ed"); // Print "false"
Enter fullscreen mode Exit fullscreen mode

Trie(Full Execution):

// Trie Node function TrieNode () { this.children = {} // Table this.endOfWord = false; } // Trie Class function Trie () { this.root = new TrieNode(); } // Insert Trie.prototype.insert = function (word) { var current = this.root; for (var i = 0, l = word.length; i < l; i++) { var ch = word.charAt(i); var node = current.children[ch]; if (node == null) { node = new TrieNode(); current.children[ch] = node; } current = node; } // Mark the current nodes endOfWord as "true" current.endOfWord = true; } // Search Trie.prototype.search = function (word) { var current = this.root; for (var i = 0, l = word.length; i < l; i++) { var ch = word.charAt(i); var node = current.children[ch]; if (node == null) { // Node doesn't exist return false; } current = node; } return current.endOfWord; } // Delete Trie.prototype.delete = function (word) { this.deleteRecursively (this.root, word, 0); } // Delete: Helper Function Trie.prototype.deleteRecursively = function (current, word, index) { if (index == word.length) { // When end of word is reached only delete if current.end of... // ... Word is "true" if (!current.endOfWord) { return false; } current.endOfWord = false; // If current has no other mapping then return "true" return Object.keys (current.children).length == 0; } var ch = word.charAt (index), node = current.children[ch]; if (node == null) { return false; } var shouldDeleteCurrentNode = this.deleteRecursively (node, word, index + 1); // If "true" is returned then delete the mapping... // ...of character and trienode reference from map if (shouldDeleteCurrentNode) { delete current.children[ch]; // Return "true" if no mappings are left in the map return Object.keys (current.children).length == 0; } return false; } // Instance of the Trie Class var trie = new Trie (); // 1st: Insert trie.insert ("Edison"); trie.insert ("Edis"); trie.insert ("Edi"); // 2nd: Search console.log(trie.search ("Edison")); // Print "true" console.log(trie.search ("Edis")); // Print "true" console.log(trie.search ("Edi")); // Print "true" trie.search ("Ed"); // Print "false" // 3rd: Delete trie.delete ("Edison"); trie.delete ("Edis"); trie.delete ("Edi"); // 4th: Search (Again) console.log(trie.search ("Edison")); // Print "false" console.log(trie.search ("Edis")); // Print "false" console.log(trie.search ("Edi")); // Print "false" trie.search ("Ed"); // Print "false"

Time Complexity: O(W)O \lparen W \rparen
Space Complexity: O(N×M)O \lparen N \times M \rparen

Time complexity is O(W)O \lparen W \rparen for all operations (insert, search, delete), where WW is the length of the string being searched because each character in the string is checked. The space complexity is O(N×M)O \lparen N \times M \rparen , where NN is the number of words inserted into the trie and MM is the length of the longest character.

Note (📝): A trie is an efficient data structure
when there are multiple strings with common prefixes. For searching one specific string pattern in one specific string, a trie is not efficient because of the additional memory required to store the strings in the tree-like structure.

For a pattern search in a single target string, the Boyer–Moore algorithm and the Knuth–Morris–Pratt (KMP) algorithm are useful and are covered in the next section. Let's go! (🔥🔥🔥)

Boyer–Moore String Search (👨🔍)

Alt text

The Boyer–Moore string search algorithm is used to power the find tool used in web browsers, like the one in Figure 18-2 (below):

Alt Text

Figure 18-2. Find tool commonly seen in many applications

The Boyer–Moore string search algorithm allows linear time in search by skipping indices when searching. For example, for the pattern jam and the string jellyjam, visualization of brute-force comparison is shown in Figure 18-3:


Alt Text

Figure 18-3. Brute-force pattern match iterations

Note (📝): It should be noted that in the fourth (the one with sky blue mark) iteration when j is compared with m, skipping ahead by 2 would be valid.

Figure 18-4 shows an optimized iteration cycle by skipping ahead when the string at the index exists in the pattern:


Alt Text

Figure 18-4. Boyer–Moore Skipping Indices

To implement this skip rule, you can build a bad match table (below):

Pattern Bad Match Table
jam {j: 2, a: 1, m: 3}
data {d: 3, a: 2, t: 1}
struct {s: 5, t: 4, r: 3, u: 2, c: 1}
roi {r: 2, o: 1, i: 3}

For the roi example: r:2 indicates that if r is not found in the string, the index should skip by 2. This bad match table can be implemented with the following code block:

// Build "Bad Match Table"
function buildBadMatchTable (str) {
    var tableObj = {},
        strLength = str.length;
    for (var i = 0; i < strLength - 1; i++) {
        tableObj[str[i]] = (strLength - 1) - i;
    }
    if (tableObj[str[strLength - 1]] == undefined) {
        tableObj[str[strLength - 1]] = strLength;
    }
    return tableObj;
}
Enter fullscreen mode Exit fullscreen mode

Execution:

// Build "Bad Match Table" function buildBadMatchTable (str) { var tableObj = {}, strLength = str.length; for (var i = 0; i < strLength - 1; i++) { tableObj[str[i]] = (strLength - 1) - i; } if (tableObj[str[strLength - 1]] == undefined) { tableObj[str[strLength - 1]] = strLength; } return tableObj; } // Insert String: console.log(buildBadMatchTable ("jam")); // Print "{ j: 2, a: 1, m: 3 }" console.log(buildBadMatchTable ("data")); // Print "{ d: 3, a: 2, t: 1 }" console.log(buildBadMatchTable ("struct")); // Print "{ s: 5, t: 4, r: 3, u: 2, c: 1 }" buildBadMatchTable ("roi"); // Print "{ r: 2, o: 1, i: 3 }"

Using this bad match table, the Boyer–Moore string search algorithm can be implemented. If the string being looked at exists in the bad match table it skips over by the bad match table value
associated with the string. Otherwise, it is incremented by 1 like the one in the example below:

// Boyer Moore Algorithm:
function boyerMoore (str, pattern) {
    var badMatchTable = buildBadMatchTable(pattern),
        offset = 0,
        patternLastIndex = (pattern.length - 1),
        scanIndex = patternLastIndex,
        maxOffset = (str.length - pattern.length);
    // If the "offset" is bigger than maxOffset, cannot be found:
    while (offset <= maxOffset) {
        scanIndex = 0;
        while (pattern[scanIndex] == str[scanIndex + offset]) {
            if (scanIndex == patternLastIndex) {
                // Found at this index:
                offset = scanIndex || patternLastIndex;
                return offset;
            }
            scanIndex++;
        }
        var badMatchString = str[offset + patternLastIndex];
        if (badMatchTable[badMatchString]) {
            // Increase the offset if it exists:
            offset += badMatchTable[badMatchString];
        } else {
            offset += 1;
        }
    }
    return -1;
}
Enter fullscreen mode Exit fullscreen mode

Execution:

// Build "Bad Match Table" function buildBadMatchTable (str) { var tableObj = {}, strLength = str.length; for (var i = 0; i < strLength - 1; i++) { tableObj[str[i]] = (strLength - 1) - i; } if (tableObj[str[strLength - 1]] == undefined) { tableObj[str[strLength - 1]] = strLength; } return tableObj; } // Boyer Moore Algorithm: function boyerMoore (str, pattern) { var badMatchTable = buildBadMatchTable(pattern), offset = 0, patternLastIndex = (pattern.length - 1), scanIndex = patternLastIndex, maxOffset = (str.length - pattern.length); // If the "offset" is bigger than maxOffset, cannot be found: while (offset <= maxOffset) { scanIndex = 0; while (pattern[scanIndex] == str[scanIndex + offset]) { if (scanIndex == patternLastIndex) { // Found at this index: offset = scanIndex || patternLastIndex; return offset; } scanIndex++; } var badMatchString = str[offset + patternLastIndex]; if (badMatchTable[badMatchString]) { // Increase the offset if it exists: offset += badMatchTable[badMatchString]; } else { offset += 1; } } return -1; } // Insert String: console.log(boyerMoore ("Edison", "Edi")); // Print "2". Indicates that the pattern starts at index "2" console.log(boyerMoore ("Edison", "Ed")); // Print "1". Indicates that the pattern starts at index "1" boyerMoore ("Edison", "X"); // Print "-1". Indicates that the pattern does not exist

Best Case:
In the best case, all the characters in the pattern are the same, and this consistently produces shifts by TT , where TT is the length of the pattern. Hence, O(WT)O \lparen {W \over T} \rparen is the best time complexity where WW is the string where the pattern is being searched. The space complexity is O(1)O \lparen 1 \rparen since only 11 value is stored into the bad match table.

Time Complexity: O(TW)O \lparen {T \over W} \rparen
Space Complexity: O(1)O \lparen 1 \rparen

Worst Case:
In the worst case, the string has the pattern at the end of the string. An example of this is a string of abcdefgxyzabcdefgxyz and pattern of xyzxyz . In this case, T×WT \times W string comparisons are done.

Time Complexity: O(T×W)O \lparen T \times W \rparen
Space Complexity: O(T)O \lparen T \rparen

Note (📝): If all the characters in the pattern and the string are the same. For example, the string bbbbbbbbbbbb and the pattern bbbbbb . This case cannot use the skip mechanism
to its fullest. Space complexity in this case is TT because the pattern could have all unique characters.

Knuth–Morris–Pratt String Search (👨🔍)

Alt text

The implementation of the KMP algorithm returns all indices where the pattern is present. This helps to skip re-examination of previously matched characters. A prefix array has to be built. For the string ababacaababaca , the prefix building looks like the following:

1️⃣) At current index 0, there is no string to compare to, and the prefix array value is initialized to 0:

  • Array Index: [0,1,2,3,4,5,6][0, 1, 2, 3, 4, 5, 6]
  • Character: [a,b,a,b,a,c,a][a, b, a, b, a, c, a]
  • Prefix Array: [0][0]

2️⃣) At current index 1, compare index 0 to the current index (which is 1): aa and bb mismatch (prefix[1] is set 0):

  • Array Index: [0,1,2,3,4,5,6][0, 1, 2, 3, 4, 5, 6]
  • Character: [a,b,a,b,a,c,a][a, b, a, b, a, c, a]
  • Prefix Array: [0,0][0, 0]

3️⃣) At current index 2, compare index to the current index (which is 2): aa and aa match (prefix[2] is set 1):

  • Array Index: [0,1,2,3,4,5,6][0, 1, 2, 3, 4, 5, 6]
  • Character: [a,b,a,b,a,c,a][a, b, a, b, a, c, a]
  • Prefix Array: [0,0,1][0, 0, 1]

4️⃣) At current index 3, compare index 1 to the current index (which is 3): bb and bb match (prefix[3] is set 2):

  • Array Index: [0,1,2,3,4,5,6][0, 1, 2, 3, 4, 5, 6]
  • Character: [a,b,a,b,a,c,a][a, b, a, b, a, c, a]
  • Prefix Array: [0,0,1,2][0, 0, 1, 2]

5️⃣) At current index 4, compare index 2 to the current index (which is 4): aa and aa match (prefix[4] is set 3):

  • Array Index: [0,1,2,3,4,5,6][0, 1, 2, 3, 4, 5, 6]
  • Character: [a,b,a,b,a,c,a][a, b, a, b, a, c, a]
  • Prefix Array: [0,0,1,2,3][0, 0, 1, 2, 3]

6️⃣) At current index 3, compare index 3 to the current index (which is 5): bb and cc mismatch (prefix[5] is set 0):

  • Array Index: [0,1,2,3,4,5,6][0, 1, 2, 3, 4, 5, 6]
  • Character: [a,b,a,b,a,c,a][a, b, a, b, a, c, a]
  • Prefix Array: [0,0,1,2,3,0][0, 0, 1, 2, 3, 0]

7️⃣) At current index 0, compare index 0 to the current index (which is 6): aa and bb mismatch (prefix[6] is set 0):

  • Array Index: [0,1,2,3,4,5,6][0, 1, 2, 3, 4, 5, 6]
  • Character: [a,b,a,b,a,c,a][a, b, a, b, a, c, a]
  • Prefix Array: [0,0,1,2,3,0,1][0, 0, 1, 2, 3, 0, 1]

The function in the following code block (below) illustrates this algorithm to build a prefix table:

// Longest Prefix
function longestPrefix (str) {
    // Prefix array is created
    var strLength = str.length;
    var prefix = new Array(strLength);
    var maxPrefix = 0;
    // Start the prefix at 0
    prefix[0] = 0;
    for (var i = 1; i < strLength; i++) {
        // Decrement the prefix value as long as there are mismatches
        while (str.charAt(i) !== str.charAt(maxPrefix) && maxPrefix > 0) {
            maxPrefix = prefix[maxPrefix - 1];
        }
        // Strings match, can update it
        if (str.charAt(maxPrefix) === str.charAt(i)) {
            maxPrefix++;
        }
        // Set the prefix
        prefix[i] = maxPrefix;
    }
    return prefix;
}
Enter fullscreen mode Exit fullscreen mode

Execution:

// Longest Prefix function longestPrefix (str) { // Prefix array is created var strLength = str.length; var prefix = new Array(strLength); var maxPrefix = 0; // Start the prefix at 0 prefix[0] = 0; for (var i = 1; i < strLength; i++) { // Decrement the prefix value as long as there are mismatches while (str.charAt(i) !== str.charAt(maxPrefix) && maxPrefix > 0) { maxPrefix = prefix[maxPrefix - 1]; } // Strings match, can update it if (str.charAt(maxPrefix) === str.charAt(i)) { maxPrefix++; } // Set the prefix prefix[i] = maxPrefix; } return prefix; } // Result longestPrefix("ababaca"); // Prints "[0, 0, 1, 2, 3, 0, 1]"

With this prefix table, KMP can be implemented. KMP iterates index by index. Whenever there is a mismatch, it can use the prefix table to compute a new index. When the index reaches the length of the pattern, the string is found. This is implemented in detail in the following code block (below):

// Knut Morris Pratt Algorithm:
function KMP (str, pattern) {
    // Build the prefix table
    var prefixTable = longestPrefix (pattern),
        patternIndex = 0,
        strIndex = 0;
    while (strIndex < str.length) {
        if (str.charAt(strIndex) != pattern.charAt(patternIndex)) {
            // Case 1: The characters are different
            if (patternIndex != 0) {
                // Use the prefix table if possible
                patternIndex = prefixTable[patternIndex - 1];
            } else {
                // Increment the str index to next character
                strIndex++;
            }
        } else if (str.charAt(strIndex) == pattern.charAt(patternIndex)) {
            // Case 2: The characters are same
            strIndex++;
            patternIndex++;
        }
        // Found the pattern
        if (patternIndex == pattern.length) {
            return true;
        }
    }
    return false;
}
Enter fullscreen mode Exit fullscreen mode

Execution:

// Longest Prefix function longestPrefix (str) { // Prefix array is created var strLength = str.length; var prefix = new Array(strLength); var maxPrefix = 0; // Start the prefix at 0 prefix[0] = 0; for (var i = 1; i < strLength; i++) { // Decrement the prefix value as long as there are mismatches while (str.charAt(i) !== str.charAt(maxPrefix) && maxPrefix > 0) { maxPrefix = prefix[maxPrefix - 1]; } // Strings match, can update it if (str.charAt(maxPrefix) === str.charAt(i)) { maxPrefix++; } // Set the prefix prefix[i] = maxPrefix; } return prefix; } // Knut Morris Pratt Algorithm: function KMP (str, pattern) { // Build the prefix table var prefixTable = longestPrefix (pattern), patternIndex = 0, strIndex = 0; while (strIndex < str.length) { if (str.charAt(strIndex) != pattern.charAt(patternIndex)) { // Case 1: The characters are different if (patternIndex != 0) { // Use the prefix table if possible patternIndex = prefixTable[patternIndex - 1]; } else { // Increment the str index to next character strIndex++; } } else if (str.charAt(strIndex) == pattern.charAt(patternIndex)) { // Case 2: The characters are same strIndex++; patternIndex++; } // Found the pattern if (patternIndex == pattern.length) { return true; } } return false; } console.log(KMP ("Edison", "Edi")); // Prints "true" console.log(KMP ("Edison", "Edis")); // Prints "true" console.log(KMP ("Edison", "Ediso")); // Prints "true" KMP ("Edison", "EdisonX"); // Prints "false"

Time Complexity: O(W)O(W)
Space Complexity: O(W)O(W)

Note (📝): Preprocessing the length of word WW requires both time and space complexity in O(W)O(W) .

Time Complexity: O(W+T)O(W + T) . Here, WW is the word in the TT (the main string being searched).

Rabin–Karp Search (👨🔍)

Alt text

The Rabin–Karp algorithm is based on hashing to find the pattern in text. While KMP skip redundant checks during the search, Rabin Karp seeks to speed up the equality of the pattern via a hash function. To do this efficiently, the hash function must be O(1)O(1) . Specifically for the Rabin-Karp search, the Rabin fingerprint hashing technique is used.

The Rabin Fingerprint (👍1️⃣🌱)

The Rabin fingerprint is calculated via the following equation: f(x)=m0+m1x++mn1xn1f \lparen x\rparen = m_{0} + m_{1}x + … + m_{n-1}x^{n-1} where nn is the number of characters being hashed and xx is some prime number. as shown in the following code block (below):

// Rabin Karp Search Class
function RabinKarpSearch () {
    this.prime = 101;
}

// Rabin Karp Fingerprint Hash
RabinKarpSearch.prototype.rabinkarpFingerprintHash = function (str, endLength) {
    if (endLength == null) {
        endLength = str.length;
    }
    var hashInt = 0;
    for (var i = 0; i < endLength; i++) {
        hashInt += str.charCodeAt (i) * Math.pow (this.prime, i);
    }
    return hashInt;
}
Enter fullscreen mode Exit fullscreen mode

An arbitrary prime number of 101 was set for this example. Any high prime number should work well in this case. However, be aware that if the x is too high, it could cause integer overflow because xn1x^{n-1} grows quickly. The endLength argument indicates to what string index the hash should be calculated. It should be defaulted to the length of str if the argument is not passed. See the result:

// Rabin Karp Search Class function RabinKarpSearch () { this.prime = 101; } // Rabin Karp Fingerprint Hash RabinKarpSearch.prototype.rabinkarpFingerprintHash = function (str, endLength) { if (endLength == null) { endLength = str.length; } var hashInt = 0; for (var i = 0; i < endLength; i++) { hashInt += str.charCodeAt (i) * Math.pow (this.prime, i); } return hashInt; } // Instance of the Rabin Karp Search Class var rks = new RabinKarpSearch (); // Result console.log(rks.rabinkarpFingerprintHash ("Edison")); // Prints "1167781325510" rks.rabinkarpFingerprintHash ("Adison"); // Prints "1167781325506"

As shown in the result, the hashes from Edison and Adison are unique because they are two different strings. The hash value allows you to quickly, in constant time, check whether two strings are the same. As an example, let’s look for am inside same. Since am is only two characters long, when you scan the text, sa, am, and me are formed from same and compute the hash as shown below:

rks.rabinkarpFingerprintHash("sa"); // 9912
rks.rabinkarpFingerprintHash("am"); // 11106
rks.rabinkarpFingerprintHash("me"); // 10310
Enter fullscreen mode Exit fullscreen mode

Let’s analyze it mathematically. Recall that for this example the xx is 101101 , the character code for s, a, m, and e are 115, 97, 109, and 101:

sa: f(x)=m0+m1x=115+(97)×(101)=9912f(x) = m_0 + m_{1}x = 115 + (97) \times (101) = 9912
am: f(x)=m0+m1x=97+(109)×(101)=11106f(x) = m_0 + m_{1}x = 97 + (109) \times (101) = 11106
me: f(x)=m0+m1x=109+(101)×(101)=10310f(x) = m_0 + m_{1}x = 109 + (101) \times (101) = 10310

To get the hash value from sa to am, you must subtract the first term, divide the remaining by the prime number, and then add the new term (this is explained above). This recalculation algorithm is implemented in the following code block:

// Recalculate Hash
RabinKarpSearch.prototype.recalculateHash = function (str, oldIndex, newIndex, oldHash, patternLength) {
    if (patternLength == null) {
        patternLength = str.length;
    }
    var newHash = (oldHash - str.charCodeAt (oldIndex));
    newHash = Math.floor (newHash/this.prime);
    newHash += str.charCodeAt (newIndex) * Math.pow (this.prime, (patternLength - 1));
    return newHash;
}
Enter fullscreen mode Exit fullscreen mode

Execution:

// Rabin Karp Search Class function RabinKarpSearch () { this.prime = 101; } // Rabin Karp Fingerprint Hash RabinKarpSearch.prototype.rabinkarpFingerprintHash = function (str, endLength) { if (endLength == null) { endLength = str.length; } var hashInt = 0; for (var i = 0; i < endLength; i++) { hashInt += str.charCodeAt (i) * Math.pow (this.prime, i); } return hashInt; } // Recalculate Hash RabinKarpSearch.prototype.recalculateHash = function (str, oldIndex, newIndex, oldHash, patternLength) { if (patternLength == null) { patternLength = str.length; } var newHash = (oldHash - str.charCodeAt (oldIndex)); newHash = Math.floor (newHash/this.prime); newHash += str.charCodeAt (newIndex) * Math.pow (this.prime, (patternLength - 1)); return newHash; } // Instance of the Rabin Karp Search Class var rks = new RabinKarpSearch (); // Result var oldHash = rks.rabinkarpFingerprintHash ("sa"); // Value: 9912 console.log(rks.recalculateHash ("same", 0, 2, oldHash, "sa".length)) // Prints "11106"

Lastly, two different strings can still have the same hash value (although it’s unlikely). Therefore, there needs to be a function to check that two strings are equal. This is implemented in the following code block:

// Check Equality
RabinKarpSearch.prototype.strEquals = function (str1, startIndex1, endIndex1, str2, startIndex2, endIndex2) {
    if ((endIndex1 - startIndex1) != (endIndex2 - startIndex2)) {
        return false;
    }
    while ((startIndex1 <= endIndex1) && (startIndex2 <= endIndex2)) {
        if (str1[startIndex1] != str2[startIndex2]) {
            return false;
        }
        startIndex1++;
        startIndex2++;
    }
    return true;
}
Enter fullscreen mode Exit fullscreen mode

Then, the main Rabin–Karp search function is implemented below:

// Rabin Karp Search Class
function RabinKarpSearch () {
    this.prime = 101;
}

// Rabin Karp Fingerprint Hash
RabinKarpSearch.prototype.rabinkarpFingerprintHash = function (str, endLength) {
    if (endLength == null) {
        endLength = str.length;
    }
    var hashInt = 0;
    for (var i = 0; i < endLength; i++) {
        hashInt += str.charCodeAt (i) * Math.pow (this.prime, i);
    }
    return hashInt;
}

// Recalculate Hash
RabinKarpSearch.prototype.recalculateHash = function (str, oldIndex, newIndex, oldHash, patternLength) {
    if (patternLength == null) {
        patternLength = str.length;
    }
    var newHash = (oldHash - str.charCodeAt (oldIndex));
    newHash = Math.floor (newHash/this.prime);
    newHash += str.charCodeAt (newIndex) * Math.pow (this.prime, (patternLength - 1));
    return newHash;
}

// Check Equality
RabinKarpSearch.prototype.strEquals = function (str1, startIndex1, endIndex1, str2, startIndex2, endIndex2) {
    if ((endIndex1 - startIndex1) != (endIndex2 - startIndex2)) {
        return false;
    }
    while ((startIndex1 <= endIndex1) && (startIndex2 <= endIndex2)) {
        if (str1[startIndex1] != str2[startIndex2]) {
            return false;
        }
        startIndex1++;
        startIndex2++;
    }
    return true;
}

// Rabin Karp Search Algorithm:
RabinKarpSearch.prototype.rabinkarpSearch = function (str, pattern) {
    var T = str.length,
        W = pattern.length,
        patternHash = this.rabinkarpFingerprintHash (pattern, W),
        textHash = this.rabinkarpFingerprintHash (str, W);
    for (var i = 1; i <= (T - W + 1); i++) {
        if ((patternHash == textHash) &&
            this.strEquals (str, i - 1, i + W - 2, pattern, 0, W - 1)) {
                return i - 1;
        }
        if (i < T - W + 1) {
            textHash = this.recalculateHash (str, i - 1, i + W - 1, textHash, W)
        }
    }
    return -1;
}

// Instance of the Rabin Karp Search Class
var rks = new RabinKarpSearch ();

// Result
console.log(rks.rabinkarpSearch ("Edison", "id"));// Prints "-1"
console.log(rks.rabinkarpSearch ("Edison", "son")); // Prints "3"
rks.rabinkarpSearch ("Edison", "Edi"); // Prints "0"
Enter fullscreen mode Exit fullscreen mode

Execution:

// Rabin Karp Search Class function RabinKarpSearch () { this.prime = 101; } // Rabin Karp Fingerprint Hash RabinKarpSearch.prototype.rabinkarpFingerprintHash = function (str, endLength) { if (endLength == null) { endLength = str.length; } var hashInt = 0; for (var i = 0; i < endLength; i++) { hashInt += str.charCodeAt (i) * Math.pow (this.prime, i); } return hashInt; } // Recalculate Hash RabinKarpSearch.prototype.recalculateHash = function (str, oldIndex, newIndex, oldHash, patternLength) { if (patternLength == null) { patternLength = str.length; } var newHash = (oldHash - str.charCodeAt (oldIndex)); newHash = Math.floor (newHash/this.prime); newHash += str.charCodeAt (newIndex) * Math.pow (this.prime, (patternLength - 1)); return newHash; } // Check Equality RabinKarpSearch.prototype.strEquals = function (str1, startIndex1, endIndex1, str2, startIndex2, endIndex2) { if ((endIndex1 - startIndex1) != (endIndex2 - startIndex2)) { return false; } while ((startIndex1 <= endIndex1) && (startIndex2 <= endIndex2)) { if (str1[startIndex1] != str2[startIndex2]) { return false; } startIndex1++; startIndex2++; } return true; } // Rabin Karp Search Algorithm: RabinKarpSearch.prototype.rabinkarpSearch = function (str, pattern) { var T = str.length, W = pattern.length, patternHash = this.rabinkarpFingerprintHash (pattern, W), textHash = this.rabinkarpFingerprintHash (str, W); for (var i = 1; i <= (T - W + 1); i++) { if ((patternHash == textHash) && this.strEquals (str, i - 1, i + W - 2, pattern, 0, W - 1)) { return i - 1; } if (i < T - W + 1) { textHash = this.recalculateHash (str, i - 1, i + W - 1, textHash, W) } } return -1; } // Instance of the Rabin Karp Search Class var rks = new RabinKarpSearch (); // Result console.log(rks.rabinkarpSearch ("Edison", "id"));// Prints "-1" console.log(rks.rabinkarpSearch ("Edison", "son")); // Prints "3" rks.rabinkarpSearch ("Edison", "Edi"); // Prints "0"

The main Rabin–Karp search function is implemented by calculating the starting hash and then recalculating the hashes in a sliding manner until the pattern is found or the end of the string is reached.

Preprocessing Time Complexity: O(W)O \lparen W \rparen
The preprocessing time complexity WW is the length of the word.

Matching Time Complexity: O(W+T)O\lparen W + T\rparen
At most, this algorithm iterates through the sum of length TT and length WW , where TT is the string being searched for.

Applications in Real Life (👉🔌👈)

The Rabin–Karp algorithm can be used for detecting plagiarism. With a source material, this algorithm can search through a paper submission for instances of phrases and sentences from the source material. The Rabin–Karp algorithm is also used in other string matching applications such as looking for a specific sequence in large DNA data.

Summary (📚📚)

Alt text

Part 18 returned to the topic of strings and looked at more advanced examples and searching on string patterns. Part 18 discussed several different types:

  • Trie is great for multiple searches and prefix pattern matching.

  • Boyer–Moore, with assumption that the absence of a match at the end means no need to match the beginning, it tries to match the last character of the pattern instead of the first; this allows large jumps (spaces between indexes) and works better when the text is larger.

  • The KMP algorithm searches for occurrences of the pattern in a string by observing that when a mismatch occurs, the pattern itself has sufficient information to determine the index in the string. Hence, the KMP algorithm is better for small sets.

Table 18-1 summarizes the different search algorithms.

Algorithm Preprocessing Time Complexity Matching Time Complexity Space Complexity
Naive Node O(W×T)O \lparen W \times T \rparen None
Boyer-Moore O(W+T)O \lparen W + T \rparen O(TW)O \lparen {T \over W} \rparen base case O(1)O \lparen 1 \rparen
KMP O(W)O \lparen W \rparen O(T)O \lparen T \rparen O(W)O \lparen W \rparen
Rabin-Karp O(W)O \lparen W \rparen O(W+T)O \lparen W + T \rparen O(1)O \lparen 1 \rparen
Table 18-1. Single String Search Summary

Up Next👉 Part 19: Dynamic Programming (🔥🔥) (September 26-27, 2020)


Alt Text

Discussion

pic
Editor guide