Introduction
In modern applications, efficient text search is crucial, especially when dealing with large databases. MySQL provides basic full-text search, but it falls short on fuzzy matching and misspelled queries. This is where trigram-based search comes into play. In this post, we’ll explore what a trigram is, how it makes search more tolerant of typos, and how you can implement trigram search in MySQL.
What is a Trigram?
A trigram is a sequence of three consecutive characters from a given string. For instance, the word "search" can be broken down into the following trigrams:
- sea
- ear
- arc
- rch
By breaking down words into trigrams, we can perform more flexible text matching, especially when trying to match incomplete or slightly misspelled terms.
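To make the definition concrete, here is a minimal sketch of trigram extraction in TypeScript. The `trigrams` helper is a hypothetical name introduced for illustration, and it slides the window over UTF-16 code units, which is an assumption that holds for plain ASCII text like the examples here.

```typescript
// Split a string into overlapping three-character windows (trigrams).
// Hypothetical helper for illustration; not part of MySQL or any library.
function trigrams(text: string): string[] {
  const result: string[] = [];
  // The last window starts at length - 3, so shorter strings yield nothing.
  for (let i = 0; i + 3 <= text.length; i++) {
    result.push(text.substring(i, i + 3));
  }
  return result;
}

console.log(trigrams("search")); // [ 'sea', 'ear', 'arc', 'rch' ]
console.log(trigrams("serch"));  // [ 'ser', 'erc', 'rch' ]
```

Even the misspelled "serch" still shares the trigram "rch" with "search", which is the overlap a trigram comparison relies on.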
Implementing Trigram Search in MySQL by Creating Temporary Tables
1. Create the trigram function in the MySQL database. Trigram function code:
-- Note: in the mysql command-line client, run DELIMITER // first, end the
-- function with END // and then restore the default with DELIMITER ;
CREATE FUNCTION TRIGRAM_SEARCH(search_string VARCHAR(255), target_string VARCHAR(255))
RETURNS FLOAT
DETERMINISTIC
BEGIN
    DECLARE i INT DEFAULT 1;
    DECLARE total_trigrams INT DEFAULT 0;
    DECLARE matched_trigrams INT DEFAULT 0;
    DECLARE search_length INT;
    DECLARE target_length INT;

    SET search_length = CHAR_LENGTH(search_string);
    SET target_length = CHAR_LENGTH(target_string);

    -- Handle edge cases where strings are too short to contain a trigram
    IF search_length < 3 OR target_length < 3 THEN
        RETURN 0;
    END IF;

    -- Create temporary tables to store trigrams
    -- (drop leftovers first in case an earlier call in this session failed midway)
    DROP TEMPORARY TABLE IF EXISTS search_trigrams;
    DROP TEMPORARY TABLE IF EXISTS target_trigrams;
    CREATE TEMPORARY TABLE search_trigrams (trigram VARCHAR(3));
    CREATE TEMPORARY TABLE target_trigrams (trigram VARCHAR(3));

    -- Insert trigrams from search_string into temporary table
    WHILE i <= search_length - 2 DO
        INSERT INTO search_trigrams VALUES (SUBSTRING(search_string, i, 3));
        SET i = i + 1;
    END WHILE;

    -- Reset index for target_string
    SET i = 1;

    -- Insert trigrams from target_string into temporary table
    WHILE i <= target_length - 2 DO
        INSERT INTO target_trigrams VALUES (SUBSTRING(target_string, i, 3));
        SET i = i + 1;
    END WHILE;

    -- Count the distinct search trigrams that also occur in the target
    SELECT COUNT(DISTINCT t1.trigram) INTO matched_trigrams
    FROM search_trigrams t1
    JOIN target_trigrams t2 ON t1.trigram = t2.trigram;

    -- Count the total number of distinct trigrams in search_string
    SELECT COUNT(DISTINCT trigram) INTO total_trigrams FROM search_trigrams;

    -- Drop temporary tables
    DROP TEMPORARY TABLE search_trigrams;
    DROP TEMPORARY TABLE target_trigrams;

    -- Return similarity score (0 to 1)
    IF total_trigrams > 0 THEN
        RETURN matched_trigrams / total_trigrams;
    ELSE
        RETURN 0;
    END IF;
END;
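To sanity-check the scoring logic outside the database, the calculation can be sketched in TypeScript. This is an approximation, not a replacement for the SQL function: `distinctTrigrams` and `trigramSimilarity` are hypothetical names, and lower-casing both strings is an assumption meant to mimic the case-insensitive comparison of MySQL's default collations.

```typescript
// Collect the distinct trigrams of a string.
// Assumption: lower-casing approximates a case-insensitive MySQL collation.
function distinctTrigrams(text: string): Set<string> {
  const normalized = text.toLowerCase();
  const result = new Set<string>();
  for (let i = 0; i + 3 <= normalized.length; i++) {
    result.add(normalized.substring(i, i + 3));
  }
  return result;
}

// Share of the search string's distinct trigrams that also occur in the target.
function trigramSimilarity(search: string, target: string): number {
  if (search.length < 3 || target.length < 3) {
    return 0; // same short-string rule as the SQL function
  }
  const searchSet = distinctTrigrams(search);
  const targetSet = distinctTrigrams(target);
  let matched = 0;
  searchSet.forEach((trigram) => {
    if (targetSet.has(trigram)) matched++;
  });
  return matched / searchSet.size;
}

// "physis onlu" has 9 distinct trigrams; 6 of them occur in "Physics Only".
console.log(trigramSimilarity("physis onlu", "Physics Only").toFixed(2)); // 0.67
console.log(trigramSimilarity("physis onlu", "Chemistry Basics"));        // 0
```

The score is measured against the search string only, so it is not symmetric: swapping the two arguments in the first call gives 6 of 10 trigrams, or 0.6.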
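2. Add a full-text index to the desired column. Note that this index serves MySQL's built-in MATCH ... AGAINST search; TRIGRAM_SEARCH is a stored function evaluated for each row and does not use it.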
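import { BaseEntity, Column, Entity, Index, PrimaryGeneratedColumn } from 'typeorm';

@Entity()
@Index(['title'], { fulltext: true }) // FULLTEXT index on the title column
export class Ebook extends BaseEntity {
  @PrimaryGeneratedColumn()
  ebookId: number;

  @Column({ nullable: true })
  title: string;
}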
3. Test the trigram search function
SELECT *
FROM ebook e
WHERE TRIGRAM_SEARCH('physis onlu', e.title) > 0.4
ORDER BY TRIGRAM_SEARCH('physis onlu', e.title) DESC;
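4. Implement trigram search in code
// Returns e-books whose title is a fuzzy match for the given search text.
async find(title?: string) {
  // createQueryBuilder() is synchronous, so no await is needed here
  const queryBuilder = this.dataSource
    .getRepository(Ebook)
    .createQueryBuilder('eBook');

  if (title) {
    // :title is bound as a query parameter rather than concatenated into the SQL
    queryBuilder.where('TRIGRAM_SEARCH(:title, eBook.title) > 0.4', { title });
  }

  return queryBuilder.getMany();
}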
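Because TRIGRAM_SEARCH returns 0 for strings shorter than three characters, and because every extra space produces trigrams of its own, it may help to tidy the user's input before binding it to `:title`. The sketch below shows one possible way to do that; `normalizeSearchTerm` is a hypothetical helper, not part of TypeORM or of the code above.

```typescript
// Hypothetical helper: tidy user input before it is bound to :title.
// Returns null when fewer than three characters remain, because the SQL
// function returns 0 for such input and the "> 0.4" filter would match no rows.
function normalizeSearchTerm(raw: string | undefined): string | null {
  if (raw === undefined) return null;
  // Trim the ends and collapse runs of whitespace into a single space.
  const cleaned = raw.trim().replace(/\s+/g, " ");
  return cleaned.length >= 3 ? cleaned : null;
}

console.log(normalizeSearchTerm("  physis   onlu ")); // physis onlu
console.log(normalizeSearchTerm("ph"));               // null
```

When the helper returns null, the caller can decide whether to skip the trigram filter entirely or fall back to another kind of lookup.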
Conclusion
Trigram search offers a practical way to implement fuzzy matching in MySQL databases. Because the score counts shared three-character sequences, a query with a typo or a missing letter can still match titles that share most of its trigrams.
This approach has costs as well: the function shown here is evaluated for every row, so it can become slow on large tables. Depending on your specific use case and performance requirements, alternatives like Levenshtein distance or Soundex may be a better fit.
By implementing trigram search, you can make the search in your Node.js and MySQL applications more forgiving, returning relevant results even when the search term is not spelled exactly.