DEV Community

Cover image for Search optimized database (full-text search) for system design interview
Daniel Lee
Daniel Lee

Posted on • Edited on

Search optimized database (full-text search) for system design interview

Traditional databases run a table scan to find a search term in the database. This is slow and efficient if a table stores a large dataset (1000+ rows). To improve, "search optimized database" can be used.

It uses indexing, tokenization and stemming to make search queries fast and efficient (by building "inverted indexes"):

  • Tokenization is a process of reducing words to their root form. For example, "running" and "runs" can be reduced to "run".
  • Stemming is a process of breaking a piece of task into individual words. It helps mapping words to documents containing those words in the inverted indexes.

Something to note is the underlying data structure of search mechanisms of search optimized database - Inverted Indexes.

It is a data structure that maps words to the documents that contain them. For example:

{
  "word1": [doc1,doc2,doc3],
  "word2": [doc2,doc3,doc6]
}
Enter fullscreen mode Exit fullscreen mode

Most search optimized database also support "Fuzzy Search" out of box as a configuration. Fuzzy Search works by leveraging "edit distance calculation" technique, which measures how many letters to be changed/added/removed to transform one word into another. Thus, results with minor misspellings or discrepancy relative to the search term can be returned efficiently in case of human errors.

One of the popular search optimized database is "ElasticSearch".


Reference

https://www.hellointerview.com/learn/system-design/in-a-hurry/key-technologies

Top comments (0)