DEV Community

guachilimbo
guachilimbo

Posted on

Using Binary Search Tree and Trie data structures to store and retrieve a Movie Catalogue

I created this code to have a go at data structure and algorithms on Python after taking the data structure & algorithms module in Codecademy.

GitHub link to code: https://github.com/guachilimbo/movie_recommendation

Data Structures Used

The program uses a database of the top 250 movies on IMDb which I found in Kaggle (https://www.kaggle.com/ramjasmaurya/top-250s-in-imdb)

It stores this information in two different data structures: A Binary Search Tree, and a Trie.

Binary Search Tree

When the data base is read, each movie entry is stored as a Binary Search Tree (BST). The tree is sorted alphabetically depending on the title of each movie. This allows for quick storage and retrieval with a running time performance of O(logN), where N is the number of levels in the tree.

However, in order to implement an autocomplete functionality, I thought this was not a good data structure, since for a given prefix (str) of length M, the runtime to find all the movies that match the string will be O(MlogN).

This is why as the data was read from the database, a Trie() structure containing the titles of each movie is created.

Trie

There are M + 1 Trie() classes created in this program, where M is the number of unique genres detected in the database. This allows to apply the autocomplete function to all the catalogue, or just the movies within an user chosen genre.

The nice thing about applying the autocomplete function, which searches all movies that start with a prefix input by the user, to a Trie is that search complexities are brought to optimal limit, this is, the prefix length.

Program Execution:

  1. Download folder and run python main.py on the command line. First step of running the program.
  2. The program will ask for choice between two running modes:

    a) Genre search: The program will display the genres found in the data base. The user will then choose one of the genres. The films that belong to this category will be displayed. The user will then be asked to enter part of the title of one of those films and a autocomplete method will be invoked on that string.
    Genre choice

    b) General search: The user will be asked to input the beginning of a movie title and the autocomplete method will be called on that string.
    Autocomplete function

  3. If a match is found, the movie's Title, Year, Genre and Duration will be displayed alongside the option to display a description of the movie.
    Movie card display

  4. Perform another search or exit the program.


Please any suggestions or improvements are greatly appreciated since this is my first time dealing with data structures and algorithms!

Discussion (0)