I created this code to have a go at data structure and algorithms on Python after taking the data structure & algorithms module in Codecademy.
GitHub link to code: https://github.com/guachilimbo/movie_recommendation
The program uses a database of the top 250 movies on IMDb which I found in Kaggle (https://www.kaggle.com/ramjasmaurya/top-250s-in-imdb)
It stores this information in two different data structures: A Binary Search Tree, and a Trie.
When the data base is read, each movie entry is stored as a Binary Search Tree (BST). The tree is sorted alphabetically depending on the title of each movie. This allows for quick storage and retrieval with a running time performance of
O(logN), where N is the number of levels in the tree.
However, in order to implement an autocomplete functionality, I thought this was not a good data structure, since for a given prefix
(str) of length M, the runtime to find all the movies that match the string will be
This is why as the data was read from the database, a
Trie() structure containing the titles of each movie is created.
There are M + 1
Trie() classes created in this program, where M is the number of unique genres detected in the database. This allows to apply the autocomplete function to all the catalogue, or just the movies within an user chosen genre.
The nice thing about applying the autocomplete function, which searches all movies that start with a prefix input by the user, to a Trie is that search complexities are brought to optimal limit, this is, the prefix length.
- Download folder and run
python main.pyon the command line.
The program will ask for choice between two running modes:
a) Genre search: The program will display the genres found in the data base. The user will then choose one of the genres. The films that belong to this category will be displayed. The user will then be asked to enter part of the title of one of those films and a autocomplete method will be invoked on that string.
Perform another search or exit the program.
Please any suggestions or improvements are greatly appreciated since this is my first time dealing with data structures and algorithms!