A smart resume filtering system that surfaces the best-matching resumes for a given job description.
Link to Code
GitHub: prateekguptaiiitk / Resume_Filtering — resume filtering based on natural language processing, on the basis of Job Descriptions (JDs). It was a summer internship project with Skybits Technologies Pvt. Ltd.
Introduction
The main feature of the project is that it searches the entire resume database to select and display the resumes that best fit the provided job description (JD). This is, in its current form, achieved by assigning a score to each CV by intelligently comparing it against the corresponding job description, which narrows the pool to a fraction of the original number of applicants. Resumes in the final shortlist can then be checked manually for further analysis. The project uses techniques from Machine Learning and Natural Language Processing to automate the process.
Directory Structure
```
├── Data
│   ├── CVs
│   ├── collectCV.py
│   └── jd.csv
├── Model
│   ├── Model_Training.ipynb
│   ├── Sentence_Extraction.ipynb
│   ├── paragraph_extraction_from_posts.ipynb
│   ├── sample_bitcoin.stackexchange_paras.txt
│   ├── sample_bitcoin.stackexchange_sentences.txt…
```
Overview
- Mainly three datasets were required.
- A Word2Vec model was trained using the StackOverflow data dump.
- Sections such as Education and Experience were extracted from the CVs (a rough sketch of this step follows the list).
- Finally, each CV was awarded a score against each available Job Description.
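As a rough illustration of the section-extraction step, here is a minimal sketch that splits raw CV text on common heading keywords. The heading list and the regex-based approach are assumptions made for illustration; the actual extraction code lives in the repository's notebooks and may work differently.

```python
import re

# Hypothetical sketch: split raw CV text into sections by common heading keywords.
# The heading list below is illustrative, not taken from the repository.
SECTION_HEADINGS = ["education", "experience", "skills", "projects", "certifications"]

def extract_sections(cv_text):
    """Return a dict mapping a section name to the text block under that heading."""
    # Match any known heading standing alone on a line, e.g. "EXPERIENCE" or "Skills:".
    pattern = re.compile(
        r"^\s*(" + "|".join(SECTION_HEADINGS) + r")\s*:?\s*$",
        re.IGNORECASE | re.MULTILINE,
    )
    sections = {}
    matches = list(pattern.finditer(cv_text))
    for i, match in enumerate(matches):
        start = match.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(cv_text)
        sections[match.group(1).lower()] = cv_text[start:end].strip()
    return sections
```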
Data Collection
Mainly three datasets were required:
StackExchange Network Posts
This dataset was required to train the word2vec model. Fortunately, the StackExchange network dumps its data in XML format under a Creative Commons license. A download link for the dataset (44 GB) can be found on the Internet Archive.
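As a sketch of how such a corpus can be used, the following trains a gensim Word2Vec model on a file of extracted sentences. The file name mirrors the samples in the Model/ directory, the hyperparameters are illustrative rather than the project's actual settings, and the gensim 4.x API is assumed.

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# Stream sentences (one per line) extracted from the StackExchange dump.
sentences = LineSentence("Model/sample_bitcoin.stackexchange_sentences.txt")

# Illustrative hyperparameters; the project's real values may differ.
model = Word2Vec(
    sentences,
    vector_size=300,   # dimensionality of the word vectors
    window=5,          # context window size
    min_count=5,       # ignore words seen fewer than 5 times
    workers=4,         # parallel training threads
)
model.save("word2vec_stackexchange.model")
```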
Resume Dataset
This dataset was required to test the trained word2vec model; the best-matching resumes are to be filtered out of this set. The resumes were downloaded from indeed.com.
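For illustration, here is a minimal sketch of loading the collected resumes, assuming they have already been converted to plain-text files under Data/CVs. The actual collection and conversion is handled by Data/collectCV.py, which is not shown here.

```python
from pathlib import Path

def load_resumes(cv_dir="Data/CVs"):
    """Load plain-text resumes into a {file name: text} dict (assumed .txt layout)."""
    resumes = {}
    for path in Path(cv_dir).glob("*.txt"):
        resumes[path.stem] = path.read_text(encoding="utf-8", errors="ignore")
    return resumes
```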
Job Description Dataset
This dataset was required to test the trained word2vec model, as the job descriptions form the basis of the resume filtering. A Kaggle dataset containing job descriptions for several job openings was used.
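To make the scoring step concrete, here is a hedged sketch that ranks resumes against one job description using the trained word vectors. gensim's n_similarity compares the averaged, normalised vectors of the two token sets; the simple tokenizer, the jd.csv column name, and the load_resumes helper (from the sketch above) are assumptions, and the project's actual scoring may be more elaborate.

```python
import csv
import re
from gensim.models import Word2Vec

# Load the word vectors trained on the StackExchange corpus (file name from the sketch above).
model = Word2Vec.load("word2vec_stackexchange.model")

def tokenize(text):
    """Very simple tokenizer for illustration; keeps tech tokens like c++, c# and node.js."""
    return re.findall(r"[a-z0-9+#.]+", text.lower())

def score_resume(resume_text, jd_text):
    """Cosine similarity between the averaged vectors of the in-vocabulary tokens."""
    resume_tokens = [t for t in tokenize(resume_text) if t in model.wv]
    jd_tokens = [t for t in tokenize(jd_text) if t in model.wv]
    if not resume_tokens or not jd_tokens:
        return 0.0
    return float(model.wv.n_similarity(resume_tokens, jd_tokens))

# Take one JD from Data/jd.csv; the "description" column name is an assumption.
with open("Data/jd.csv", newline="", encoding="utf-8") as f:
    jd_text = next(csv.DictReader(f))["description"]

resumes = load_resumes()  # hypothetical helper from the resume-loading sketch
ranking = sorted(resumes.items(), key=lambda kv: score_resume(kv[1], jd_text), reverse=True)
for name, _ in ranking[:10]:
    print(name)
```

A score close to 1 means the resume's vocabulary sits close to the job description's vocabulary in the embedding space, which is the intuition behind shortlisting the top-scoring CVs.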
Resources Used
spaCy Documentation: https://spacy.io/
spaCy GitHub Issue Page: https://github.com/explosion/spaCy/issues
Gensim Word2Vec Documentation: http://radimrehurek.com/gensim/models/word2vec.html
Gensim Word2Vec GitHub repository: link
Google Word2Vec: https://code.google.com/archive/p/word2vec/
GitHub Repository for Doc2Vec Illustration: https://github.com/linanqiu/word2vec-sentiments
Additional Thoughts
This project was a great learning experience. My learning doesn't stop here; I will keep creating and contributing in the future. While there is definitely room for improvement, the results are satisfactory for a first iteration of the project.
Thank you octograd2020! Cheers🍻