A smart resume filtering system that surfaces the best-matching resumes for a given job description.
Link to Code
GitHub: prateekguptaiiitk / Resume_Filtering — resume filtering based on natural language processing, on the basis of Job Descriptions (JDs). It was a summer internship project with Skybits Technologies Pvt. Ltd.
Introduction
The main feature of the project is that it searches the entire resume database to select and display the resumes that best fit the provided job description (JD). This is, in its current form, achieved by assigning a score to each CV by intelligently comparing it against the corresponding job description, which narrows the pool to a fraction of the original number of applicants. Resumes in the final shortlist can then be checked manually for further analysis. The project uses techniques from Machine Learning and Natural Language Processing to automate the process.
Directory Structure
```
├── Data
│   ├── CVs
│   ├── collectCV.py
│   └── jd.csv
├── Model
│   ├── Model_Training.ipynb
│   ├── Sentence_Extraction.ipynb
│   ├── paragraph_extraction_from_posts.ipynb
│   ├── sample_bitcoin.stackexchange_paras.txt
│   ├── sample_bitcoin.stackexchange_sentences.txt…
```
Overview
- Mainly three datasets were required.
- A Word2Vec model was trained using the StackOverflow data dump.
- Sections such as Education and Experience were extracted from the CVs (a rough sketch of this step follows the list).
- Finally, each CV was awarded a score against each available Job Description.
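As a rough illustration of the section-extraction step, here is a minimal sketch that splits raw CV text on common heading keywords. The heading list and the regex-based approach are assumptions made for illustration; the actual extraction code lives in the repository's notebooks and may work differently.

```python
import re

# Hypothetical sketch: split raw CV text into sections by common heading keywords.
# The heading list below is illustrative, not taken from the repository.
SECTION_HEADINGS = ["education", "experience", "skills", "projects", "certifications"]

def extract_sections(cv_text):
    """Return a dict mapping a section name to the text block under that heading."""
    # Match any known heading standing alone on a line, e.g. "EXPERIENCE" or "Skills:".
    pattern = re.compile(
        r"^\s*(" + "|".join(SECTION_HEADINGS) + r")\s*:?\s*$",
        re.IGNORECASE | re.MULTILINE,
    )
    sections = {}
    matches = list(pattern.finditer(cv_text))
    for i, match in enumerate(matches):
        start = match.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(cv_text)
        sections[match.group(1).lower()] = cv_text[start:end].strip()
    return sections
```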
Data Collection
Mainly three datasets were required:
StackExchange Network Posts
This dataset was required to train the word2vec model. Fortunately, the StackExchange network dumps its data in XML format under a Creative Commons license. A download link for the dataset (44 GB) can be found on the Internet Archive.
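As a sketch of how such a corpus can be used, the following trains a gensim Word2Vec model on a file of extracted sentences. The file name mirrors the samples in the Model/ directory, the hyperparameters are illustrative rather than the project's actual settings, and the gensim 4.x API is assumed.

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# Stream sentences (one per line) extracted from the StackExchange dump.
sentences = LineSentence("Model/sample_bitcoin.stackexchange_sentences.txt")

# Illustrative hyperparameters; the project's real values may differ.
model = Word2Vec(
    sentences,
    vector_size=300,   # dimensionality of the word vectors
    window=5,          # context window size
    min_count=5,       # ignore words seen fewer than 5 times
    workers=4,         # parallel training threads
)
model.save("word2vec_stackexchange.model")
```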
Resume Dataset
This dataset was required to test the trained word2vec model; the best-matching resumes are to be filtered out of this set. The resumes were downloaded from indeed.com.
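For illustration, here is a minimal sketch of loading the collected resumes, assuming they have already been converted to plain-text files under Data/CVs. The actual collection and conversion is handled by Data/collectCV.py, which is not shown here.

```python
from pathlib import Path

def load_resumes(cv_dir="Data/CVs"):
    """Load plain-text resumes into a {file name: text} dict (assumed .txt layout)."""
    resumes = {}
    for path in Path(cv_dir).glob("*.txt"):
        resumes[path.stem] = path.read_text(encoding="utf-8", errors="ignore")
    return resumes
```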
Job Description Dataset
This dataset was required to test the trained word2vec model, as the job descriptions form the basis of the resume filtering. A Kaggle dataset containing job descriptions for several job openings was used.
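To make the scoring step concrete, here is a hedged sketch that ranks resumes against one job description using the trained word vectors. gensim's n_similarity compares the averaged, normalised vectors of the two token sets; the simple tokenizer, the jd.csv column name, and the load_resumes helper (from the sketch above) are assumptions, and the project's actual scoring may be more elaborate.

```python
import csv
import re
from gensim.models import Word2Vec

# Load the word vectors trained on the StackExchange corpus (file name from the sketch above).
model = Word2Vec.load("word2vec_stackexchange.model")

def tokenize(text):
    """Very simple tokenizer for illustration; keeps tech tokens like c++, c# and node.js."""
    return re.findall(r"[a-z0-9+#.]+", text.lower())

def score_resume(resume_text, jd_text):
    """Cosine similarity between the averaged vectors of the in-vocabulary tokens."""
    resume_tokens = [t for t in tokenize(resume_text) if t in model.wv]
    jd_tokens = [t for t in tokenize(jd_text) if t in model.wv]
    if not resume_tokens or not jd_tokens:
        return 0.0
    return float(model.wv.n_similarity(resume_tokens, jd_tokens))

# Take one JD from Data/jd.csv; the "description" column name is an assumption.
with open("Data/jd.csv", newline="", encoding="utf-8") as f:
    jd_text = next(csv.DictReader(f))["description"]

resumes = load_resumes()  # hypothetical helper from the resume-loading sketch
ranking = sorted(resumes.items(), key=lambda kv: score_resume(kv[1], jd_text), reverse=True)
for name, _ in ranking[:10]:
    print(name)
```

A score close to 1 means the resume's vocabulary sits close to the job description's vocabulary in the embedding space, which is the intuition behind shortlisting the top-scoring CVs.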
Resources Used
spaCy Documentation: https://spacy.io/
spaCy GitHub Issue Page: https://github.com/explosion/spaCy/issues
Gensim Word2Vec Documentation: http://radimrehurek.com/gensim/models/word2vec.html
Gensim Word2Vec GitHub repository: link
Google Word2Vec: https://code.google.com/archive/p/word2vec/
GitHub Repository for Doc2Vec Illustration: https://github.com/linanqiu/word2vec-sentiments
Additional Thoughts
This project was a great learning experience. My learning doesn't stop here; I will keep creating and contributing in the future. While there is definitely room for improvement, the results are satisfactory for a first iteration of the project.
Thank you octograd2020! Cheers🍻