DEV Community

Durgesh kumar prajapati

10 Portfolio Projects you can try as an entry-level Data Analyst/Scientist

I hate the word newbie. If you are in a hurry, skip to the third paragraph. I always do this “catching up” thing before going straight to the point.

Over the course of my journey, here are 10 projects I worked on to build my portfolio and career.

1. Crop Recommendation System

Tools used: Python, HTML, CSS, Flask, Basic ML knowledge
Difficulty: Easy

This was the first project I ever did, and even though I hate it so much now, I’m so proud of it. I built a decision tree model that recommends the best crop for given weather and soil conditions. I deployed it locally using Flask. I currently have a terrible version of the project on my GitHub, so I do not want to link it. When I push a better version, I will link it here.

2. Movie recommender system

Tools used: Python, Knowledge of NLTK and Cosine Similarity, Heroku, Streamlit
Difficulty: Medium

Now, this was my second project, but it was nothing like the first. It uses NLP and cosine similarity. I had just finished Andrew Ng’s Machine Learning course on Coursera and watched a TMDB movie recommender tutorial on YouTube, so I built one on the Netflix dataset. I also used Streamlit to give users access to it and even deployed it on Heroku. For me, this is the hardest project I have ever done. I even cried. I have since learned better ways to do things, but I did learn a lot from it. This is a link to the GitHub. It needs some tidying but it’s not that terrible.
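To give a feel for the cosine similarity part, here is a minimal sketch, not the code from my project. It assumes plain word counts instead of NLTK preprocessing, and the catalog, the titles, and the `recommend` helper are all made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase a description and split it into word tokens."""
    return text.lower().split()


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, catalog, top_n=2):
    """Return the top_n titles whose descriptions are most similar to `title`."""
    vectors = {name: Counter(tokenize(desc)) for name, desc in catalog.items()}
    target = vectors[title]
    scores = [
        (cosine_similarity(target, vec), name)
        for name, vec in vectors.items()
        if name != title
    ]
    # Highest score first; ties broken alphabetically by title.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scores[:top_n]]


# Hypothetical toy catalog, invented for illustration only.
catalog = {
    "Space Raiders": "space crew fights alien invasion",
    "Alien Dawn": "alien invasion threatens space station crew",
    "Love in Paris": "romantic comedy set in paris",
    "Paris Nights": "romantic drama in paris",
}

print(recommend("Space Raiders", catalog, top_n=1))  # ['Alien Dawn']
```

In a real project you would likely remove stop words and weight the terms (TF-IDF, for example) before comparing, but the similarity step stays the same.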

3. Forbes 2022 EDA using Python

Tools used: Python (Pandas and Matplotlib)
Difficulty: Easy

This was the first EDA project that I published. I wrote about it too on this link. The project was easy, and it made me realize you learn from small projects too. I revised my knowledge of Pandas and Matplotlib. I also learned how to ask the right questions and how an analysis is targeted toward uncovering something. A whole lot of people got to know me through this project too. This is a GitHub link to the project.

4. Market Basket Analysis

Tools used: Python (pandas, matplotlib, association rules)
Difficulty: Medium

I haven't posted about this project yet, but it’s one of the projects I think a data analyst should try. You get to understand association rules, how a company's products sell, which products sell best together, how a high-sales product can help sell a low-sales one, and so on. I enjoyed learning and doing this one and might push it to my GitHub soon, but before then you should research and try it. It is easy.
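If association rules are new to you, the three numbers to know are support, confidence, and lift. Below is a rough sketch in plain Python, assuming only pairs of items and a tiny made-up list of baskets. The `pair_rules` function and the data are hypothetical and not from my project.

```python
from itertools import combinations


def pair_rules(transactions, min_support=0.3):
    """Compute support, confidence and lift for item pairs above min_support."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    items = sorted(set().union(*baskets))

    def support(itemset):
        # Fraction of baskets that contain every item in `itemset`.
        return sum(1 for b in baskets if itemset <= b) / n

    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        # Each frequent pair gives two rules: a -> b and b -> a.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = pair_support / support({antecedent})
            lift = confidence / support({consequent})
            rules.append({
                "antecedent": antecedent,
                "consequent": consequent,
                "support": pair_support,
                "confidence": confidence,
                "lift": lift,
            })
    return rules


# Hypothetical baskets, invented for illustration only.
transactions = [
    ["bread", "butter"],
    ["bread", "butter", "jam"],
    ["bread", "milk"],
    ["milk", "jam"],
    ["bread", "butter", "milk"],
]

for rule in pair_rules(transactions, min_support=0.5):
    print(rule)
```

A lift above 1 suggests the two items are bought together more often than chance would predict. For real datasets with larger itemsets, you would normally reach for an Apriori or FP-Growth implementation instead of looping over pairs by hand.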

5. Implementing the Gale-Shapley Stable Matching Algorithm

Tools Used: Python
Difficulty: Medium

Now, this isn’t a data-related project. Last year I attended a Python-oriented academy program and was fortunate enough to implement this algorithm in Python. The algorithm is so interesting. The Gale-Shapley algorithm is aimed at producing a stable matching. In the classic framing, everyone ends up married to a man or woman, and no two people would both rather be with each other than with the partners they were given. They each end up with the best preference available to them. I don’t think I am explaining it well enough. I might dedicate a whole post to it, but before then you can read about it on Google.
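Since words fail me here, a small sketch may help. This is a textbook-style version of the algorithm, not the code I wrote in the program. The names `gale_shapley`, `proposer_prefs`, and `receiver_prefs` and the example preferences are made up, and it assumes both sides are the same size and rank everyone on the other side.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Return a stable matching as {proposer: receiver}.

    Both arguments map each person to a list of everyone on the other
    side, ordered from most to least preferred.
    """
    # rank[r][p] = position of proposer p in receiver r's list (lower is better).
    rank = {
        r: {p: i for i, p in enumerate(prefs)}
        for r, prefs in receiver_prefs.items()
    }
    next_choice = {p: 0 for p in proposer_prefs}  # next receiver index to try
    engaged_to = {}  # receiver -> proposer currently holding that spot
    free = list(proposer_prefs)

    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p          # r was free, so r accepts
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p          # r prefers p and trades up
            free.append(current)
        else:
            free.append(p)             # r rejects p, who will try the next choice

    return {p: r for r, p in engaged_to.items()}


# Hypothetical preferences, invented for illustration only.
proposer_prefs = {
    "A": ["X", "Y", "Z"],
    "B": ["X", "Z", "Y"],
    "C": ["Y", "X", "Z"],
}
receiver_prefs = {
    "X": ["B", "A", "C"],
    "Y": ["A", "C", "B"],
    "Z": ["A", "B", "C"],
}

print(gale_shapley(proposer_prefs, receiver_prefs))
```

Here A ends up with Y, B with X, and C with Z. Nobody can find a partner who prefers them back over their current match, which is what "stable" means.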

6. The Bechdel test

Tools used: Tableau, Python (For analysis)
Difficulty: Easy

The Bechdel Test checks whether a movie has at least one scene where a woman speaks to another woman about something other than a man. I will definitely write a post about this project. It’s one of the ones that hooked me on the first read. The moment I heard of this test, I wanted to do something with it to tell people about it. I linked it with the evolution of feminism and researched whether feminism has improved how society views women. To do that, I grouped the years into different centuries and observed the number of movies that passed the test over the years. I even made a Tableau visualization for it but I haven’t perfected it yet. I haven’t posted about it either.

7. Sentiment Analysis Project

Tools: Python, NLTK, Power BI
Difficulty: Easy

I did a sentiment analysis project when Black Panther 2 came out, and I did another recently with two different libraries. It’s quite easy to do and I think it’s something every data analyst should try. I even visualized it using Power BI and I dared to use a black background. Yes. I did that. Here is a link to the post: Black Panther.

8. Data science job salaries

Tools Used: PostgreSQL, Excel, Power BI
Difficulty: Medium

Again, this is one of the projects that put me out there. I got so much feedback on this project. I used SQL for the analysis, Excel for cleaning, and Power BI for visualization. I wrote about it and published it too on this link. The data came from this link, and I explored the salaries of data professionals by profession, mobility, employment type, and more. I used window functions and subqueries and honestly, I was able to properly practice what I had learned.
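For anyone who has not met window functions yet, here is a small sketch of the kind of query I mean. It is not my project code: it assumes SQLite (through Python's built-in `sqlite3`, version 3.25 or newer) instead of PostgreSQL so it runs without a database server, and the `salaries` table, its columns, and the rows are hypothetical.

```python
import sqlite3

# Hypothetical rows, invented for illustration only.
rows = [
    ("Data Analyst", 60000),
    ("Data Analyst", 80000),
    ("Data Scientist", 100000),
    ("Data Scientist", 120000),
    ("Data Engineer", 110000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries (job_title TEXT, salary_usd INTEGER)")
conn.executemany("INSERT INTO salaries VALUES (?, ?)", rows)

# The inner query uses window functions to rank salaries within each job
# title and attach the title's average; the outer query keeps the top earner.
query = """
SELECT job_title, salary_usd, avg_for_title
FROM (
    SELECT job_title,
           salary_usd,
           RANK() OVER (PARTITION BY job_title ORDER BY salary_usd DESC) AS rank_in_title,
           AVG(salary_usd) OVER (PARTITION BY job_title) AS avg_for_title
    FROM salaries
) AS ranked
WHERE rank_in_title = 1
ORDER BY job_title
"""

top_earners = conn.execute(query).fetchall()
for row in top_earners:
    print(row)
conn.close()
```

The same `RANK() OVER (PARTITION BY ...)` syntax works in PostgreSQL. The point of a window function is that every row keeps its own values while also seeing an aggregate over its group.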

9. Classification of phishing emails

Tools used: Python
Difficulty: Hard

This is one of the toughest projects I have worked on. I built models that classify emails as phishing or non-phishing using email structure, stylometric features, and so on. It took quite some time. I worked on feature extraction, data cleaning, dimensionality reduction, cross-validation, and model building. I explored different evaluation methods too. I haven't pushed this to my GitHub either but I will soon. I don’t think I can make a post about it though.

10. Open Source Contribution

There are still more projects to talk about, but project number 10 is contributing to open source. I learned unit testing, Git, and so much more through open source. It is something I don't do often because I always have little jobs that keep me occupied, but once I have a full-time job, I will definitely become a regular contributor. There is so much to learn, and open source is one of the fastest ways to learn it.

