Anthony Mipawa for NEUROTECH AFRICA

WORDS TO ASPIRING DATA SCIENTISTS

Being passionate is one thing; having the courage to do what you are passionate about is a great win.

Have you ever dreamed of the day when you will be solving business problems with data science at ABC company, while all the stakeholders wait for the insights from your analysis to make their decisions?

If not, take a few seconds to imagine that moment. It is an amazing one if you love what you do as a data enthusiast.

But wait, have you solved any challenges using data science? If yes, can your work be shared with stakeholders, and does it answer the question that was asked?

Sorry, you probably now have a lot of questions about the projects you have worked on. Listen: stop solving problems only for yourself. Be real and solve community problems.

Career growth is not easy; it requires a person to struggle to become the best in their profession. Today I will be talking to aspiring data scientists about how they can solidify their skills in this field without being employed, which can help you secure your first job at any company.

Things to solidify your skills as an aspiring data scientist:

  1. Think and Act as a data scientist.
  2. Share what you are doing with the community.

Think and Act as a data scientist:

Have you figured out what real data scientists do in their day-to-day work? If not, you can use any social media platform to connect with professional data scientists and ask them some questions.

Most people start their learning journey by focusing on building high-performing models using AB algorithms. On different data science competition platforms I have seen people build models and then, once the competition is over, not even document their solutions.

Don't blame yourself, I used to do that too. Just add it to your to-do list.

Take every chance you get to show that you can work as a real data scientist for a business by building end-to-end solutions. I mention this because no one wants to hire a data scientist to experiment on their business; employers want people who are capable of solving business challenges.

Take into consideration that your first job is unlikely to be at a giant company like Google, Facebook, Amazon, Apple, or Netflix. The data infrastructure at these companies is already well developed and they already have a data-driven culture, so they have data engineers who work on data operations and ETL pipelines to make sure the data is ready for analysis. Data scientists, data analysts, business intelligence analysts, and business analysts can then derive insights and develop reports from well-organized data.

Let's get real: take a quick look at the people around you, especially in Africa. How many data engineers are employed at companies in two or more of the big cities?

https://blog.neurotech.africa/content/images/2022/03/istockphoto-1252494221-612x612.jpg

I hope you get what I mean. If you get employed at one of those companies, expect to do everything yourself: extracting and gathering data from disparate sources, organizing the data, performing the analysis, and so on. Sometimes you will even build the data infrastructure by yourself.

Best practices when you are working on business challenges as a data scientist:

1. Consider framing business challenges as data science challenges

Problem framing is a vital and often overlooked step in the data science process, but you really can't build the right solution without understanding what stakeholders actually want from you as a data scientist. It is important to take a step back and have a closer look at the problem you want to solve.

Problem framing is all about defining the problem you want to solve. Let's take a look at a real-world example:

Stakeholders View:

Let's assume for a moment that you are a data scientist at ABC company and you receive a call from a bank manager. After a bit of back and forth, he explains that the bank has a problem with defaulting loans and needs a program that predicts which loans are going to default in the future. Unfortunately, he must end the call now, but he'll catch up with you later. In the meantime, you start to make sense of the problem.

Data science View:

While it's clear to the manager that he has provided you with all the necessary information, you lean back in your chair and recap the problem:

  • The bank lends money to customers today. The customer promises the bank to pay back the loan bit by bit over the next couple of months/years. Unfortunately, some of the customers are not able to do so and are going to default on the loan.
  • So far everything is fine. The bank will give you historical data and you are asked to make a prediction. Fair enough, but what exactly was there to predict again?
  • Do they need to know whether every single loan is going to default or not?
  • Are they more concerned about the default trend throughout the whole bank?
  • The problem can be framed as either a classification or a regression challenge.
  • In the classification case, bank loans could be categorized into two groups: loans that defaulted and loans that are still performing. In the regression case, the target could be the percentage of loans that are going to default in a given period, or the total amount of money the bank will lose in a given month.

Then set up a meeting with the stakeholders, share what you have worked out as a data scientist, and listen to what they say. From their answers you will learn which technique solves the problem efficiently while matching the bank's requirements.

This step will also help you select the best datasets for the problem.
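To make the two framings concrete, here is a minimal sketch in plain Python. The loan records, field layout, and function names are all made up for illustration; they are not from a real bank dataset. It only shows how the same raw records lead to two different prediction targets depending on what the stakeholders actually want.

```python
from collections import defaultdict

# Hypothetical loan records: (loan_id, month_issued, amount, defaulted)
loans = [
    ("L001", "2021-01", 1000, False),
    ("L002", "2021-01", 2500, True),
    ("L003", "2021-02", 1800, False),
    ("L004", "2021-02", 3000, True),
    ("L005", "2021-02", 1200, True),
]


def classification_targets(records):
    """Framing 1: one binary label per loan (1 = defaulted, 0 = performing)."""
    return {loan_id: int(defaulted) for loan_id, _, _, defaulted in records}


def regression_targets(records):
    """Framing 2: one number per month (the share of loans that defaulted)."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for _, month, _, defaulted in records:
        totals[month] += 1
        defaults[month] += int(defaulted)
    return {month: defaults[month] / totals[month] for month in totals}


per_loan = classification_targets(loans)
per_month = regression_targets(loans)
print(per_loan)
print(per_month)
```

If the bank cares about every single loan, you would model something like `per_loan`; if it cares about the trend across the whole bank, something like `per_month` is closer to the question.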

Learn more about framing business challenges from here

2. Performing Exploratory Data Analysis (EDA)

The best part is getting a sense of the dataset by analyzing it with statistical techniques and visualization tools. EDA helps determine how best to manipulate data sources to get the answers you need, making it easier for data scientists to discover patterns, spot anomalies, test a hypothesis, or check assumptions.

Through EDA, a data scientist can discover missing values, inconsistencies within features, duplicate entries, and so on.

The next task is to clean the data by resolving all the problems you found while examining it from various angles.

You can learn more about EDA from here.
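As a rough illustration of those three checks (missing values, inconsistent labels, duplicates), here is a small sketch that uses only the Python standard library. The records and helper names are hypothetical; in a real project you would more likely reach for a dataframe library, but the questions you ask of the data are the same.

```python
from collections import Counter, defaultdict

# Hypothetical raw records; None marks a missing value.
rows = [
    {"customer_id": 1, "income": 4200.0, "region": "north"},
    {"customer_id": 2, "income": None, "region": "North"},
    {"customer_id": 3, "income": 3900.0, "region": "south"},
    {"customer_id": 3, "income": 3900.0, "region": "south"},
    {"customer_id": 4, "income": 4100.0, "region": None},
]


def missing_counts(records):
    """Count missing (None) values per column."""
    counts = Counter()
    for record in records:
        for column, value in record.items():
            if value is None:
                counts[column] += 1
    return dict(counts)


def duplicate_count(records):
    """Count rows that exactly repeat an earlier row."""
    seen = Counter(tuple(sorted(record.items())) for record in records)
    return sum(n - 1 for n in seen.values())


def inconsistent_labels(records, column):
    """Find labels that differ only by case or surrounding whitespace."""
    variants = defaultdict(set)
    for record in records:
        value = record[column]
        if value is not None:
            variants[value.strip().lower()].add(value)
    return {key: sorted(found) for key, found in variants.items() if len(found) > 1}


missing = missing_counts(rows)
duplicates = duplicate_count(rows)
label_issues = inconsistent_labels(rows, "region")
print(missing, duplicates, label_issues)
```

Each finding turns into a cleaning decision: fill or drop the missing values, remove the repeated row, and agree on one spelling for each region.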

3. Feature Engineering

Feature Engineering refers to the process of using domain knowledge to select and transform the most relevant variables from raw data when creating a predictive model using machine learning or statistical modeling.

With the insights obtained from EDA, it is now time to generate new features, transform existing ones, extract features, and select the best features from the given datasets.

Learn more about feature engineering from here.
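Here is a small sketch of what generating and transforming features can look like for the loan example. The field names, the employment categories, and the two ratios are assumptions chosen for illustration, and the installment calculation deliberately ignores interest to keep the example short.

```python
# Hypothetical cleaned loan applications.
applications = [
    {"income": 4000.0, "loan_amount": 12000.0, "term_months": 24, "employment": "salaried"},
    {"income": 2500.0, "loan_amount": 15000.0, "term_months": 36, "employment": "self_employed"},
]

# Assumed list of categories; a real dataset may have more.
EMPLOYMENT_TYPES = ["salaried", "self_employed", "unemployed"]


def engineer_features(app):
    """Derive ratio features and one-hot encode the employment category."""
    # Simplification: equal installments with no interest.
    monthly_installment = app["loan_amount"] / app["term_months"]
    features = {
        "loan_to_income": app["loan_amount"] / app["income"],
        "installment_to_income": monthly_installment / app["income"],
    }
    for kind in EMPLOYMENT_TYPES:
        features[f"employment_{kind}"] = int(app["employment"] == kind)
    return features


feature_rows = [engineer_features(app) for app in applications]
for row in feature_rows:
    print(row)
```

The ratios come from domain knowledge: a loan that is large relative to income is a different risk from the same loan for a high earner, which the raw columns alone do not say directly.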

4. Model Building and Evaluation

The process of modeling means training a machine-learning algorithm to predict the labels from the selected features, tuning it for the business need, and validating it on holdout data.
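Validating on holdout data simply means keeping some rows away from training so you can check the model on data it has never seen. A minimal sketch, assuming a plain random split is acceptable (for time-ordered loan data you may prefer to split by date instead); the function name and the 20% default are my own choices:

```python
import random


def train_holdout_split(rows, holdout_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and set aside a fraction as holdout data."""
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    holdout_size = max(1, round(len(shuffled) * holdout_fraction))
    return shuffled[holdout_size:], shuffled[:holdout_size]


loan_ids = list(range(10))  # hypothetical loan identifiers
train_ids, holdout_ids = train_holdout_split(loan_ids)
print(len(train_ids), len(holdout_ids))
```

Train and tune only on the training part, and look at the holdout part once you think the model is ready.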

The best option is to start with simple algorithms such as linear regression and logistic regression. No single machine learning algorithm is best for every data science problem, so a good understanding of the challenge you are working on makes it easier to select suitable algorithms. But don't forget that simple solutions are often the best solutions.

Then you should evaluate the model after training to understand its performance using various machine learning evaluation metrics. The best practice here is to use more than one evaluation metric to inspect the model's performance.

After you have selected the best model from the candidates, it's time to dig deeper into the final model's predictions. Here you should use machine learning model interpretation tools to inspect your model; this will help your solution in aspects like fairness, trust, reliability, and causality.

Learn more about machine learning model interpretation from here.
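To see why a single metric is not enough, consider this sketch with made-up labels for ten loans, two of which defaulted. The two "models" are just hard-coded prediction lists, not trained models; the point is only how the metrics behave.

```python
def evaluate(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = default)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical holdout labels: 8 performing loans, 2 defaults.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_performing = [0] * 10                        # never predicts a default
cautious_model = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]     # flags 4 loans, catches both defaults

baseline_scores = evaluate(y_true, always_performing)
cautious_scores = evaluate(y_true, cautious_model)
print(baseline_scores)
print(cautious_scores)
```

Both prediction lists reach the same accuracy, yet the first one never catches a default. Looking at recall and precision next to accuracy is what exposes the difference.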

5. Reporting, Insights, and Recommendations

Here you prepare something to share with the stakeholders. Because you can't show them code, design a simple, well-organized report that lists the insights you obtained from the datasets along with your recommendations, and share it with them.

The report should include the conclusions drawn from the datasets you worked on.

Understanding the problem pays off throughout the data science journey, simply because you can't draw conclusions from solutions that you don't understand or can't explain to stakeholders. You should be able to say what the machine learning model predicts, to what extent, and in which cases the model you built will not work well.
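One simple way to find the cases where a model does not work well is to break its holdout results down by segment. The sketch below assumes you already have true and predicted labels for each loan plus a segment column; the segments and values are invented for illustration.

```python
from collections import defaultdict

# Hypothetical holdout results: (segment, true_label, predicted_label)
results = [
    ("salaried", 1, 1),
    ("salaried", 0, 0),
    ("salaried", 1, 1),
    ("salaried", 0, 0),
    ("self_employed", 1, 0),
    ("self_employed", 0, 0),
    ("self_employed", 1, 0),
    ("self_employed", 1, 1),
]


def accuracy_by_segment(rows):
    """Return the share of correct predictions within each segment."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for segment, truth, prediction in rows:
        total[segment] += 1
        correct[segment] += int(truth == prediction)
    return {segment: correct[segment] / total[segment] for segment in total}


segment_accuracy = accuracy_by_segment(results)
print(segment_accuracy)
```

A gap like the one between these two segments is exactly the kind of limitation worth stating in the report, in plain language, next to the recommendations.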

Consider documenting every project you are working on.

Share what you are doing with the community:

After building an end-to-end solution, document your solution and share it with the community. Specialists will see your work and provide feedback, which will help you grow.

On the internet, you never know who will be attracted to your work and become interested in working with you. Sharing will also help other people learn from you and improve their understanding.

You can reach out to experienced data scientists on LinkedIn or Twitter and ask them to review your work. You can even ask to set up calls to discuss it. This will point you in the right direction for improving the way you approach business challenges with data science.

By the time you apply to work for ABC company, it won't be hard for you, because you will already have the experience and exposure.

Being a data scientist is not just about building models; there are a lot of other things to consider, such as business domain understanding, storytelling, building end-to-end data science solutions, and so on.

Every time you join a data science competition or hackathon, focus on building simple, solid solutions that answer the business problem stakeholders care about. At the end of the hackathon, take a few days to document your work and share it with the community. Put yourself in the shoes of the only data scientist at that company, the one who is required to bring the best solution to optimize the value of its products.

Creating a data science solution doesn't start and end with model building. If you're an aspiring data scientist, I suggest acquiring some domain knowledge in the industry you want to work in.

Thank you. If you found this article interesting and informative, don't hesitate to share it with others; sharing is caring.
