DEV Community

Nelson chege

The Ultimate Guide to Getting Started in Data Science

Believing that artificial intelligence and machine learning are the next big step in the evolution of computer technology, I found no better time than now to start learning data science.
Thanks to @LuxAcademy and @DataFestAfrica, I have now started my journey to becoming a data scientist.

Having completed the first week of the Data Science and Machine Learning Bootcamp Marathon, here are the topics I have been able to learn during that time:

Introduction to Python

Having used Python for some time, I found this a good refresher on some basic concepts that I had forgotten. It was also a reminder that the things I used to consider hard now feel like a breeze (to all who think it's hard, just give it time).

Anaconda and Jupyter Notebook installation

For a data scientist, having Anaconda installed on your machine helps a lot. Because Anaconda comes with Jupyter Notebook pre-installed, it was easy to get everything running quickly.

Relational database

Having been around for over 40 years, relational databases store data in tables that are related to each other. Designing and implementing one can take a little more work than a non-relational database, but being able to query the data easily makes that up-front work worthwhile.
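
To make the idea of related tables concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table names, column names and rows (students, enrollments) are made up for illustration and are not from the bootcamp.

```python
import sqlite3

# In-memory database, so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two related tables: each enrollment row points to a student via student_id.
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE enrollments ("
    "id INTEGER PRIMARY KEY, "
    "student_id INTEGER REFERENCES students(id), "
    "course TEXT)"
)

# Made-up rows, for illustration only.
cur.executemany(
    "INSERT INTO students (id, name) VALUES (?, ?)",
    [(1, "Amina"), (2, "Brian")],
)
cur.executemany(
    "INSERT INTO enrollments (student_id, course) VALUES (?, ?)",
    [(1, "Python"), (1, "SQL"), (2, "Python")],
)

# The JOIN follows the relationship to answer "who takes which course?".
rows = cur.execute(
    "SELECT s.name, e.course "
    "FROM students AS s JOIN enrollments AS e ON e.student_id = s.id "
    "ORDER BY s.name, e.course"
).fetchall()
print(rows)
conn.close()
```

The design work is deciding which tables exist and how they link; once that is done, a single query can combine them.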

NumPy

Python lists have their advantages and disadvantages. One disadvantage is that they are slow for numerical work on large amounts of data. That's where NumPy comes in: it's a Python package that provides arrays, which are faster than lists because the package is largely implemented in C. NumPy is also used by several of the other packages I discuss in this post.
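
As a rough sketch of the difference, the snippet below doubles the same made-up numbers first with a plain list and then with a NumPy array. The variable names and values are illustrative only.

```python
import numpy as np

# Made-up values, for illustration only.
prices = [10.0, 20.0, 30.0]

# With a plain list, every element is handled in a Python-level loop.
doubled_list = [p * 2 for p in prices]

# With a NumPy array, the operation is applied to the whole array at once,
# and the loop runs inside NumPy's compiled code.
prices_arr = np.array(prices)
doubled_arr = prices_arr * 2

print(doubled_list)
print(doubled_arr.tolist())
print(prices_arr.mean())
```

On three numbers the speed difference is not noticeable; it tends to show up once the arrays hold many thousands of values.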

Pandas

Pandas builds on NumPy arrays and offers two main data structures: Series and DataFrame. Of the two, DataFrame is the more commonly used. You can think of a DataFrame as a table of rows and columns, much like a matrix with labels.
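
Here is a small sketch of the two structures side by side. The column names and numbers are made up for illustration.

```python
import pandas as pd

# A Series is a single labeled column of values (made-up numbers).
hours = pd.Series([5, 3, 4], name="hours")

# A DataFrame is a table: several columns sharing one row index.
df = pd.DataFrame({
    "topic": ["Python", "NumPy", "Pandas"],
    "hours": hours,
})

print(df.shape)           # (number of rows, number of columns)
print(df["hours"].sum())  # selecting one column gives back a Series
```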

Matplotlib

As humans, we are visual creatures. That's where Matplotlib comes in: it is a package with built-in chart types for showing a visual representation of data.
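
A minimal line chart might look like the sketch below. The data is made up, and the "Agg" backend and the temp-directory output path are assumptions chosen so the script runs without a display; in a Jupyter notebook you would normally just let the figure render inline.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

# Made-up data, for illustration only.
days = [1, 2, 3, 4, 5]
hours = [2, 3, 1, 4, 2]

fig, ax = plt.subplots()
ax.plot(days, hours, marker="o")
ax.set_xlabel("Day")
ax.set_ylabel("Hours studied")
ax.set_title("Study time per day (made-up data)")

# Save the chart as a PNG in the system temp directory (hypothetical file name).
out_path = os.path.join(tempfile.gettempdir(), "study_hours.png")
fig.savefig(out_path)
```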

Seaborn

As mentioned earlier, we are visual creatures, and we also want that visual representation to look nice. Seaborn is a package that builds on Matplotlib and adds more styling to the graphs.
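
The sketch below draws a bar chart with Seaborn on top of a Matplotlib axis. The data is made up, and the "whitegrid" style and the "Agg" backend are just choices for this example (the backend lets it run without a display).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Apply one of Seaborn's built-in styles to all following plots.
sns.set_theme(style="whitegrid")

# Made-up data, for illustration only.
df = pd.DataFrame({
    "topic": ["Python", "NumPy", "Pandas"],
    "hours": [5, 3, 4],
})

# Seaborn draws onto a regular Matplotlib axis and labels it from the column names.
fig, ax = plt.subplots()
sns.barplot(data=df, x="topic", y="hours", ax=ax)
ax.set_title("Hours per topic (made-up data)")
```

Because the result is still a Matplotlib figure, the usual Matplotlib calls such as `ax.set_title` keep working.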

pyforest

This is a package that gives you access to other packages. Once you have installed it, you can use the common data science packages it covers by importing pyforest instead of importing each package yourself.

PostgreSQL Connection

Connecting PostgreSQL to a Python script looks like a huge mountain to climb, but it's very easy: with a few lines of code you can connect your PostgreSQL database to your Python script or project.
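
As a rough sketch of those few lines, the function below assumes the psycopg2 driver, which this post does not name, and a PostgreSQL server you can reach. The function name and all connection values are hypothetical placeholders.

```python
def fetch_server_version(host, dbname, user, password, port=5432):
    """Connect to PostgreSQL, run one query, and return the server version."""
    # Assumption: the psycopg2 driver is installed. It is imported here so
    # that this file can still be loaded on a machine without the driver.
    import psycopg2

    conn = psycopg2.connect(
        host=host, port=port, dbname=dbname, user=user, password=password
    )
    try:
        # The cursor sends SQL to the server and reads the rows that come back.
        with conn.cursor() as cur:
            cur.execute("SELECT version()")
            return cur.fetchone()[0]
    finally:
        # Always release the connection, even if the query fails.
        conn.close()
```

With a running server, a call would look like `fetch_server_version("localhost", "mydb", "myuser", "secret")`, where every argument is a placeholder to replace with your own settings.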

Having just completed a week of the Data Science and Machine Learning Bootcamp Marathon, I find it exciting that I was able to learn all this within a week. I won't say I have mastered everything here, but I can say that I have learnt the basic tools required for my data science career.

Here are some of the resources for the topics discussed above:

Get Started with Pandas In 5 mins: https://medium.com/bhavaniravi/python-pandas-tutorial-92018da85a33

A complete guide on NumPy for data science: https://medium.com/nerd-for-tech/a-complete-guide-on-numpy-for-data-science-c54f47dfef8d

An Introduction to Matplotlib: https://www.simplilearn.com/tutorials/python-tutorial/matplotlib

A Beginner’s Guide to matplotlib for Data Visualization and Exploration in Python: https://medium.com/analytics-vidhya/a-beginners-guide-to-matplotlib-for-data-visualization-and-exploration-in-python-3fb32d03c3cd

Seaborn — A Step by Step Guide to Catch Your Audience Using Data Visualization: https://python.plainenglish.io/seaborn-a-step-by-step-guide-to-catch-your-audience-part-1-42d9e6e30bea

Starting with Matplotlib and Seaborn: https://medium.datadriveninvestor.com/starting-with-matplotlib-and-seaborn-cba16c7beabf

Understand how to use pyforest to simplify package imports.
Auto Import Python Libraries (Using Pyforest to import important python libraries): https://towardsdatascience.com/auto-import-python-libraries-d095a11b4cca

How to connect to a Postgres database with Python: https://medium.com/analytics-vidhya/how-to-setup-a-python-application-with-a-postgres-database-f965e7c1581e

Thanks for reading.
Happy coding!
