DEV Community

Self Study: Data Science - Machine Learning journey : Day 2 (Statistics | R | Python | Anaconda | Jupyter)

Vignesh C (iamvigneshc) ・ 2 min read


Statistics is generally considered one of the prerequisites for studying machine learning. We need statistics to transform observations into information and to answer questions about samples of observations.

[Figure: where statistics is needed in machine learning]

Another prerequisite for data science and machine learning is a programming language, typically R or Python. R is mainly used for statistical analysis and model building, while Python is used beyond statistics, with a wide range of libraries and better integration with other programming languages.

Applied Statistics:

Two broad categories in the field of statistics:

  1. Descriptive statistics
  2. Inferential statistics

Descriptive statistics is the process of categorizing, summarizing, and describing the data at hand, for example with a mean, median, or standard deviation.

Inferential statistics includes the process of analyzing a sample of data and using it to draw inferences about the population from which it was drawn.
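The difference between the two categories can be sketched in a few lines of Python using only the standard library. The sample values and the function names `describe` and `mean_confidence_interval` are made up for illustration, and the confidence interval uses a normal approximation, which is an assumption that is only rough for a sample this small.

```python
import math
import statistics

# Hypothetical sample: heights (cm) of 10 people drawn from a larger population.
sample = [162.0, 175.5, 168.2, 171.0, 159.8, 180.3, 166.7, 173.4, 169.9, 177.1]


def describe(data):
    """Descriptive statistics: summarize the data we actually observed."""
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),  # sample standard deviation (n - 1)
    }


def mean_confidence_interval(data, confidence=0.95):
    """Inferential statistics: estimate a range for the population mean.

    Uses a normal approximation (an assumption); for small samples a
    t-distribution would give a slightly wider interval.
    """
    summary = describe(data)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * summary["stdev"] / math.sqrt(summary["n"])
    return summary["mean"] - margin, summary["mean"] + margin


summary = describe(sample)
low, high = mean_confidence_interval(sample)
print(f"mean={summary['mean']:.2f} median={summary['median']:.2f} stdev={summary['stdev']:.2f}")
print(f"95% CI for the population mean: ({low:.2f}, {high:.2f})")
```

The first function only describes the ten observed values, while the second uses them to say something about the unseen population, which is the step from description to inference.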

We need to become familiar with all these concepts to continue our machine learning journey effectively. Most of them are likely to have been covered as part of a graduate degree.

[Figure: statistics concepts to become familiar with]

Install RStudio

Install R and RStudio Desktop for your operating system from here..

Sample R code to illustrate AUC and ROC from Day 1:

Install Python

You can install and use Python from the command line or through Anaconda, which comes with tutorials and reference material for various libraries.

Once installed, you can open JupyterLab or Jupyter Notebook and start working in Python.
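A quick way to confirm the setup is to run a small check in the first notebook cell. This is a minimal sketch: the package list is an assumption about what a typical Anaconda install provides, and `check_environment` is a hypothetical helper name, not part of any library.

```python
import importlib.util
import platform
import sys

# Packages commonly used for data science; this list is an assumption,
# adjust it to whatever your installation is expected to include.
PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn"]


def check_environment(packages):
    """Return a dict mapping each package name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


print("Python", platform.python_version(), "at", sys.executable)
for name, available in check_environment(PACKAGES).items():
    print(f"{name}: {'installed' if available else 'not found'}")
```

If a package shows as not found, it can usually be added from the command line with `conda install` or `pip install` before continuing.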

Some of my samples to get started:
