DEV Community

El Marie 💜


"R libraries to aid you to learn data science in 2018"

2018 is already here! What a year 2017 has been!
For someone who started learning data science late in the year, it feels like the year was short. The R learning curve may seem steep; however, continuous exposure to different tools and libraries/packages can make the experience simpler.
In this article, I share R packages from different branches of data science that have made my learning journey worthwhile so far.

Data Visualization

This is a very instrumental part of data science. For a data science newbie, the ability to create great visualizations gives you hope that you are on the right track. Great data visualizations also earn appreciation for your work, especially from non-data scientists. The following packages will come in handy when visualizing in R.

1. ggplot2

This R package makes the work of visualization much easier. It implements the grammar of graphics: it takes care of plotting details, offers many graphical options and handles the layering of graphs very well.
It is available on CRAN. The ggplot2 cheat sheet is a great resource to get you started.
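
To make the layering idea concrete, here is a minimal sketch. It assumes the built-in mtcars dataset, which I picked only for illustration; the column choices and labels are mine, not part of ggplot2 itself.

```r
library(ggplot2)

# mtcars ships with R; it is used here only as an illustrative dataset.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = factor(cyl))) +    # layer 1: one point per car
  geom_smooth(method = "lm", se = FALSE) +   # layer 2: a linear trend line
  labs(
    title = "Fuel efficiency vs. weight",
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    colour = "Cylinders"
  )

print(p)  # draws the plot on the active graphics device
```

Each `+` adds another layer or setting on top of the same data and aesthetic mapping, which is what makes it easy to build a plot up step by step.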

2. shiny

This is an R package that gives users the power to build interactive dashboards and web apps. Shiny helps a lot with data collection and manipulation in real time, as it handles reactivity very well. Shiny apps can make use of HTML widgets, CSS themes and JavaScript actions to interface with R scripts. It is an awesome library for someone interested in data storytelling on their website.
shiny is available on CRAN.
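
Here is a small sketch of what reactivity looks like in practice. It assumes the built-in faithful dataset, and the input and output ids ("bins", "hist") are names I made up for this example.

```r
library(shiny)

# UI: one slider input and one plot output.
ui <- fluidPage(
  titlePanel("Histogram explorer"),
  sliderInput("bins", "Approximate number of bins:",
              min = 5, max = 50, value = 20),
  plotOutput("hist")
)

# Server: renderPlot() re-runs whenever input$bins changes (reactivity).
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(faithful$waiting,
         breaks = input$bins,  # hist() treats a single number as a suggestion
         main = "Waiting time between eruptions",
         xlab = "Minutes")
  })
}

app <- shinyApp(ui, server)
# In an interactive R session, launch the app with: runApp(app)
```

Moving the slider changes `input$bins`, and shiny redraws the histogram without any extra event-handling code.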

Data Wrangling

One of the goals of every data scientist should be to maximize the time spent on data analysis. To achieve this, you need to ensure the data you are working with is as clean as possible and easy to manipulate. Data wrangling is the process of cleaning up data, removing redundancy and organizing it in a way that makes analysis much easier. The following packages are great and simple data wrangling tools.

1. tidyr

From the tidyr website, tidy data is defined as data where:
  • Each variable is in a column.
  • Each observation is a row.
  • Each value is a cell.

tidyr provides simple verbs as R functions, like gather(), to carry out quick data tidying operations on large datasets. tidyr is available on CRAN.
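
As a rough sketch of what gather() does, the example below reshapes a tiny made-up table (the countries and numbers are invented purely for illustration) from one column per year into one row per observation.

```r
library(tidyr)

# A small made-up "wide" table: one column per year.
wide <- data.frame(
  country = c("Kenya", "Uganda"),
  `2016` = c(10, 20),
  `2017` = c(15, 25),
  check.names = FALSE  # keep "2016"/"2017" as column names
)

# gather() stacks the year columns into key/value pairs,
# so each row becomes one country-year observation.
long <- gather(wide, key = "year", value = "cases", -country)

print(long)
```

In newer tidyr releases, pivot_longer() is the recommended replacement for gather(), though gather() still works.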

2. dplyr

When dealing with data, there are common manipulations that have to be carried out, and dplyr provides a verb function for each of them. These help you filter your data and group it to uncover deeper meaning. dplyr is available on CRAN.
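
A minimal sketch of chaining dplyr verbs, assuming the built-in mtcars dataset; the 100 hp threshold and the summary column names are arbitrary choices for the example.

```r
library(dplyr)

summary_by_cyl <- mtcars %>%
  filter(hp > 100) %>%            # keep only the more powerful cars
  group_by(cyl) %>%               # one group per cylinder count
  summarise(
    n_cars = n(),                 # how many cars fall in each group
    mean_mpg = mean(mpg)          # average fuel efficiency per group
  ) %>%
  arrange(desc(mean_mpg))         # most efficient group first

print(summary_by_cyl)
```

Each verb takes a data frame and returns a data frame, so the steps read top to bottom in the order they happen.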

Data Mining

This is one of the biggest challenges for data science newbies. Although many websites offer free, open data sets, it is also very rewarding for a data science newbie to learn how to extract a data set from the numerous sources of information on and off the web. The following libraries will do the magic:

1. httr

This package enables you to access data via modern web APIs. It provides a function for each HTTP verb, its requests can return JSON data that is parsed into R objects, and it supports OAuth. This makes it easy for a newbie to work with APIs in R. This package is available on CRAN.
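
Here is a hedged sketch of a typical request. The helper name fetch_json and the api.example.com URL are hypothetical placeholders, so swap in a real API before running the commented-out call.

```r
library(httr)

# Hypothetical helper: fetch a JSON endpoint and return it as an R object.
fetch_json <- function(url, query = list()) {
  resp <- GET(url, query = query)  # the HTTP verb GET as an R function
  stop_for_status(resp)            # turn HTTP errors (4xx/5xx) into R errors
  # Parse the body as JSON into R lists/vectors.
  content(resp, as = "parsed", type = "application/json")
}

# Placeholder URL, used only to show how query parameters are attached.
example_url <- modify_url("https://api.example.com/items",
                          query = list(page = 1))
print(example_url)

# With a real API you would call, for example:
# items <- fetch_json("https://api.example.com/items", query = list(page = 1))
```

Checking the status before parsing means a failed request stops with a clear error instead of returning a confusing half-parsed result.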

2. rvest

This is an R package for web scraping. It reads HTML documents from URLs, selects parts of the document using CSS selectors and parses HTML tables into data frames in R. This package is available on CRAN.
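
The sketch below shows the same steps on a small inline HTML snippet that I wrote for this example, so it runs without a network connection. With a real page you would pass its URL to read_html() instead.

```r
library(rvest)

# A made-up HTML document standing in for a real web page.
html <- '<html><body>
  <h1 class="title">Packages</h1>
  <table id="pkgs">
    <tr><th>package</th><th>task</th></tr>
    <tr><td>ggplot2</td><td>visualization</td></tr>
    <tr><td>dplyr</td><td>wrangling</td></tr>
  </table>
</body></html>'

page <- read_html(html)  # also accepts a URL, e.g. read_html("https://...")

# Select a node with a CSS selector and extract its text.
heading <- page %>% html_node("h1.title") %>% html_text()

# Select the table by id and parse it into a data frame.
pkgs <- page %>% html_node("#pkgs") %>% html_table()

print(heading)
print(pkgs)
```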

Conclusion

The first days of data science can be a bit confusing; however, focusing on each of these branches in turn can help you understand data science step by step. I wish you a great learning experience in 2018. Don't stop learning.
Feel free to reach out to me via Twitter @lornamariak. I am happy to help and give some hype/support. Happy coding!
