DEV Community 👩‍💻👨‍💻

Indie Developer

65 Blog Posts to Learn Data Science

Data science is the practice of using computer programs to sift through thousands of data points and then to present that data in a visual format.

1. 10 Free Python Programming Courses For Beginners to Learn Online

There is no doubt that Python is currently the world’s #1 programming language, and its biggest advantage is that it’s bringing more and more people into the programming world.

2. Installing Ubuntu 18.04 along with Windows 10 (Dual Boot Installation) for Deep Learning

A short guide to installing Ubuntu 18.04 alongside Windows 10 on your PC.

3. What is One Hot Encoding? Why and When Do You Have to Use it?

One hot encoding is a process by which categorical variables are converted into a form that could be provided to ML algorithms to do a better job in prediction.
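To make the idea concrete, here is a minimal sketch in plain Python. The `one_hot_encode` helper and the color data are hypothetical illustrations, not code from the linked post; in practice a library encoder would usually be used instead.

```python
def one_hot_encode(values):
    """Return (categories, rows); each row is a 0/1 vector over the sorted categories."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one position is "hot" per value
        rows.append(row)
    return categories, rows


colors = ["red", "green", "blue", "green"]  # hypothetical sample data
categories, encoded = one_hot_encode(colors)
print(categories)
for color, row in zip(colors, encoded):
    print(color, row)
```

Each category becomes its own column, so the algorithm never sees an artificial ordering such as red < green < blue.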

4. Everything you need to know about Neural Networks

Courtesy: Kailash Ahirwar (Co-Founder & CTO, Mate Labs)

5. One Shot Learning with Siamese Networks in PyTorch

Deep neural networks are the go-to algorithm when it comes to image classification. This is partly because they can have an arbitrarily large number of trainable parameters. However, this comes at the cost of requiring a large amount of data, which is sometimes not available. I will discuss One Shot Learning, which aims to mitigate such an issue, and how to implement a neural net capable of using it in PyTorch.

6. Thinking of Self-Studying Machine Learning? Remind yourself of these 6 things

We were hosting a Meetup on robotics in Australia and it was question time.

7. The AI Hierarchy of Needs

As is usually the case with fast-advancing technologies, AI has inspired massive FOMO, FUD and feuds. Some of it is deserved, some of it not, but the industry is paying attention. From stealth hardware startups to fintech giants to public institutions, teams are feverishly working on their AI strategy. It all comes down to one crucial, high-stakes question: ‘How do we use AI and machine learning to get better at what we do?’

8. 🔥 Latest Deep Learning OCR with Keras and Supervisely in 15 minutes

Hello world. This tutorial is a gentle introduction to building a modern text recognition system using deep learning in 15 minutes. It will teach you the main ideas of how to use Keras and Supervisely for this problem. This guide is for anyone who is interested in using deep learning for text recognition in images but has no idea where to start.

9. Introduction to Recommender System. Part 1 (Collaborative Filtering, Singular Value Decomposition)

A recommender system is a system that is capable of predicting a user's future preference for a set of items and recommending the top items. One key reason why we need recommender systems in modern society is that people have too many options to choose from due to the prevalence of the Internet. In the past, people used to shop in a physical store, in which the items available were limited. For instance, the number of movies that can be placed in a Blockbuster store depends on the size of that store. By contrast, nowadays, the Internet allows people to access abundant resources online. Netflix, for example, has an enormous collection of movies. Although the amount of available information increased, a new problem arose as people had a hard time selecting the items they actually want to see. This is where the recommender system comes in. This article will give you a brief introduction to two typical ways of building a recommender system: Collaborative Filtering and Singular Value Decomposition.
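As a rough illustration of the collaborative filtering idea, the sketch below predicts a missing rating from the ratings of similar users. It is a simple user-based variant with cosine similarity, not the linked article's implementation and not SVD; the users, movies, and ratings are made up.

```python
from math import sqrt

# Hypothetical ratings: user -> {movie: rating on a 1-5 scale}
ratings = {
    "alice": {"Alien": 5, "Heat": 4, "Up": 1},
    "bob": {"Alien": 4, "Heat": 5, "Up": 2, "Jaws": 5},
    "carol": {"Alien": 1, "Heat": 2, "Up": 5, "Jaws": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(a[item] ** 2 for item in shared))
    norm_b = sqrt(sum(b[item] ** 2 for item in shared))
    return dot / (norm_a * norm_b)


def predict_rating(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += weight * other_ratings[item]
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None


predicted = predict_rating("alice", "Jaws")
print(round(predicted, 2))
```

Because Alice's tastes are closer to Bob's than to Carol's, Bob's rating of "Jaws" carries more weight in the prediction.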

10. Machine learning — Is the emperor wearing clothes?

Machine learning uses patterns in data to label things. Sounds magical? The core concepts are actually embarrassingly simple. I say “embarrassingly” because if someone made you think it’s mystical, they should be embarrassed. Here, let me fix that for you.

11. How To Scrape Google With Python

Ever since Google Web Search API deprecation in 2011, I've been searching for an alternative. I need a way to get links from Google search into my Python script. So I made my own, and here is a quick guide on scraping Google searches with requests and Beautiful Soup.

12. Choosing the Right Machine Learning Algorithm

Machine learning is part art and part science. When you look at machine learning algorithms, there is no one solution or one approach that fits all. There are several factors that can affect your decision to choose a machine learning algorithm.

13. R vs Python: What’s The Difference?

With the massive growth in the importance of Big Data, machine learning, and data science in the software industry and software service companies, two languages have emerged as developer favourites. R and Python have become the two most popular languages among data scientists and data analysts. The two are similar yet different in ways that make it difficult for developers to pick one of them.

14. Ubuntu 18.04 Deep Learning Environment Setup

Deep Learning on Ubuntu 18.04 isn’t officially supported since the CUDA Libraries aren’t officially supported by the OS yet.

15. How to Transform Your Data Into a Voice AI Knowledge Assistant

RAIN executives give a full breakdown of the build out and power of AI Voice Assistants.

16. 9 unusual problems that can be solved using Data Science

With donations to political parties reaching a new high, political strategists have started playing a big role in the election campaigns of big political parties. However, by using big data and data science, an edge can be achieved in this field. For example, user-targeted posts on social media, region-wise campaigns highlighting local problems, and creating a positive image of a party can easily be done using big data and data science.

17. Boosting and Bagging: How To Develop A Robust Machine Learning Algorithm

Machine learning and data science require more than just throwing data into a Python library and utilizing whatever comes out.

18. Announcing Camelot, a Python Library to Extract Tabular Data from PDFs

The PDF (Portable Document Format) was born out of The Camelot Project to create “a universal way to communicate documents across a wide variety of machine configurations, operating systems and communication networks”. Basically, the goal was to make documents viewable on any display and printable on any modern printer. PDF was built on top of PostScript (a page description language), which had already solved this “view and print anywhere” problem. PDF encapsulates the components required to create a “view and print anywhere” document. These include characters, fonts, graphics and images.

19. Taking Data Visualization to Another Level

When you use one library for a certain period of time, you get used to it. But you need to evolve and learn something new every day. If you are still stuck with Matplotlib (which is amazing), Seaborn (this is amazing too), Pandas (basic, yet easy visualization) and Bokeh, you need to move on and try something new. Many amazing visualization libraries are available in Python, which turns out to be very versatile. Here, I’m going to discuss these amazing libraries:

20. The Problem With Machine Learning In Healthcare

Recently an article by the Wall Street Journal has been floating around online that discussed how models will run the world. I believe there is a lot of truth to that. Machine learning algorithms and models are becoming both ubiquitous and more trusted across industries. This, in turn, will lead to us as humans spending less time questioning the output and simply allowing the system to tell us the answer. We already rely on companies like Google, Facebook and Amazon to inform us on ideas for dates, friends’ birthdays and what the best products are. Some of us don’t even think twice when it comes to the answers we receive from these companies.

21. Learn Data Engineering: My Favorite Free Resources

By Benjamin Rogojan originally posted here

22. Minimalistic Learning Path to Become a Data Scientist

Data has been around us forever, but ever since the day Harvard Business Review announced that ‘Data Scientist is The Sexiest Job of the 21st Century’, demand for the new job role of Data Scientist has peaked, and HR departments across industries have been assigned the tough task of recruiting a ‘Data Scientist’, which is almost equivalent to recruiting ‘The Martians’, the never-seen.

23. How not to do Fast.ai (or any ML MOOC)

This post serves as a little guide to the newer fast.ai students. The MOOC’s third iteration goes live in Jan ‘19.

24. Implementation of Gaussian Naive Bayes in Python from scratch

Naive Bayes is a very handy, popular and important Machine Learning Algorithm especially for Text Analytics and General Classification. It has many different configurations namely:

25. Explaining p-values with puppies

You’ll find p-values lurking all over data science (and all the rest of science, for that matter). If you took STAT101, the explanation you probably heard runs something like this: A p-value is the probability of observing a statistic at least as extreme as ours, conditional on the null hypothesis. No wonder that didn’t stick! Let’s try it with puppies instead…
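The STAT101 definition can be made concrete with a small calculation. The sketch below uses a coin-flip example of my own rather than the article's puppies: the null hypothesis is a fair coin, and the function name and observed counts are hypothetical.

```python
from math import comb


def one_sided_p_value(heads, flips, p_null=0.5):
    """P(at least `heads` heads in `flips` flips) if the null hypothesis holds."""
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )


# Hypothetical observation: 9 heads in 10 flips of a supposedly fair coin
p_value = one_sided_p_value(9, 10)
print(p_value)  # 11/1024, roughly 0.0107
```

Only 11 of the 1024 equally likely outcomes under a fair coin are at least this extreme, so a result like this would be surprising if the null hypothesis were true.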

26. Introduction to Numpy -1 : An absolute beginners guide to Machine Learning and Data science.

Let’s get started quickly. NumPy is a math library for Python. It enables us to do computation efficiently and effectively. It is better than regular Python because of its amazing capabilities.

27. Integrating Bokeh visualisations into Django Projects.

Despite being a Python developer for years, only recently have I needed to interact with Django. While exploring Django, I decided I wanted to learn a little more about Bokeh, the visualisation library. I tried to integrate it into my Django project and found it challenging to find a complete tutorial. I thought I would create a post outlining the steps to integrate Bokeh into Django in case anyone finds it useful.

28. Explainable AI won’t deliver. Here’s why.

Explainable AI (XAI) is getting a lot of attention these days and if you’re like most people, you’re drawn to it because of the conversation around AI and trust. If so, bad news: it can’t deliver the protection you’re hoping for. Instead, it provides a good source of incomplete inspiration.

29. Top 5 Data Science and Machine Learning Course for Programmers

Many programmers are moving towards data science and machine learning hoping for better pay and career opportunities, and there is a reason for it. Data scientist has been ranked the number one job on Glassdoor for the last couple of years, and the average salary of a data scientist is over $120,000 in the United States according to Indeed.

30. 5 Free R Programming Courses for Data Scientists and ML Programmers

More and more programmers are learning the R programming language to become data scientists, one of the hottest and highest-paying technical jobs on the planet.

31. Where’s Waldo : Terminator Edition

This post is inspired by material studied while interning with @jeremyphoward and @math_rachel’s fast.ai, in particular Lesson 14 of their course Cutting Edge Deep Learning for Coders, taught at USF’s Data Institute. If you’d like to see my end-to-end code for this project, please check out my repository There’s Waldo.

32. Building Python Data Science Container using Docker

Artificial Intelligence (AI) and Machine Learning (ML) are on fire these days, powering a wide spectrum of use cases ranging from self-driving cars to drug discovery and God knows what else. AI and ML have a bright and thriving future ahead of them.

33. Learning languages very quickly — with the help of some very basic Data Science

I moved to Sweden 6 months ago with my girlfriend. It is a great country for expats like us, because almost everyone speaks really good English here. Even so, we would like to learn some Swedish, just to understand a bit more in daily conversation and about Swedish culture.

34. Intro to Pandas: -1 : An absolute beginners guide to Machine Learning and Data science.

Pandas is hands down one of the best libraries in Python. It supports reading and writing Excel spreadsheets and CSVs, along with a whole lot of data manipulation. It is practically a mandatory library to know if you’re dealing with datasets from Excel and CSV files, i.e. for machine learning and data science.

35. What is a Data Lake and How to Create One for Your Business

If you are following the trends in data science, it is likely that you have heard the words big data, analytics, and machine learning. These days everyone wants to jump into this area of data science. Many of the software giants, like Google, Amazon, and Microsoft, are already leading the way.

36. How I Created a Bitcoin Trading Algorithm With a 29% Return Rate Using Sentiment Analysis

TL;DR: I’ve created a formula that predicts whether you should buy or sell Bitcoin based on daily exchange price data and Google Trends keyword sentiment. The model produced a 29% return over 90 days for a $28,839 profit.

37. What on earth is data science?

Behold my pithiest attempt: “Data science is the discipline of making data useful.” Feel free to flee now or stick around for a tour of its three subfields.

38. Wake-On-Lan through the internet

My daily work usually starts by opening an SSH connection to a server, running a Docker image (with RStudio Server or Jupyter on it), and analyzing data or programming directly in the browser. It was always convenient like that until I got a sudden disconnection last month. Suddenly everything stopped working, and I wasted several hours hopelessly trying to fix it. When I went home, I figured out that the electricity in my apartment was very unstable due to some small construction work upstairs. Of course, a simple solution is getting a UPS (Uninterruptible Power Supply), but I was fascinated by the idea that maybe I could turn on my server over the internet. That is like having the server’s power switch with me all the time. It would be so cool, especially when I’m away for an extended period and don’t want to waste money on energy bills. This story is about how I’ve done it. You can have a look at my final network setup first.

39. The best Machine & Deep Learning books

Machine learning and deep learning are among the hottest fields of recent years. We are witnessing tremendous achievements in almost every industry in the world thanks to the talented researchers who know how to harness machine learning in order to create amazing products.

40. What is Image Annotation? – An Intro to 5 Image Annotation Services

Image annotation is one of the most important tasks in computer vision. With numerous applications, computer vision essentially strives to give a machine eyes – the ability to see and interpret the world. At times, machine learning projects seem to unlock futuristic technology we never thought possible. AI-powered applications like augmented reality, automatic speech recognition, and neural machine translation have the potential to change lives and businesses around the world. Likewise, the technologies that computer vision can give us (autonomous vehicles, facial recognition, unmanned drones) are extraordinary.

41. Python/Flask Data Visualization & Interactive Maps

Have you ever wanted to create an interactive data visualization map? In my most recent side project, I created a pretty cool visualization for how a virus might spread across the United States. If you want to check out the finished site, you can click here:

42. Pornhub Growth Hack During Coronavirus Pandemic

The 2019–20 coronavirus pandemic is an ongoing pandemic of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The outbreak was first identified in Wuhan, Hubei, China, in December 2019, and was recognized as a pandemic by the World Health Organization (WHO) on 11 March 2020.

43. Using AI to Super Compress Images

Data-driven algorithms like neural networks have taken the world by storm. Their recent surge is due to several factors, including cheap and powerful hardware, and vast amounts of data. Neural networks are currently the state of the art when it comes to ‘cognitive’ tasks like image recognition, natural language understanding, etc., but they don’t have to be limited to such tasks. In this post I will discuss a way to compress images using neural networks to achieve state-of-the-art performance in image compression, at a considerably faster speed.

44. Introduction to Web Scraping using Python

One of the most efficient ways to collect data as a data scientist is web scraping.

45. Top 10 JavaScript Charting Libraries for Every Data Visualization Need

Nowadays, the amount of data grows exponentially, and the more information we see, the harder it gets to process it. That’s why we need data visualization — in charts and dashboards, preferably interactive. It helps us humans save a lot of time and effort to view, analyze, and understand data, and make the right, informed decisions based on that.

46. 6 Biggest Limitations of Artificial Intelligence Technology

While the release of GPT-3 marks a significant milestone in the development of AI, the path forward is still obscure. There are still certain limitations to the technology today. Here are six of the major limitations facing data scientists today.

47. 12+ High Paying Technology Jobs for Software Engineers and Computer Programmers

If you are a computer science graduate, someone thinking of making a career in the software development world, or an experienced programmer thinking about your next career move but not sure which field to go into, then you have come to the right place.

48. Automatic Feature Selection in Python: An Essential Guide

Feature selection in Python is the process where you automatically or manually select the features in the dataset that contribute most to your prediction.
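One simple way to do this automatically is a filter method that ranks features by how strongly they correlate with the target. The sketch below is an illustration under that assumption, not the linked guide's approach; the helper names and the toy housing data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def select_top_features(features, target, k):
    """Return the names of the k features with the largest absolute correlation to the target."""
    scores = {name: abs(pearson(column, target)) for name, column in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy dataset: three candidate features and a target
features = {
    "size": [50, 60, 80, 100, 120],
    "rooms": [2, 2, 3, 4, 4],
    "noise": [7, 1, 5, 2, 6],
}
price = [150, 180, 240, 300, 360]

selected = select_top_features(features, price, k=2)
print(selected)
```

Correlation only captures linear, one-feature-at-a-time relationships, so this kind of ranking is a starting point rather than a complete selection strategy.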

49. NLP Tutorial: Topic Modeling in Python with BerTopic

Topic modeling is an unsupervised machine learning technique that can automatically identify different topics present in a document (textual data). Data has become a key asset for running many businesses around the world. With topic modeling, you can collect unstructured datasets, analyze the documents, and obtain the relevant information that can assist you in making better decisions.

50. How Many Cryptocurrencies Are Simply Following the Market?

In the last few days, we’ve experienced a massive rout in the cryptocurrency market.

51. TensorFlow Tutorial For Beginners

Originally published at https://www.datacamp.com/community/tutorials/tensorflow-tutorial

52. 160+ Data Science Interview Questions

A typical interview process for a data science position includes multiple rounds. Often, one such round covers theoretical concepts, where the goal is to determine if the candidate knows the fundamentals of machine learning.

53. How I built a spreadsheet app with Python to make data science easier

Today I'm open sourcing "Grid studio", a web-based spreadsheet application with full integration of the Python programming language.

54. How To use Google Colab with VS Code

Google Colab and VS Code are popular editor tools. Learn how you can use Google Colab with VS Code and take advantage of a full-fledged code editor.

55. 7 Effective Ways to Deal With a Small Dataset

In a real-world setting, you often only have a small dataset to work with. Models trained on a small number of observations tend to overfit and produce inaccurate results. Learn how to avoid overfitting and get accurate predictions even if available data is scarce.

56. 10 Machine Learning, Data Science, and Deep Learning Courses for Programmers in 2020

A curated list of courses to learn data science, machine learning, and deep learning fundamentals.

57. 3 Best Ways To Import External Data Into Google Sheets [Automatically]

Google Sheets is a great tool to use for business intelligence and data analysis. If you want to eliminate manual data imports and save time, then let me show you how you can automatically connect and import data from external sources into Google Sheets.

58. Top C/C++ Machine Learning Libraries For Data Science

Importance of C++ in Data Science and Big Data

59. Text Processing and Sentiment Analysis of Twitter Data

A complete guide to text processing using Twitter data and R.

60. Data Preprocessing: 6 Necessary Steps for Data Scientists

Hello everyone, I am back with another topic which is Data Preprocessing. This is a part of the data analytics and machine learning process that data scientists spend most of their time on. In this article, I'll dive into the topic, why we use it, and the necessary steps.

61. How To Plot A Decision Boundary For Machine Learning Algorithms in Python

Classification algorithms learn how to assign class labels to examples (observations or data points), although their decisions can appear opaque.

62. 3 Best Ways To Import JSON To Google Sheets [Ultimate Guide]

3 ways to pull JSON data into a Google Spreadsheet

63. Technical Data Science Interview Questions: SQL and Coding

A data science interview consists of multiple rounds. One such round involves theoretical questions, which we covered previously in 160+ Data Science Interview Questions.

64. Intro to Audio Analysis: Recognizing Sounds Using Machine Learning

65. Python for Data Science: How to Scrape Website Data via the Internet's Top 300 APIs

In this post we are going to scrape websites to gather data via API World's top 300 APIs of the year. The major reason for doing web scraping is that it saves time, avoids manual data gathering, and gives you all the data in a structured form.

66. How to Build a Web Scraper With Python [Step-by-Step Guide]

On my self-taught programming journey, my interests lie within machine learning (ML) and artificial intelligence (AI), and the language I’ve chosen to master is Python.

Photo credit, HackerNoon AI
