DEV Community

SOMYA RAWAT

Why Data Cleaning?

~ "Garbage In, Garbage Out": bad data leads to bad results, plain and simple.
~ Computers cannot easily judge whether data makes sense.
~ To get accurate results, you need to remove the errors in your data that confuse the algorithms.
~ It's a time-consuming process, but an important one.

What are the causes?

  • Input Errors
  • Duplicates
  • Mangled Data
  • Malfunctioning Sensors
  • Lack of Standardization

Identifying Problems

  • Range Constraints
  • Data-Type Constraints
  • Compulsory Constraints
  • Unique Constraints
  • Cross Field Constraints
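
The five kinds of constraint listed above can each be checked with a few lines of code. Below is a minimal sketch in plain Python, not a definitive implementation. The sample records, the field names (id, age, email, start, end), the 0 to 120 age bounds, and the helper find_problems are all assumptions made up for illustration.

```python
from datetime import date

# Hypothetical sample records; field names and values are invented for illustration.
records = [
    {"id": 1, "age": 34, "email": "a@example.com",
     "start": date(2020, 1, 5), "end": date(2021, 3, 1)},
    {"id": 2, "age": -4, "email": "b@example.com",
     "start": date(2020, 6, 1), "end": date(2020, 9, 1)},
    {"id": 2, "age": "41", "email": None,
     "start": date(2022, 1, 1), "end": date(2021, 1, 1)},
]


def find_problems(rows):
    """Return a list of (row_index, problem_description) tuples."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        age = row.get("age")
        # Data-type constraint: age must be an integer.
        if not isinstance(age, int):
            problems.append((i, "data-type: age is not an int"))
        # Range constraint: age must fall in a plausible interval (assumed 0 to 120).
        elif not 0 <= age <= 120:
            problems.append((i, "range: age out of bounds"))
        # Compulsory constraint: email may not be missing or empty.
        if not row.get("email"):
            problems.append((i, "compulsory: email is missing"))
        # Unique constraint: an id may not appear more than once.
        if row["id"] in seen_ids:
            problems.append((i, "unique: duplicate id"))
        seen_ids.add(row["id"])
        # Cross-field constraint: the end date cannot precede the start date.
        if row["end"] < row["start"]:
            problems.append((i, "cross-field: end before start"))
    return problems


problems = find_problems(records)
for index, problem in problems:
    print(index, problem)
```

Collecting every violation, instead of stopping at the first one, makes it easier to see how dirty the data is before deciding how to clean it.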

Data Cleaning Techniques

  • Removing missing data
  • Direct correction
  • Normalization
  • Fixing syntax errors
  • Data Imputation
  • Spell Check
  • Filter Unwanted Outliers
  • Remove Irrelevant Values
  • Fix structural errors
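
A few of the techniques above can be sketched in plain Python. This is only an illustration under stated assumptions: the rows, the column names (city, temp_c), the temperature bounds, and the choice of the median for imputation are all hypothetical, and "normalization" here means making text formatting consistent.

```python
from statistics import median

# Hypothetical raw rows; column names and values are invented for illustration.
raw = [
    {"city": " new york ", "temp_c": 21.0},
    {"city": "New York", "temp_c": None},
    {"city": "NEW YORK", "temp_c": 23.0},
    {"city": "Boston", "temp_c": 19.0},
    {"city": None, "temp_c": 20.0},
    {"city": "Boston", "temp_c": 400.0},  # implausible, e.g. a malfunctioning sensor
]

# Removing missing data: drop rows that have no city at all.
rows = [dict(r) for r in raw if r["city"] is not None]

# Fixing structural errors / normalization: trim whitespace and unify casing.
for r in rows:
    r["city"] = r["city"].strip().title()

# Filtering unwanted outliers: keep only plausible temperatures (assumed bounds).
rows = [r for r in rows if r["temp_c"] is None or -90.0 <= r["temp_c"] <= 60.0]

# Data imputation: fill missing temperatures with the median of the known values.
known = [r["temp_c"] for r in rows if r["temp_c"] is not None]
fill_value = median(known)
for r in rows:
    if r["temp_c"] is None:
        r["temp_c"] = fill_value

cleaned = rows
for r in cleaned:
    print(r)
```

The order matters in this sketch: the outlier is filtered out before imputation so that it does not distort the value used to fill the gaps.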
