DEV Community



21 Data Science Terms Everyone Should Know

You’d agree that every field has its own set of special words and expressions that outsiders find difficult to understand.

In the realm of business, phrases such as 'trim the fat,' 'S.W.O.T.,' 'pain point,' and 'white paper' are commonly tossed around as industry jargon.

So, as you might have guessed, data science, like every other field out there, has its own lexicon.

Hence, I've compiled a list of essential terms below to ensure we're all on the same page and moving towards a shared objective. Let's dive right in, shall we?

Learn Your ABC's:

Accuracy: The measure of how often a classification model correctly predicts outcomes among all instances it evaluates.
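
To make accuracy concrete, here is a minimal sketch in plain Python. The `accuracy` helper and the label lists are made up for illustration and are not tied to any particular library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels, invented for this example
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

acc = accuracy(y_true, y_pred)
print(acc)  # 6 of the 8 predictions match, so this prints 0.75
```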

A/B Testing: A statistical method used to compare two versions of a product, webpage, or model to determine which performs better.
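
One common way to decide which version "performs better" is a two-proportion z-test on conversion rates. The sketch below assumes that approach and uses invented visitor and conversion counts. The function name `two_proportion_z_test` is hypothetical, and real A/B tests often need more care (sample size planning, multiple comparisons).

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the assumption that A and B convert equally well
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented numbers: version A converts 100 of 1000 visitors, version B 130 of 1000
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value here suggests the difference between the two versions is unlikely to be due to chance alone.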

API (Application Programming Interface): A set of rules that allows one software application to interact with another.

BI (Business Intelligence): Technologies, processes, and tools that help organizations make informed business decisions.

Bias: A systematic error in a model that causes its predictions to consistently miss the true values.

Correlation: A statistical measure that describes the degree of association between two variables.

Covariance: A measure of how much two random variables change together.
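
Covariance and correlation are closely related: correlation is covariance rescaled to fall between -1 and 1. Here is a rough sketch using sample statistics (dividing by n - 1). The `hours` and `scores` values are invented for illustration.

```python
import math


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Made-up data: hours studied and exam scores
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 80]

cov = covariance(hours, scores)
corr = correlation(hours, scores)
print(cov)   # 17.0
print(corr)  # close to 1, since scores rise steadily with hours
```

Note that the covariance depends on the units of the data, while the correlation does not.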

D and Beyond:

Data Cleaning: The process of identifying and correcting errors or inconsistencies in datasets.

Data Mining: Extracting valuable patterns or information from large datasets.

Data Visualization: Presenting data in graphical or visual formats to aid understanding.

Exploratory Data Analysis (EDA): Analyzing and visualizing data to understand its characteristics and relationships.

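False Positive and False Negative: The two kinds of incorrect prediction in binary classification. A false positive predicts the positive class when the true class is negative; a false negative predicts the negative class when the true class is positive.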
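
As a quick illustration, the sketch below counts the four possible outcomes of a binary classifier. The `confusion_counts` helper and the labels are hypothetical, with 1 treated as the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for binary labels."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            counts["TP"] += 1  # predicted positive, actually positive
        elif pred == positive:
            counts["FP"] += 1  # predicted positive, actually negative
        elif truth == positive:
            counts["FN"] += 1  # predicted negative, actually positive
        else:
            counts["TN"] += 1  # predicted negative, actually negative
    return counts


# Invented labels for illustration
actual = [1, 0, 1, 1, 0, 1, 0, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 3}
```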

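Gaussian Distribution: A symmetric, bell-shaped continuous probability distribution, also called the normal distribution, that is fully described by its mean and standard deviation and is widely used in statistical modeling.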

Hypothesis Testing: A statistical method to test a hypothesis about a population parameter based on sample data.

Linear Regression: A statistical method for modeling the relationship between a dependent variable and one or more independent variables.
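
For the simplest case, one independent variable, the best-fit line can be found with ordinary least squares. The sketch below assumes that method. `fit_line` is a hypothetical helper and the data is made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = (sum of co-deviations) / (sum of squared x deviations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: hours studied (independent) and exam score (dependent)
study_hours = [1, 2, 3, 4, 5]
exam_scores = [52, 58, 65, 70, 80]

slope, intercept = fit_line(study_hours, exam_scores)
predicted_score = slope * 6 + intercept  # prediction for 6 hours of study
print(f"score = {slope:.1f} * hours + {intercept:.1f}")
```

With more than one independent variable, a library such as scikit-learn or statsmodels is the more practical choice.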

Null Hypothesis: The default hypothesis in a statistical test, which assumes there is no effect or no real difference, so that any difference between observed and expected results is due to chance.

Predictive Analytics: Using data, statistical algorithms, and machine learning techniques to identify the likelihood of future outcomes.

P-value: The probability of obtaining a result as extreme as, or more extreme than, the observed result, assuming the null hypothesis is true.
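
Here is a small worked example of a p-value, assuming a coin-flip experiment where the null hypothesis is that the coin is fair. The helper `binomial_p_value` is hypothetical. It computes a one-sided p-value: the chance of seeing at least as many heads as observed if the null hypothesis is true.

```python
from math import comb


def binomial_p_value(successes, trials, p_null=0.5):
    """One-sided p-value: P(X >= successes) under the null hypothesis."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Invented observation: 60 heads in 100 flips of a supposedly fair coin
p = binomial_p_value(60, 100)
print(f"p-value = {p:.4f}")

# With the common 0.05 threshold, a smaller p-value leads us to reject the null
print("reject null" if p < 0.05 else "fail to reject null")
```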

Standard Deviation: A measure of the amount of variation or dispersion in a set of values.

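Variance: The average squared deviation of a set of values from their mean (the square of the standard deviation). In machine learning, it also describes how much a model's predictions change when it is trained on different data.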
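
Standard deviation and variance are easy to compute with Python's built-in `statistics` module. The data below is made up. Note the difference between the population versions (divide by n) and the sample versions (divide by n - 1).

```python
import statistics

# Invented values for illustration; the mean is 5.0
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

pop_variance = statistics.pvariance(values)  # population variance
pop_std_dev = statistics.pstdev(values)      # population standard deviation
sample_variance = statistics.variance(values)  # sample variance (n - 1)

print(pop_variance)  # 4.0
print(pop_std_dev)   # 2.0, the square root of the variance
print(sample_variance > pop_variance)  # True: dividing by n - 1 gives a larger value
```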
