Here are some concepts and answers one should know in machine learning before going further. I wrote these over a weekend because I was bored and wanted to dive into some data science, machine learning, and statistics.
Regression is a supervised learning technique which helps us find the correlation between a dependent target variable and one or more independent predictor variable(s).
It is mainly used for prediction, forecasting (weather, for example), or determining cause-and-effect relationships between variables.
In regression, we fit a line or curve to the graph of the variables that best fits the given data points or dataset; using this fit, the machine learning model can make predictions about new data.
Fact: The method by which we analyse such regression is called Regression Analysis, which is a statistical analysis for modelling the relationship between a dependent (target) variable and one or more independent (predictor) variables.
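To make this concrete, here is a minimal sketch of fitting a straight line to data and using it for prediction. The numbers are made up for illustration; the fit uses ordinary least squares via NumPy.

```python
import numpy as np

# Toy dataset (made-up numbers): hours studied (predictor) vs. exam score (target)
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
score = np.array([52.0, 55.0, 61.0, 64.0, 70.0])

# Fit a straight line  score = slope * hours + intercept  by least squares
slope, intercept = np.polyfit(hours, score, deg=1)

# Use the fitted line to predict the score for a new, unseen data point
predicted = slope * 6.0 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={predicted:.1f}")
```

The fitted line is the "graph that best fits the data points" described above; prediction is just evaluating that line at a new input.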
Before we get to know the actual meaning of Geometric machine learning model, we need to know what's Euclidean data and non-Euclidean data.
So, basically, most of the data and information that we encounter or work with is Euclidean data.
Euclidean data is data that lives on a regular grid, such as a 1-dimensional sequence or a 2-dimensional image grid.
Euclidean data includes audio, images, videos, numbers, most text, and similar types of data.
Non-Euclidean data, by contrast, can't really be laid out on such a regular grid and needs a richer representation, for example a molecular structure, a hierarchy, or a tree structure.
A molecular structure or a network is a graph-structured object, and such data falls into non-Euclidean data.
With all of the above in place: Geometric machine learning, or Geometric Deep Learning (GDL), is a niche field under the umbrella of deep learning that aims to build neural networks that can learn from non-Euclidean data.
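As a small illustration of what non-Euclidean (graph-structured) data looks like in code, here is a sketch that represents a water molecule as a graph with an adjacency matrix. The atom ordering is my own assumption for the example.

```python
import numpy as np

# A water molecule (H2O) as a graph: atoms are nodes, bonds are edges.
# Node ordering is an assumption for this example: 0 = O, 1 = H, 2 = H.
atoms = ["O", "H", "H"]

# Adjacency matrix: entry [i][j] = 1 if atoms i and j share a bond
adjacency = np.array([
    [0, 1, 1],  # oxygen bonds to both hydrogens
    [1, 0, 0],  # first hydrogen bonds to oxygen only
    [1, 0, 0],  # second hydrogen bonds to oxygen only
])

# Degree of each node = number of bonds, a basic graph feature
degrees = adjacency.sum(axis=1)
print(dict(zip(atoms, degrees)))
```

Unlike an image or an audio clip, there is no grid here: the structure is encoded entirely in which nodes are connected, which is exactly the kind of input GDL models are built to handle.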
Analysis of Variance (ANOVA) is a test which an analyst might perform to check for differences among population means by examining the amount of variation within each sample. ANOVA is a statistical approach for testing or comparing datasets, and it is one of the best tests to apply when we have more than 2 populations or samples to compare.
But to compare the means of two or more populations or datasets, ANOVA should only be applied if the samples satisfy the following:
- Independence of cases -> this assumption means that the cases or samples of the dependent variable should be independent and selected randomly; there should not be any sequence or pattern followed when selecting the sample datasets.
- Normality -> this means that the distribution of each group should be normal.
- Homogeneity -> this means that the variance of each group should be the same.
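The normality and homogeneity assumptions above can be checked in practice. Here is a sketch using SciPy, with synthetic data I generated for the example: the Shapiro-Wilk test checks each group for normality, and Levene's test checks for equal variances across groups.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Three hypothetical groups drawn from the same normal distribution (made-up data)
group_a = rng.normal(loc=50, scale=5, size=30)
group_b = rng.normal(loc=50, scale=5, size=30)
group_c = rng.normal(loc=50, scale=5, size=30)

# Normality: Shapiro-Wilk test per group
# (a large p-value means no evidence against normality)
for name, g in [("a", group_a), ("b", group_b), ("c", group_c)]:
    stat, p = stats.shapiro(g)
    print(f"group {name}: Shapiro-Wilk p = {p:.3f}")

# Homogeneity of variance: Levene's test across all groups
stat, p = stats.levene(group_a, group_b, group_c)
print(f"Levene's test p = {p:.3f}")
```

Only if both checks pass (along with independent sampling) is a standard ANOVA appropriate.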
ANOVA basically has 3 types:
- One way analysis: when we are comparing 3 or more than 3 groups based on 1 factor variable, then it's said to be one way analysis of that group.
Example: comparing whether or not the mean output of three employees is the same, based on the working hours of the three employees.
- Two way analysis: when there are two or more factor variables in a comparison then it is said to be two way analysis of those groups.
Example: comparing whether or not the mean output of three employees is the same, based on both their working hours and their working locations.
- K-way analysis: when there are k factor variables, it is said to be a k-way analysis of variance (ANOVA).
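The one-way example above can be sketched in a few lines with SciPy's `f_oneway`. The daily output numbers are made up for illustration.

```python
from scipy import stats

# Hypothetical daily output (units produced) of three employees -- made-up numbers
employee_a = [23, 25, 24, 26, 22]
employee_b = [30, 31, 29, 32, 30]
employee_c = [24, 23, 25, 24, 26]

# One-way ANOVA: is it plausible that all three means are equal?
f_stat, p_value = stats.f_oneway(employee_a, employee_b, employee_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A small p-value (e.g. < 0.05) suggests at least one employee's mean output differs
```

Here employee B's output is visibly higher than the others, so the test should report a small p-value; with three similar groups it would not.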
Before we dive into both of these tests, we need to know what hypothesis testing is. A more descriptive article about this can be found here.
Hypothesis testing is a method for testing whether a hypothesis we hold about a population is true or false, using a sample dataset. Through hypothesis testing we can know whether we have enough evidence about the population to conclude if our hypothesis is true or false.
So what are the Z-test and T-test? Well, both are parametric tests, i.e. they rely on the statistical distribution of the dataset. To know more about parametric tests, check this out!
A Z-test is a hypothesis test which ascertains whether the averages of two datasets are different from each other, when the population standard deviation or variance is known. A Z-test is usually preferred over a T-test when the sample size is greater than 30. It is based on the normal distribution, and all data points are assumed independent.
A T-test is also a hypothesis test, used when either the standard deviation or variance is not known, or the sample size is less than 30. It likewise tests how the averages of two datasets differ from each other.
This test is based on the t-distribution, and the data points are assumed independent.
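A quick sketch of both tests on made-up data: SciPy's `ttest_ind` handles the small-sample, unknown-variance case, and the z-statistic is computed by hand assuming a known population standard deviation (the value of sigma here is an assumption for the example).

```python
import numpy as np
from scipy import stats

# Two small samples (n < 30, population variance unknown) -> T-test territory
sample_a = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
sample_b = [12.8, 13.1, 12.9, 13.0, 12.7, 13.2]

# Independent two-sample T-test
t_stat, p_value = stats.ttest_ind(sample_a, sample_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# If the population standard deviation were known (assumed sigma = 0.3 here),
# a z-statistic for the difference of means could be computed directly:
sigma = 0.3  # assumed known population standard deviation
n_a, n_b = len(sample_a), len(sample_b)
z = (np.mean(sample_a) - np.mean(sample_b)) / (sigma * np.sqrt(1 / n_a + 1 / n_b))
print(f"z = {z:.2f}")
```

In practice the T-test is the safer default for small samples, since the population variance is rarely known.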
Hope this was informative for you! I might post a few more briefly explained questions on similar topics.
Thanks for reading!