In the machine learning community, two concepts come up very often: bias and variance. Together they describe the quality of a model in terms of how well it fits the training data and how well it adapts to new data.
Bias is associated with simple, rigid, constrained models, with scarce or poor-quality data, and with underfitting.
Models with high bias tend to miss relevant features of the data they are modeling, and missing those features can cause the model to underfit. This usually happens when the model is too simple and does not have enough capacity to capture the underlying pattern. It can also happen when the model would in principle be adequate, but the training data are too few or too noisy for it to pick out that pattern.
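As a minimal sketch of underfitting, the snippet below fits a straight line and a quadratic to data generated from a quadratic function. The data-generating function, the noise level, and the names `fit_mse`, `mse_line` and `mse_quad` are assumptions made for illustration only.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# Assumed toy data: a quadratic signal with a small amount of Gaussian noise.
x = rng.uniform(-1.0, 1.0, size=200)
y = x**2 + rng.normal(0.0, 0.05, size=200)

def fit_mse(degree):
    """Fit a polynomial of the given degree and return its training MSE."""
    model = Polynomial.fit(x, y, degree)
    return float(np.mean((model(x) - y) ** 2))

mse_line = fit_mse(1)  # too rigid: a line cannot bend to follow x**2
mse_quad = fit_mse(2)  # enough capacity for this signal

print(f"training MSE, degree 1: {mse_line:.4f}")
print(f"training MSE, degree 2: {mse_quad:.4f}")
```

The line keeps a large error even on the data it was trained on, which is the typical signature of a high-bias model.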
Variance is associated with complex, flexible, highly adaptable models and with overfitting.
Machine learning models with high variance are very sensitive to small fluctuations and noise in the training data. As a result, they tend to memorize the training set instead of learning the underlying pattern, which makes them the worst generalizers. These models are too complex for the data and overfit the training dataset.
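One way to make this sensitivity visible is to refit the same model on many noisy versions of the same data and measure how much its predictions move. The sketch below does this for a degree-1 and a degree-9 polynomial. The linear target, the noise level, the sample sizes, and the helper `prediction_variance` are all assumptions chosen for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

x_train = np.linspace(0.0, 1.0, 12)  # small, fixed set of training inputs
x_grid = np.linspace(0.0, 1.0, 50)   # points where predictions are compared
noise_sd = 0.3                       # assumed noise level

def prediction_variance(degree, n_resamples=200):
    """Average variance of the predictions across noisy resamples."""
    preds = np.empty((n_resamples, x_grid.size))
    for i in range(n_resamples):
        # Same underlying function every time; only the noise changes.
        y_train = x_train + rng.normal(0.0, noise_sd, size=x_train.size)
        preds[i] = Polynomial.fit(x_train, y_train, degree)(x_grid)
    return float(preds.var(axis=0).mean())

var_simple = prediction_variance(degree=1)
var_complex = prediction_variance(degree=9)

print(f"prediction variance, degree 1: {var_simple:.4f}")
print(f"prediction variance, degree 9: {var_complex:.4f}")
```

The flexible model changes its predictions much more from one resample to the next, because it is fitting the noise along with the signal.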
All algorithms that learn from data face a tradeoff between bias and variance. The best model is the one that balances the two, that is, a model that is complex enough to reduce the training loss and simple enough to generalize properly to new, unseen samples. Building on this tradeoff, we can describe a broader one called the triple tradeoff (Dietterich 2003), which relates three factors: the complexity of the model, the amount of training data, and the generalization error on new data.
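For squared-error loss, the bias/variance tradeoff can be written down explicitly. The decomposition below is the standard textbook form, not something stated in this article. It assumes the data follow y = f(x) + ε with zero-mean noise of variance σ², and that the expectation is taken over training sets and noise. The symbols f, f̂ and σ are introduced here for illustration.

```latex
\mathbb{E}\left[\left(y - \hat{f}(x)\right)^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Making a model more flexible usually lowers the first term and raises the second, which is why the two cannot be pushed to zero independently.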
The first part of the tradeoff is the complexity (capacity) of the model: the model has to be complex enough to see through the noise and learn the underlying representation, but no more complex than that. For example, if we take random data samples from the following linear function
f(x) = x + noise(x), the model best suited to this kind of data is a linear regression model. We need to be careful not to use an overly complex model, such as a high-degree polynomial or a neural network. As the complexity of the model increases, the generalization error decreases at first, but, as we saw with the bias/variance tradeoff, beyond some point it starts to rise again. We need to stop increasing the complexity at that point, the Optimal Model Complexity.
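The effect of model complexity on data drawn from f(x) = x + noise(x) can be sketched as follows. The Gaussian noise, its standard deviation, the sample sizes, and the chosen polynomial degrees are assumptions made for this example. The helpers `sample` and `mse` are hypothetical names.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)

def sample(n):
    """Draw n points from f(x) = x + noise, with assumed Gaussian noise."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = x + rng.normal(0.0, 0.3, size=n)
    return x, y

x_train, y_train = sample(20)    # small training set
x_test, y_test = sample(1000)    # held-out data stands in for "unseen samples"

def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))

results = {}
for degree in (1, 3, 6, 12):
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    train_err, test_err = results[degree]
    print(f"degree {degree:2d}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
```

The training error keeps shrinking as the degree grows, while the error on held-out data is lowest for the simple model that matches the true linear function.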
The second part is the amount of training data. The more data we have, the more information we can learn about the distribution of the samples, so as the amount of training data increases, the generalization error decreases. This holds only up to a point, because the noise in the dataset can still confuse the model: adding more noisy samples does not remove the noise itself.
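A rough learning curve illustrates both halves of this statement. The sketch below fits a linear model to training sets of increasing size and averages the error on a fixed held-out set. The data-generating function, the noise level `noise_sd`, the training sizes, and the helper names are assumptions for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(7)
noise_sd = 0.3  # assumed noise level; the noise floor is noise_sd**2 = 0.09

def sample(n):
    x = rng.uniform(-1.0, 1.0, size=n)
    return x, x + rng.normal(0.0, noise_sd, size=n)

x_test, y_test = sample(2000)  # fixed held-out set

def mean_test_mse(n_train, n_trials=200):
    """Average held-out MSE of a linear fit over many training sets of size n_train."""
    errors = []
    for _ in range(n_trials):
        x_train, y_train = sample(n_train)
        model = Polynomial.fit(x_train, y_train, 1)
        errors.append(np.mean((model(x_test) - y_test) ** 2))
    return float(np.mean(errors))

learning_curve = {n: mean_test_mse(n) for n in (5, 50, 500)}
for n, err in learning_curve.items():
    print(f"n_train = {n:3d}: mean test MSE {err:.4f}")
```

The error drops as the training set grows, but it levels off near the variance of the noise rather than reaching zero.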
The third part is the generalization error itself, which decreases as long as we keep the other two parameters in check. The ability to generalize is the most important property of a machine learning model. When we learn from data, we want the model to be able to apply its knowledge to new situations. If the model works perfectly on the training data but generalizes badly, it is useless and not worth working with.