*Memos:
 My post explains layers in PyTorch.
 My post explains activation functions in PyTorch.
 My post explains optimizers in PyTorch.
A loss function is a function which computes the mean (average) of the losses (differences) between a model's predictions and the training or test data, either to optimize the model during training or to evaluate how good the model is during testing. *A loss function is also called a cost function or an error function.
Popular loss functions are shown below:
(1) L1 Loss:
 computes the mean (average) of the absolute losses (differences) between a model's predictions and the training or test data.
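 's formula: $\frac{1}{n}\sum_{i=1}^{n}|\hat{y}_i - y_i|$ *$\hat{y}_i$ is a prediction, $y_i$ is its train or test data and $n$ is the number of them.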
 is used for a regression model.
 is also called Mean Absolute Error (MAE).
 is L1Loss() in PyTorch. *My post explains L1Loss().
 's pros:
  It's less sensitive to outliers and anomalies than L2 Loss because the losses are not squared.
  The losses can be easily compared because they are only made absolute, so their range stays small.
 's cons:
  It is not differentiable where a loss is exactly zero.
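As a rough check of the description above, the minimal sketch below computes L1 Loss by hand and compares it with L1Loss(). *The tensor values and the helper name `l1_loss` are made up for illustration and are not from the original post.

```python
import torch
from torch import nn

# Hypothetical predictions and targets (the last pair acts as an outlier).
y_pred = torch.tensor([7.0, -3.0, 2.0, 10.0])
y_train = torch.tensor([5.0, -1.0, 2.0, 0.0])

def l1_loss(y_pred, y_train):
    # Mean of the absolute differences.
    return (y_pred - y_train).abs().mean()

manual = l1_loss(y_pred, y_train)
builtin = nn.L1Loss()(y_pred, y_train)

print(manual.item())   # 3.5 -> (2 + 2 + 0 + 10) / 4
print(builtin.item())  # 3.5
```

The outlier adds its absolute difference of 10 only once to the sum, which is why the result stays fairly small.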
(2) L2 Loss:
 computes the mean (average) of the squared losses (differences) between a model's predictions and the training or test data.
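 's formula: $\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2$ *$\hat{y}_i$ is a prediction, $y_i$ is its train or test data and $n$ is the number of them.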
 is used for a regression model.
 is also called Mean Squared Error (MSE).
 is MSELoss() in PyTorch. *My post explains MSELoss().
 's pros:
  It is differentiable everywhere, including where a loss is zero.
 's cons:
  It's sensitive to outliers and anomalies because squaring magnifies large losses.
  The losses cannot be easily compared because they are squared, so their range is big.
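The sensitivity to outliers can be seen with a small sketch that evaluates the same data with and without one outlier. *The tensor values and the helper name `l2_loss` are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
import torch
from torch import nn

# Hypothetical data: the two prediction sets differ only in the last value.
y_train = torch.tensor([5.0, -1.0, 2.0, 0.0])
y_clean = torch.tensor([7.0, -3.0, 2.0, 0.0])
y_outlier = torch.tensor([7.0, -3.0, 2.0, 10.0])

def l2_loss(y_pred, y_train):
    # Mean of the squared differences.
    return ((y_pred - y_train) ** 2).mean()

mae, mse = nn.L1Loss(), nn.MSELoss()

print(mae(y_clean, y_train).item(), mse(y_clean, y_train).item())      # 1.0 2.0
print(mae(y_outlier, y_train).item(), mse(y_outlier, y_train).item())  # 3.5 27.0
print(torch.isclose(l2_loss(y_outlier, y_train), mse(y_outlier, y_train)).item())  # True
```

With one outlier, the L1 result grows from 1.0 to 3.5 while the L2 result grows from 2.0 to 27.0.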
(3) Huber Loss:
 applies either an L1 Loss-like computation or an L2 Loss-like computation to each absolute loss (difference) between a model's prediction and the training or test data, depending on how that absolute loss compares with the delta which you set. *Memos:
  delta is 1.0 by default.
  Be careful, the computation is not exactly the same as L1 Loss or L2 Loss according to its formulas.

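 's formulas, applied to each loss before the mean is taken. *The 1st one is the L2 Loss-like one and the 2nd one is the L1 Loss-like one:
  $\frac{1}{2}(\hat{y}_i - y_i)^2$ if $|\hat{y}_i - y_i| \le \delta$
  $\delta\,(|\hat{y}_i - y_i| - \frac{1}{2}\delta)$ otherwise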
 is used for a regression model.
 is HuberLoss() in PyTorch. *My post explains HuberLoss().
 with delta of 1.0 is the same as Smooth L1 Loss, which is SmoothL1Loss() in PyTorch.
 's pros:
  It's less sensitive to outliers and anomalies than L2 Loss.
  It is differentiable everywhere.
  The losses can be more easily compared than with L2 Loss because only small losses are squared, so their range is smaller than with L2 Loss.
 's cons:
  It needs more computation than L1 Loss and L2 Loss because its formula is more complex.
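The switch between the two computations can be sketched by hand and compared with HuberLoss() and SmoothL1Loss(). *The tensor values and the helper name `huber_loss` are assumptions for illustration; one absolute loss is below delta and one is above it.

```python
import torch
from torch import nn

y_pred = torch.tensor([0.5, 3.0])   # hypothetical predictions
y_train = torch.tensor([0.0, 0.0])  # hypothetical targets

def huber_loss(y_pred, y_train, delta=1.0):
    diff = (y_pred - y_train).abs()
    l2_like = 0.5 * diff ** 2               # used where diff <= delta
    l1_like = delta * (diff - 0.5 * delta)  # used where diff > delta
    return torch.where(diff <= delta, l2_like, l1_like).mean()

manual = huber_loss(y_pred, y_train)
huber = nn.HuberLoss(delta=1.0)(y_pred, y_train)
smooth_l1 = nn.SmoothL1Loss()(y_pred, y_train)

print(manual.item())     # 1.3125 -> (0.125 + 2.5) / 2
print(huber.item())      # 1.3125
print(smooth_l1.item())  # 1.3125
```

The loss of 0.5 is squared and halved (0.125), while the loss of 3.0 only grows linearly (2.5), so the larger loss does not dominate as much as it would with L2 Loss.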
(4) BCE(Binary Cross Entropy) Loss:
 computes the mean (average) of the losses (differences) between a model's binary predictions and the binary training or test data.
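 's formula: $-\frac{1}{n}\sum_{i=1}^{n}\left[y_i\log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right]$ *$\hat{y}_i$ is a predicted probability between 0 and 1 and $y_i$ is its binary train or test data.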
 is used for Binary Classification, for example in Computer Vision. *Memos:
  Binary Classification is the technology to classify data into two classes.
  Computer Vision is the technology which enables a computer to understand objects in images.
 is also called Binary Cross Entropy Loss or Log (Logarithmic) Loss.
 is BCELoss() in PyTorch. *Memos:
  My post explains BCELoss().
  Basically, Sigmoid is applied before BCE Loss. *Memos:
   My post explains Sigmoid.
   There is BCEWithLogitsLoss() in PyTorch which is the combination of Sigmoid and BCE Loss. *My post explains BCEWithLogitsLoss().
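How Sigmoid, BCELoss() and BCEWithLogitsLoss() relate can be sketched as follows. *The tensor values and the helper name `bce` are hypothetical; the from-scratch function is one possible way to write it, not the library's implementation.

```python
import torch
from torch import nn

logits = torch.tensor([2.0, -1.0, 0.5])   # hypothetical raw model outputs
y_train = torch.tensor([1.0, 0.0, 1.0])   # hypothetical binary targets

def bce(y_pred, y_train):
    # y_pred must already be probabilities in (0, 1).
    return -(y_train * torch.log(y_pred)
             + (1 - y_train) * torch.log(1 - y_pred)).mean()

probs = torch.sigmoid(logits)                           # Sigmoid first
manual = bce(probs, y_train)
builtin = nn.BCELoss()(probs, y_train)                  # expects probabilities
with_logits = nn.BCEWithLogitsLoss()(logits, y_train)   # expects raw outputs

print(torch.allclose(manual, builtin, atol=1e-6))      # True
print(torch.allclose(manual, with_logits, atol=1e-6))  # True
```

BCELoss() is given the probabilities after Sigmoid, while BCEWithLogitsLoss() is given the raw outputs because it applies Sigmoid itself.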
(5) Cross Entropy Loss:
 computes the mean (average) of the losses (differences) between a model's predictions and the training or test data.
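 's formula: $-\frac{1}{n}\sum_{i=1}^{n}\sum_{c=1}^{C} y_{i,c}\log(\hat{y}_{i,c})$ *$\hat{y}_{i,c}$ is the predicted probability of class $c$ for sample $i$ (after Softmax), $y_{i,c}$ is its train or test data and $C$ is the number of classes.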
 is used for Multiclass Classification, for example in Computer Vision. *Multiclass Classification is the technology to classify data into multiple classes.
 is CrossEntropyLoss() in PyTorch. *My post explains CrossEntropyLoss().
 's code from scratch in PyTorch:
import torch

y_pred = torch.tensor([7.4, 2.8, 0.6])
y_train = torch.tensor([3.9, 5.1, 9.3])

def cross_entropy(y_pred, y_train):
    # Both arguments must already be probabilities (Softmax is applied by the caller).
    return -torch.sum(y_train * torch.log(y_pred))

print(cross_entropy(y_pred.softmax(dim=0), y_train.softmax(dim=0)))
# tensor(6.7486)

y_pred = torch.tensor([[7.4, 2.8, 0.6], [1.3, 0.0, 4.2]])
y_train = torch.tensor([[3.9, 5.1, 9.3], [5.3, 7.2, 8.4]])

print(cross_entropy(y_pred.softmax(dim=1), y_train.softmax(dim=1)))
# tensor(7.8530)
 's code with mean from scratch in PyTorch:
import torch

y_pred = torch.tensor([7.4, 2.8, 0.6])
y_train = torch.tensor([3.9, 5.1, 9.3])

def cross_entropy(y_pred, y_train):
    # Sum over the classes (last dimension) of each sample, then take the mean over the samples.
    return -torch.sum(y_train * torch.log(y_pred), dim=-1).mean()

print(cross_entropy(y_pred.softmax(dim=0), y_train.softmax(dim=0)))
# tensor(6.7486)

y_pred = torch.tensor([[7.4, 2.8, 0.6], [1.3, 0.0, 4.2]])
y_train = torch.tensor([[3.9, 5.1, 9.3], [5.3, 7.2, 8.4]])

print(cross_entropy(y_pred.softmax(dim=1), y_train.softmax(dim=1)))
# tensor(3.9265)
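A from-scratch cross entropy with mean can also be cross-checked against PyTorch's built-in function. *This is a minimal sketch which assumes a PyTorch version whose `torch.nn.functional.cross_entropy()` accepts class probabilities as the target (1.10 or later, as far as I know); the names `logits` and `cross_entropy_mean` are made up for illustration.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[7.4, 2.8, 0.6], [1.3, 0.0, 4.2]])
# Turn the raw target values into class probabilities per sample.
y_train = torch.tensor([[3.9, 5.1, 9.3], [5.3, 7.2, 8.4]]).softmax(dim=1)

def cross_entropy_mean(logits, probs):
    # log_softmax is numerically more stable than log(softmax(...)).
    log_pred = torch.log_softmax(logits, dim=1)
    # Sum over classes for each sample, then average over samples.
    return -(probs * log_pred).sum(dim=1).mean()

manual = cross_entropy_mean(logits, y_train)
builtin = F.cross_entropy(logits, y_train)  # applies log-softmax internally

print(torch.allclose(manual, builtin, atol=1e-6))  # True
```

The built-in function is given the raw outputs, not the Softmax probabilities, because it applies Softmax internally.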