*Memos:
- My post explains Cross Entropy Loss.
- My post explains BCELoss().
- My post explains BCEWithLogitsLoss().
CrossEntropyLoss() can get the 0D or more D tensor of the zero or more values (float) computed by Cross Entropy Loss from the 1D or more D `input` tensor and the 0D or more D `target` tensor of zero or more elements as shown below:
*Memos:
- The 1st argument for initialization is `weight` (Optional-Default:`None`-Type:`tensor` of `float`). *If not given, it's `1`.
- There is an `ignore_index` argument for initialization (Optional-Default:`-100`-Type:`int`). *It works for class indices, so keep it negative for class probabilities, otherwise there is an error.
- There is a `reduction` argument for initialization (Optional-Default:`'mean'`-Type:`str`). *`'none'`, `'mean'` or `'sum'` can be selected.
- There is a `label_smoothing` argument for initialization (Optional-Default:`0.0`-Type:`float`). *It must be between `[0, 1]`.
- There are `size_average` and `reduce` arguments for initialization but they are deprecated.
- The 1st argument is `input` (Required-Type:`tensor` of `float`). *A 1D or more D tensor can be set. *`softmax()` or `Softmax()` is not needed for it because `softmax()` is applied internally (see the sketch after this list).
- The 2nd argument is `target` (Required-Type:`tensor` of `int` for class indices or `tensor` of `float` for class probabilities). *Memos:
  - A `target` tensor whose size is different from the `input` tensor's is treated as class indices (the indices into the `input` tensor). *`softmax()` or `Softmax()` is not needed for it because it just holds the indices of the elements of the `input` tensor.
  - A `target` tensor whose size is the same as the `input` tensor's is treated as class probabilities (the sum is 100%), which should be between `[0, 1]`. *`softmax()` or `Softmax()` should be used for it because `softmax()` is not applied internally to it.
  - A 0D or 1D tensor can be set for class indices.
- The empty 1D or more D `input` and `target` tensors with `reduction='mean'` return `nan`.
- The empty 1D or more D `input` and `target` tensors with `reduction='sum'` or `reduction='none'` return `-0.`.
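The memos above say `softmax()` is applied internally when `target` holds class indices, so no softmax is needed on `input`. As a minimal sketch of that (my own check, not part of the examples below), `CrossEntropyLoss()` should give the same value as `NLLLoss()` applied to `input.log_softmax(dim=1)`, reusing the class-indices tensors from the first example:

import torch
from torch import nn

tensor1 = torch.tensor([[7.4, 2.8, -0.6, 6.3],
                        [-1.9, 4.2, 3.9, 5.1],
                        [9.3, -5.3, 7.2, -8.4]])
tensor2 = torch.tensor([3, 0, 2])

nn.CrossEntropyLoss()(input=tensor1, target=tensor2)
# tensor(3.7154)

# The same value from log_softmax() + NLLLoss(), which is what
# CrossEntropyLoss() does internally for class indices:
nn.NLLLoss()(input=tensor1.log_softmax(dim=1), target=tensor2)
# tensor(3.7154)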
For class indices:
import torch
from torch import nn
""" `target` tensor with class indices. """
tensor1 = torch.tensor([[7.4, 2.8, -0.6, 6.3],
[-1.9, 4.2, 3.9, 5.1],
[9.3, -5.3, 7.2, -8.4]])
tensor2 = torch.tensor([3, 0, 2])
# [softmax([7.4, 2.8, -0.6, 6.3]),
# softmax([-1.9, 4.2, 3.9, 5.1]),
# softmax([9.3, -5.3, 7.2, -8.4])]
# ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
# [[0.74446, 0.0074832, 0.00024974, <0.24781>], # 3
# [<0.00053368>, 0.23794, 0.17627, 0.58525], # 0
# [0.8909, 0.00000040657, <0.1091>, 0.000000018315]] # 2
# ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
# [-ln(0.24781), -ln(0.00053368), -ln(0.1091)]
# ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
# [1.3951, 7.5357, 2.2155]
# ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
# 1.3951 + 7.5357 + 2.2155 = 11.1463
# 11.1463 / 3 = 3.7154
cel = nn.CrossEntropyLoss()
cel(input=tensor1, target=tensor2)
# tensor(3.7154)
cel
# CrossEntropyLoss()
print(cel.weight)
# None
cel.ignore_index
# -100
cel.reduction
# 'mean'
cel.label_smoothing
# 0.0
cel = nn.CrossEntropyLoss(weight=None,
ignore_index=-100,
reduction='mean',
label_smoothing=0.0)
cel(input=tensor1, target=tensor2)
# tensor(3.7154)
cel = nn.CrossEntropyLoss(reduction='sum')
cel(input=tensor1, target=tensor2)
# tensor(11.1463)
cel = nn.CrossEntropyLoss(reduction='none')
cel(input=tensor1, target=tensor2)
# tensor([1.3951, 7.5357, 2.2155])
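# With weight=torch.tensor([0., 1., 2., 3.]), each sample's loss above is
# scaled by the weight of its target class and, with reduction='mean', the
# sum is divided by the total applied weight (a worked sketch of the value
# below):
# [1.3951*3(w[3]), 7.5357*0(w[0]), 2.2155*2(w[2])]
# (1.3951*3 + 7.5357*0 + 2.2155*2) / (3+0+2) = 8.6163 / 5 = 1.7233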
cel = nn.CrossEntropyLoss(weight=torch.tensor([0., 1., 2., 3.]))
cel(input=tensor1, target=tensor2)
# tensor(1.7233)
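# With ignore_index=2, the sample whose target class is 2 (the 3rd one, loss
# 2.2155) is excluded from both the sum and the divisor (a worked sketch of
# the value below):
# (1.3951 + 7.5357) / 2 = 4.4654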
cel = nn.CrossEntropyLoss(ignore_index=2)
cel(input=tensor1, target=tensor2)
# tensor(4.4654)
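# With label_smoothing=0.8, each one-hot target is mixed with a uniform
# distribution over the 4 classes: (1-0.8)*one_hot + 0.8/4, so the target
# class gets 0.4 and the others get 0.2 each (a worked sketch, e.g. for the
# 1st sample):
# -(0.2*ln(0.74446) + 0.2*ln(0.0074832) + 0.2*ln(0.00024974) + 0.4*ln(0.24781))
# = 3.2551, and likewise 3.7557 and 7.4155 for the 2nd and 3rd samples:
# (3.2551 + 3.7557 + 7.4155) / 3 = 4.8088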
cel = nn.CrossEntropyLoss(label_smoothing=0.8)
cel(input=tensor1, target=tensor2)
# tensor(4.8088)
""" `target` tensor with class probabilities. """
tensor1 = torch.tensor([[7.4, 2.8, -0.6],
[6.3, -1.9, 4.2]])
tensor2 = torch.tensor([[3.9, 5.1, 9.3],
[-5.3, 7.2, -8.4]])
# [softmax([7.4, 2.8, -0.6]),
# softmax([6.3, -1.9, 4.2])]
# [softmax([3.9, 5.1, 9.3]),
# softmax([-5.3, 7.2, -8.4])]
# ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
# [[0.98972(A1), 0.0099485(B1), 0.00033201(C1)],
# [0.89069(D1), 0.00024463(E1), 0.10907(F1)]]
# [[0.0044301(A2), 0.014709(B2), 0.98086(C2)],
# [0.0000037266(D2), 1.0(E2), 0.00000016788(F2)]]
# ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
# [[ln(A1)*A2*1(w), ln(B1)*B2*1(w), ln(C1)*C2*1(w)],
# [ln(D1)*D2*1(w), ln(E1)*E2*1(w), ln(F1)*F2*1(w)]]
# ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
# [[-0.00004578, -0.0678, -7.857],
# [-0.00000043139, -8.3157, -0.00000037198]]
# ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
# -((-0.00004578) + (-0.0678) + (-7.857)) = 7.9249
# -((-0.00000043139) + (-8.3157) + (-0.00000037198)) = 8.3157
# 7.9249 + 8.3157 = 16.2406
# 16.2406 / 2 = 8.1203
cel = nn.CrossEntropyLoss()
cel(input=tensor1, target=tensor2.softmax(dim=1))
# tensor(8.1203)
cel
# CrossEntropyLoss()
print(cel.weight)
# None
cel.ignore_index
# -100
cel.reduction
# 'mean'
cel.label_smoothing
# 0.0
cel = nn.CrossEntropyLoss(weight=None,
ignore_index=-100,
reduction='mean',
label_smoothing=0.0)
cel(input=tensor1, target=tensor2.softmax(dim=1))
# tensor(8.1203)
cel = nn.CrossEntropyLoss(reduction='sum')
cel(input=tensor1, target=tensor2.softmax(dim=1))
# tensor(16.2406)
cel = nn.CrossEntropyLoss(reduction='none')
cel(input=tensor1, target=tensor2.softmax(dim=1))
# tensor([7.9249, 8.3157])
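# With weight=torch.tensor([0., 1., 2.]) and class-probability targets, each
# per-class term above is scaled by its class weight and the mean is taken
# over the batch size (a worked sketch of the value below):
# -(0*(-0.00004578) + 1*(-0.0678) + 2*(-7.857)) = 15.7818
# -(0*(-0.00000043139) + 1*(-8.3157) + 2*(-0.00000037198)) = 8.3157
# (15.7818 + 8.3157) / 2 = 12.0488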
cel = nn.CrossEntropyLoss(weight=torch.tensor([0., 1., 2.]))
cel(input=tensor1, target=tensor2.softmax(dim=1))
# tensor(12.0488)
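# With label_smoothing=0.8 and class-probability targets, each target row is
# mixed with a uniform distribution over the 3 classes:
# (1-0.8)*target + 0.8/3 (a worked sketch of the value below):
# -(0.26755*ln(0.98972) + 0.26961*ln(0.0099485) + 0.46284*ln(0.00033201)) = 4.9533
# -(0.26667*ln(0.89069) + 0.46667*ln(0.00024463) + 0.26667*ln(0.10907)) = 4.5024
# (4.9533 + 4.5024) / 2 = 4.7278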
cel = nn.CrossEntropyLoss(label_smoothing=0.8)
cel(input=tensor1, target=tensor2.softmax(dim=1))
# tensor(4.7278)
tensor1 = torch.tensor([])
tensor2 = torch.tensor([])
cel = nn.CrossEntropyLoss(reduction='mean')
cel(input=tensor1, target=tensor2.softmax(dim=0))
# tensor(nan)
cel = nn.CrossEntropyLoss(reduction='sum')
cel(input=tensor1, target=tensor2.softmax(dim=0))
# tensor(-0.)
cel = nn.CrossEntropyLoss(reduction='none')
cel(input=tensor1, target=tensor2.softmax(dim=0))
# tensor(-0.)