Super Kai (Kazuya Ito)

The activation functions in PyTorch (5)

*Memos:

(1) Tanh:

  • can convert an input value(x) to the output value between -1 and 1. *-1 and 1 are exclusive.
  • 's formula is y = (e^x - e^-x) / (e^x + e^-x).
  • is also called Hyperbolic Tangent Function.
  • is Tanh() in PyTorch. *A minimal usage sketch is shown after the graph below.
  • is used in:
    • RNN(Recurrent Neural Network). *RNN in PyTorch.
    • LSTM(Long Short-Term Memory). *LSTM() in PyTorch.
    • GRU(Gated Recurrent Unit). *GRU() in PyTorch.
    • GAN(Generative Adversarial Network).
  • 's pros:
    • It normalizes input values.
    • The convergence is stable.
    • It mitigates Exploding Gradient Problem.
    • It mitigates Dying ReLU Problem. *0 is still produced for the input value 0 so Dying ReLU Problem is not completely avoided.
  • 's cons:
    • It causes Vanishing Gradient Problem.
    • It's computationally expensive because of the exponential and other complex operations.
  • 's graph in Desmos:

[Tanh graph in Desmos]
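
A minimal usage sketch of Tanh() (the sample tensor and the helper name tanh_scratch are illustrative, not from the post), comparing it with a from-scratch version of the formula above; the printed values are rounded:

import torch
from torch import nn

def tanh_scratch(input): # from-scratch version of y = (e^x - e^-x) / (e^x + e^-x)
    return (torch.exp(input) - torch.exp(-input)) / (torch.exp(input) + torch.exp(-input))

tanh = nn.Tanh()

my_tensor = torch.tensor([8., -3., 0., 1.])

print(tanh(my_tensor))
# tensor([ 1.0000, -0.9951,  0.0000,  0.7616]) (approximately)

print(tanh_scratch(my_tensor))
# Approximately the same values as above.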

(2) Softsign:

  • can convert an input value(x) to the output value between -1 and 1. *-1 and 1 are exclusive.
  • 's formula is y = x / (1 + |x|).
  • is Softsign() in PyTorch. *A minimal usage sketch is shown after the graph below.
  • 's pros:
    • It normalizes input values.
    • The convergence is stable.
    • It mitigates Exploding Gradient Problem.
    • It mitigates Dying ReLU Problem. *0 is still produced for the input value 0 so Dying ReLU Problem is not completely avoided.
  • 's cons:
    • It causes Vanishing Gradient Problem.
  • 's graph in Desmos:

[Softsign graph in Desmos]
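
A minimal usage sketch of Softsign() (the sample tensor and the helper name softsign_scratch are illustrative, not from the post), comparing it with a from-scratch version of the formula above; the printed values are rounded:

import torch
from torch import nn

def softsign_scratch(input): # from-scratch version of y = x / (1 + |x|)
    return input / (1 + torch.abs(input))

softsign = nn.Softsign()

my_tensor = torch.tensor([8., -3., 0., 1.])

print(softsign(my_tensor))
# tensor([ 0.8889, -0.7500,  0.0000,  0.5000]) (approximately)

print(softsign_scratch(my_tensor))
# Approximately the same values as above.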

(3) Sigmoid:

  • can convert an input value(x) to the output value between 0 and 1. *0 and 1 are exclusive.
  • 's formula is y = 1 / (1 + e^-x).
  • is Sigmoid() in PyTorch. *A minimal usage sketch is shown after the graph below.
  • is used in:
    • Binary Classification Model.
    • Logistic Regression.
    • LSTM.
    • GRU.
    • GAN.
  • 's pros:
    • It normalizes input values.
    • The convergence is stable.
    • It mitigates Exploding Gradient Problem.
    • It avoids Dying ReLU Problem.
  • 's cons:
    • It causes Vanishing Gradient Problem.
    • It's computationally expensive because of the exponential operation.
  • 's graph in Desmos:

[Sigmoid graph in Desmos]
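
A minimal usage sketch of Sigmoid() (the sample tensor and the helper name sigmoid_scratch are illustrative, not from the post), comparing it with a from-scratch version of the formula above; the printed values are rounded:

import torch
from torch import nn

def sigmoid_scratch(input): # from-scratch version of y = 1 / (1 + e^-x)
    return 1 / (1 + torch.exp(-input))

sigmoid = nn.Sigmoid()

my_tensor = torch.tensor([8., -3., 0., 1.])

print(sigmoid(my_tensor))
# tensor([0.9997, 0.0474, 0.5000, 0.7311]) (approximately)

print(sigmoid_scratch(my_tensor))
# Approximately the same values as above.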

(4) Softmax:

  • can convert input values(xs) to output values which are each between 0 and 1 and whose sum is 1(100%). *Memos:
    • 0 and 1 are exclusive.
    • If the input values are [5, 4, -1], then the output values are [0.730, 0.268, 0.002], whose sum is 0.730(73%) + 0.268(26.8%) + 0.002(0.2%) = 1(100%). *Sometimes, the sum is only approximately 100% because of rounding.
  • 's formula is y_i = e^(x_i) / Σ_j e^(x_j), where the sum runs over all input values(xs).
  • is Softmax() in PyTorch.
  • is used in:
    • Multi-Class Classification Model.
  • 's pros:
    • It normalizes input values.
    • The convergence is stable.
    • It mitigates Exploding Gradient Problem.
    • It avoids Dying ReLU Problem.
  • 's cons:
    • It causes Vanishing Gradient Problem.
    • It's computationally expensive because of the exponential and other complex operations.
  • 's graph in Desmos:

[Softmax graph in Desmos]

  • 's code from scratch in PyTorch. *dim=0 must be set for sum(); otherwise, different values are returned for a tensor with a different number of dimensions (D). A comparison with Softmax() in PyTorch follows the scratch code.
import torch

def softmax(input):
    e_i = torch.exp(input)
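    # Divide each element's exponential by the sum along the first dimension (dim=0).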
    return e_i / e_i.sum(dim=0)
                       # ↑↑↑↑↑ Must be set.

my_tensor = torch.tensor([8., -3., 0., 1.])

print(softmax(my_tensor))
# tensor([9.9874e-01, 1.6681e-05, 3.3504e-04, 9.1073e-04])

my_tensor = torch.tensor([[8., -3.], [0., 1.]])

print(softmax(my_tensor))
# tensor([[9.9966e-01, 1.7986e-02],
#         [3.3535e-04, 9.8201e-01]])

my_tensor = torch.tensor([[[8.], [-3.]], [[0.], [1.]]])

print(softmax(my_tensor))
# tensor([[[9.9966e-01],
#          [1.7986e-02]],
#         [[3.3535e-04],
#          [9.8201e-01]]])
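
For comparison, a minimal sketch using the built-in Softmax() with dim=0, which should produce approximately the same values as the scratch version above for the same 1D tensor:

import torch
from torch import nn

softmax = nn.Softmax(dim=0)  # dim=0 matches the scratch version's sum(dim=0).

my_tensor = torch.tensor([8., -3., 0., 1.])

print(softmax(my_tensor))
# tensor([9.9874e-01, 1.6681e-05, 3.3504e-04, 9.1073e-04]) (approximately)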
