Super Kai (Kazuya Ito)

The activation functions in PyTorch (5)

*Memos:

(1) Tanh:

  • can convert an input value(x) to the output value between -1 and 1. *-1 and 1 are exclusive.
  • 's formula is y = (e^x - e^-x) / (e^x + e^-x).
  • is also called Hyperbolic Tangent Function.
  • is Tanh() in PyTorch. *A minimal usage sketch is shown after the graph below.
  • is used in:
    • RNN(Recurrent Neural Network). *RNN in PyTorch.
    • LSTM(Long Short-Term Memory). *LSTM() in PyTorch.
    • GRU(Gated Recurrent Unit). *GRU() in PyTorch.
    • GAN(Generative Adversarial Network).
  • 's pros:
    • It normalizes input values.
    • The convergence is stable.
    • It mitigates Exploding Gradient Problem.
    • It mitigates Dying ReLU Problem. *0 is still produced for the input value 0 so Dying ReLU Problem is not completely avoided.
  • 's cons:
    • It causes Vanishing Gradient Problem.
    • It's computationally expensive because of the exponential and other complex operations.
  • 's graph in Desmos:

[Tanh graph in Desmos]
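
A minimal usage sketch of Tanh() (the sample tensor and the helper name tanh_scratch are illustrative, not from the post), comparing it with a from-scratch version of the formula above; the printed values are rounded:

import torch
from torch import nn

def tanh_scratch(input): # from-scratch version of y = (e^x - e^-x) / (e^x + e^-x)
    return (torch.exp(input) - torch.exp(-input)) / (torch.exp(input) + torch.exp(-input))

tanh = nn.Tanh()

my_tensor = torch.tensor([8., -3., 0., 1.])

print(tanh(my_tensor))
# tensor([ 1.0000, -0.9951,  0.0000,  0.7616]) (approximately)

print(tanh_scratch(my_tensor))
# Approximately the same values as above.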

(2) Softsign:

  • can convert an input value(x) to the output value between -1 and 1. *-1 and 1 are exclusive.
  • 's formula is y = x / (1 + |x|).
  • is Softsign() in PyTorch. *A minimal usage sketch is shown after the graph below.
  • 's pros:
    • It normalizes input values.
    • The convergence is stable.
    • It mitigates Exploding Gradient Problem.
    • It mitigates Dying ReLU Problem. *0 is still produced for the input value 0 so Dying ReLU Problem is not completely avoided.
  • 's cons:
    • It causes Vanishing Gradient Problem.
  • 's graph in Desmos:

[Softsign graph in Desmos]
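
A minimal usage sketch of Softsign() (the sample tensor and the helper name softsign_scratch are illustrative, not from the post), comparing it with a from-scratch version of the formula above; the printed values are rounded:

import torch
from torch import nn

def softsign_scratch(input): # from-scratch version of y = x / (1 + |x|)
    return input / (1 + torch.abs(input))

softsign = nn.Softsign()

my_tensor = torch.tensor([8., -3., 0., 1.])

print(softsign(my_tensor))
# tensor([ 0.8889, -0.7500,  0.0000,  0.5000]) (approximately)

print(softsign_scratch(my_tensor))
# Approximately the same values as above.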

(3) Sigmoid:

  • can convert an input value(x) to the output value between 0 and 1. *0 and 1 are exclusive.
  • 's formula is y = 1 / (1 + e^-x).
  • is Sigmoid() in PyTorch. *A minimal usage sketch is shown after the graph below.
  • is used in:
    • Binary Classification Model.
    • Logistic Regression.
    • LSTM.
    • GRU.
    • GAN.
  • 's pros:
    • It normalizes input values.
    • The convergence is stable.
    • It mitigates Exploding Gradient Problem.
    • It avoids Dying ReLU Problem.
  • 's cons:
    • It causes Vanishing Gradient Problem.
    • It's computationally expensive because of the exponential operation.
  • 's graph in Desmos:

[Sigmoid graph in Desmos]
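
A minimal usage sketch of Sigmoid() (the sample tensor and the helper name sigmoid_scratch are illustrative, not from the post), comparing it with a from-scratch version of the formula above; the printed values are rounded:

import torch
from torch import nn

def sigmoid_scratch(input): # from-scratch version of y = 1 / (1 + e^-x)
    return 1 / (1 + torch.exp(-input))

sigmoid = nn.Sigmoid()

my_tensor = torch.tensor([8., -3., 0., 1.])

print(sigmoid(my_tensor))
# tensor([0.9997, 0.0474, 0.5000, 0.7311]) (approximately)

print(sigmoid_scratch(my_tensor))
# Approximately the same values as above.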

(4) Softmax:

  • can convert input values(xs) to output values which are each between 0 and 1 and whose sum is 1(100%). *Memos:
    • 0 and 1 are exclusive.
    • If the input values are [5, 4, -1], then the output values are [0.730, 0.268, 0.002], whose sum is 0.730(73%) + 0.268(26.8%) + 0.002(0.2%) = 1(100%). *Sometimes, the sum is only approximately 100% because of rounding.
  • 's formula is y_i = e^(x_i) / Σ_j e^(x_j), where the sum runs over all input values(xs).
  • is Softmax() in PyTorch.
  • is used in:
    • Multi-Class Classification Model.
  • 's pros:
    • It normalizes input values.
    • The convergence is stable.
    • It mitigates Exploding Gradient Problem.
    • It avoids Dying ReLU Problem.
  • 's cons:
    • It causes Vanishing Gradient Problem.
    • It's computationally expensive because of the exponential and other complex operations.
  • 's graph in Desmos:

[Softmax graph in Desmos]

  • 's code from scratch in PyTorch. *dim=0 must be set for sum(); otherwise, different values are returned for a tensor with a different number of dimensions (D). A comparison with Softmax() in PyTorch follows the scratch code.
import torch

def softmax(input):
    e_i = torch.exp(input)
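    # Divide each element's exponential by the sum along the first dimension (dim=0).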
    return e_i / e_i.sum(dim=0)
                       # ↑↑↑↑↑ Must be set.

my_tensor = torch.tensor([8., -3., 0., 1.])

print(softmax(my_tensor))
# tensor([9.9874e-01, 1.6681e-05, 3.3504e-04, 9.1073e-04])

my_tensor = torch.tensor([[8., -3.], [0., 1.]])

print(softmax(my_tensor))
# tensor([[9.9966e-01, 1.7986e-02],
#         [3.3535e-04, 9.8201e-01]])

my_tensor = torch.tensor([[[8.], [-3.]], [[0.], [1.]]])

print(softmax(my_tensor))
# tensor([[[9.9966e-01],
#          [1.7986e-02]],
#         [[3.3535e-04],
#          [9.8201e-01]]])
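
For comparison, a minimal sketch using the built-in Softmax() with dim=0, which should produce approximately the same values as the scratch version above for the same 1D tensor:

import torch
from torch import nn

softmax = nn.Softmax(dim=0)  # dim=0 matches the scratch version's sum(dim=0).

my_tensor = torch.tensor([8., -3., 0., 1.])

print(softmax(my_tensor))
# tensor([9.9874e-01, 1.6681e-05, 3.3504e-04, 9.1073e-04]) (approximately)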
