Super Kai (Kazuya Ito)

The activation functions in PyTorch (3)

*Memos:

  • My post explains PReLU() and ELU().
  • My post explains SELU() and CELU().
  • My post explains Step function, Identity and ReLU.
  • My post explains Leaky ReLU, PReLU and FReLU.
  • My post explains GELU, Mish, SiLU and Softplus.
  • My post explains Tanh, Softsign, Sigmoid and Softmax.
  • My post explains Vanishing Gradient Problem, Exploding Gradient Problem and Dying ReLU Problem.

(1) ELU(Exponential Linear Unit):

  • can convert an input value(x) to the output value between ae^x - a and x: *Memos:
    • If x < 0, then ae^x - a, while if 0 <= x, then x.
    • a is 1.0 by default.
  • is ELU() in PyTorch.
  • 's pros:
    • It normalizes negative input values, saturating them toward -a.
    • The convergence with negative input values is stable.
    • It mitigates the Vanishing Gradient Problem.
    • It mitigates the Dying ReLU Problem. *0 is still produced for the input value 0, so the Dying ReLU Problem is not completely avoided.
  • 's cons:
    • It's computationally expensive because of the exponential operation.
    • It's non-differentiable at x = 0 if a is not 1.
  • 's graph in Desmos:

[ELU graph in Desmos]
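
As a quick check of the definition above, here is a minimal sketch of ELU() in PyTorch. The tensor values are just made-up sample inputs, and the commented outputs are what ae^x - a (for x < 0) and x (for 0 <= x) give:

```python
import torch
from torch import nn

# Made-up sample inputs just for illustration.
my_tensor = torch.tensor([8., -3., 0., 1., 5., -2., -1., 4.])

# ELU with the default a(alpha)=1.0.
elu = nn.ELU()
print(elu(my_tensor))
# tensor([ 8.0000, -0.9502,  0.0000,  1.0000,  5.0000, -0.8647, -0.6321,  4.0000])

# ELU with a larger alpha: negative values saturate toward -2 instead of -1.
elu = nn.ELU(alpha=2.0)
print(elu(my_tensor))
# tensor([ 8.0000, -1.9004,  0.0000,  1.0000,  5.0000, -1.7293, -1.2642,  4.0000])
```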

(2) SELU(Scaled Exponential Linear Unit):

  • can convert an input value(x) to the output value between λ(ae^x - a) and λx: *Memos:
    • If x < 0, then λ(ae^x - a), while if 0 <= x, then λx.
    • λ=1.0507009873554804934193349852946
    • a=1.6732632423543772848170429916717
  • is SELU() in PyTorch.
  • 's pros:
    • It normalizes negative input values, saturating them toward -λa.
    • The convergence with negative input values is stable.
    • It mitigates the Vanishing Gradient Problem.
    • It mitigates the Dying ReLU Problem. *0 is still produced for the input value 0, so the Dying ReLU Problem is not completely avoided.
  • 's cons:
    • It may cause the Exploding Gradient Problem because positive input values are scaled up by the multiplication with λ.
    • It's computationally expensive because of the exponential operation.
    • It's non-differentiable at x = 0 because a is fixed at a value other than 1.
  • 's graph in Desmos:

[SELU graph in Desmos]
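
Here is a similar minimal sketch of SELU() in PyTorch. λ and a are fixed constants inside SELU(), so there is no argument to tune; the input tensor is the same made-up example:

```python
import torch
from torch import nn

# Made-up sample inputs just for illustration.
my_tensor = torch.tensor([8., -3., 0., 1., 5., -2., -1., 4.])

# SELU has no alpha argument; lambda and a are fixed internally.
selu = nn.SELU()
print(selu(my_tensor))
# tensor([ 8.4056, -1.6706,  0.0000,  1.0507,  5.2535, -1.5202, -1.1113,  4.2028])
```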

(3) CELU(Continuously Differentiable Exponential Linear Unit):

  • is an improved version of ELU that is differentiable at x = 0 even if a is not 1.
  • can convert an input value(x) to the output value between ae^(x/a) - a and x: *Memos:
    • If x < 0, then ae^(x/a) - a, while if 0 <= x, then x.
    • a is 1.0 by default.
  • 's formula is: CELU(x) = max(0, x) + min(0, a(e^(x/a) - 1))
  • is CELU() in PyTorch.
  • 's pros:
    • It normalizes negative input values, saturating them toward -a.
    • The convergence with negative input values is stable.
    • It mitigates the Vanishing Gradient Problem.
    • It mitigates the Dying ReLU Problem. *0 is still produced for the input value 0, so the Dying ReLU Problem is not completely avoided.
  • 's cons:
    • It's computationally expensive because of the exponential operation.
  • 's graph in Desmos:

[CELU graph in Desmos]
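
And a minimal sketch of CELU() in PyTorch. With the default a(alpha)=1.0, CELU matches ELU; with another alpha it stays differentiable at x = 0:

```python
import torch
from torch import nn

# Made-up sample inputs just for illustration.
my_tensor = torch.tensor([8., -3., 0., 1., 5., -2., -1., 4.])

# CELU with the default a(alpha)=1.0 behaves exactly like ELU().
celu = nn.CELU()
print(celu(my_tensor))
# tensor([ 8.0000, -0.9502,  0.0000,  1.0000,  5.0000, -0.8647, -0.6321,  4.0000])

# CELU with alpha=2.0: ae^(x/a) - a for x < 0, still differentiable at x = 0.
celu = nn.CELU(alpha=2.0)
print(celu(my_tensor))
# tensor([ 8.0000, -1.5537,  0.0000,  1.0000,  5.0000, -1.2642, -0.7869,  4.0000])
```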
