
Super Kai (Kazuya Ito)


Activation functions in PyTorch (3)


*Memos:

  • My post explains PReLU() and ELU().
  • My post explains SELU() and CELU().
  • My post explains Step function, Identity and ReLU.
  • My post explains Leaky ReLU, PReLU and FReLU.
  • My post explains GELU, Mish, SiLU and Softplus.
  • My post explains Tanh, Softsign, Sigmoid and Softmax.
  • My post explains Vanishing Gradient Problem, Exploding Gradient Problem and Dying ReLU Problem.
  • My post explains layers in PyTorch.
  • My post explains loss functions in PyTorch.
  • My post explains optimizers in PyTorch.

(1) ELU (Exponential Linear Unit):

  • can convert an input value (x) to an output value between ae^x - a and x: *Memos:
    • If x < 0, then ae^x - a, while if 0 <= x, then x.
    • a is 1.0 by default.
  • is ELU() in PyTorch (see the example below).
  • 's pros:
    • It normalizes negative input values.
    • The convergence with negative input values is stable.
    • It mitigates Vanishing Gradient Problem.
    • It mitigates Dying ReLU Problem. *0 is still produced for the input value 0, so Dying ReLU Problem is not completely avoided.
  • 's cons:
    • It's computationally expensive because of the exponential operation.
    • It's non-differentiable at x = 0 if a is not 1.
  • 's graph in Desmos:

[Graph: ELU]
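
Below is a minimal usage sketch of ELU() in PyTorch; the input tensor and the printed outputs are just an illustrative example, not taken from the graph above:

```python
import torch
from torch import nn

# ELU with the default a (alpha=1.0); a is adjustable via nn.ELU(alpha=...).
elu = nn.ELU()

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])

# Negative inputs are mapped to ae^x - a; non-negative inputs pass through unchanged.
print(elu(x))
# tensor([-0.8647, -0.6321,  0.0000,  1.0000,  2.0000])
```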

(2) SELU (Scaled Exponential Linear Unit):

  • can convert an input value (x) to an output value between λ(ae^x - a) and λx: *Memos:
    • If x < 0, then λ(ae^x - a), while if 0 <= x, then λx.
    • λ = 1.0507009873554804934193349852946
    • a = 1.6732632423543772848170429916717
  • is SELU() in PyTorch (see the example below).
  • 's pros:
    • It normalizes negative input values.
    • The convergence with negative input values is stable.
    • It mitigates Vanishing Gradient Problem.
    • It mitigates Dying ReLU Problem. *0 is still produced for the input value 0, so Dying ReLU Problem is not completely avoided.
  • 's cons:
    • It may cause Exploding Gradient Problem because a positive input value is amplified by the multiplication with λ.
    • It's computationally expensive because of the exponential operation.
    • It's non-differentiable at x = 0 because a is not 1.
  • 's graph in Desmos:

[Graph: SELU]
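
Below is a minimal usage sketch of SELU() in PyTorch with an illustrative input tensor; λ and a are fixed internally, so there is no parameter to set:

```python
import torch
from torch import nn

# SELU applies the fixed constants λ (scale) and a (alpha) listed above.
selu = nn.SELU()

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])

# Negative inputs are mapped to λ(ae^x - a); non-negative inputs are scaled by λ.
print(selu(x))
# tensor([-1.5202, -1.1113,  0.0000,  1.0507,  2.1014])
```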

(3) CELU (Continuously Differentiable Exponential Linear Unit):

  • is an improved ELU that is differentiable at x = 0 even if a is not 1.
  • can convert an input value (x) to an output value between ae^(x/a) - a and x: *Memos:
    • If x < 0, then ae^(x/a) - a, while if 0 <= x, then x.
    • a is 1.0 by default.
  • 's formula is: CELU(x) = max(0, x) + min(0, a(e^(x/a) - 1))
  • is CELU() in PyTorch (see the example below).
  • 's pros:
    • It normalizes negative input values.
    • The convergence with negative input values is stable.
    • It mitigates Vanishing Gradient Problem.
    • It mitigates Dying ReLU Problem. *0 is still produced for the input value 0, so Dying ReLU Problem is not completely avoided.
  • 's cons:
    • It's computationally expensive because of the exponential operation.
  • 's graph in Desmos:

[Graph: CELU]
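
Below is a minimal usage sketch of CELU() in PyTorch; alpha=2.0 is an arbitrary illustrative choice to make the x/a term visible, and the default is nn.CELU(alpha=1.0):

```python
import torch
from torch import nn

# CELU with a non-default a (alpha=2.0).
celu = nn.CELU(alpha=2.0)

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])

# Negative inputs are mapped to ae^(x/a) - a; non-negative inputs pass through unchanged.
print(celu(x))
# tensor([-1.2642, -0.7869,  0.0000,  1.0000,  2.0000])
```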
