
Super Kai (Kazuya Ito)

The activation functions in PyTorch (1)


An activation function is a function or layer which enables a neural network to learn complex (non-linear) relationships by transforming the output of the previous layer. *Without activation functions, a neural network can only learn linear relationships.
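For example, here's a minimal sketch (the layer sizes and the choice of ReLU are arbitrary) of where an activation layer sits between two Linear() layers:

```python
import torch
import torch.nn as nn

# Two stacked Linear layers with no activation in between collapse into a
# single linear map, so this network can only learn linear relationships.
linear_only = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 1))

# Inserting a non-linear activation (here ReLU) between the layers lets the
# network learn complex (non-linear) relationships.
with_activation = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

x = torch.randn(2, 4)
print(linear_only(x).shape)     # torch.Size([2, 1])
print(with_activation(x).shape) # torch.Size([2, 1])
```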

(1) Step function:

  • can convert an input value (x) to 0 or 1. *If x < 0, then 0, while if x >= 0, then 1.
  • is also called Binary step function, Unit step function, Binary threshold function, Threshold function, Heaviside step function or Heaviside function.
  • is heaviside() in PyTorch.
  • 's pros:
    • It's simple, only expressing the two values 0 and 1.
    • It avoids Exploding Gradient Problem.
  • 's cons:
    • It's rarely used in Deep Learning because it has more cons than other activation functions.
    • It can only express the two values 0 and 1, so the created model has bad accuracy, predicting inaccurately. *Activation functions which can express a wider range of values can create a model of good accuracy, predicting accurately.
    • It causes Dying ReLU Problem.
    • It's non-differentiable at x = 0. *The gradient of the step function doesn't exist at x = 0 during Backpropagation, which uses differentiation to calculate gradients.
  • 's graph in Desmos:

[Graph of the step function]
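For example, a quick check of heaviside() with some sample input values (the second argument sets the output at exactly x = 0):

```python
import torch

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

# heaviside(input, values) returns 0 where input < 0, 1 where input > 0,
# and `values` where input == 0 (here 1.0, matching "if x >= 0, then 1").
y = torch.heaviside(x, values=torch.tensor(1.0))
print(y) # tensor([0., 0., 1., 1., 1.])
```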

(2) Identity:

  • can just return the same value as an input value (x) without any conversion.
  • 's formula is y = x.
  • is also called Linear function.
  • is Identity() in PyTorch.
  • 's pros:
    • It's simple, just returning the same value as an input value.
  • 's cons:
    • It can't express non-linear relationships. *Stacking linear layers with Identity() between them is equivalent to a single linear layer, so it adds no learning capacity.
  • 's graph in Desmos:

[Graph of the identity function]
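For example, Identity() just passes the input through unchanged:

```python
import torch
import torch.nn as nn

identity = nn.Identity()

# The output is exactly the input: y = x.
x = torch.tensor([-2.0, 0.0, 3.0])
print(identity(x)) # tensor([-2., 0., 3.])
```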

(3) ReLU(Rectified Linear Unit):

  • can convert an input value (x) to the output value 0 or x. *If x < 0, then 0, while if 0 <= x, then x.
  • 's formula is y = max(0, x).
  • is ReLU() in PyTorch.
  • is used in:
    • Binary Classification Model.
    • Multi-Class Classification Model.
    • CNN(Convolutional Neural Network).
    • RNN(Recurrent Neural Network). *RNN() in PyTorch.
    • Transformer. *Transformer() in PyTorch.
    • NLP(Natural Language Processing) based on RNN.
    • GAN(Generative Adversarial Network).
  • 's pros:
    • It mitigates Vanishing Gradient Problem.
  • 's cons:
    • It causes Dying ReLU Problem.
    • It's non-differentiable at x = 0.
  • 's graph in Desmos:

[Graph of ReLU]
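For example, ReLU() zeroes out the negative input values and keeps the rest, following y = max(0, x):

```python
import torch
import torch.nn as nn

relu = nn.ReLU()

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x)) # tensor([0.0000, 0.0000, 0.0000, 0.5000, 2.0000])

# The functional form gives the same result.
print(torch.relu(x)) # tensor([0.0000, 0.0000, 0.0000, 0.5000, 2.0000])
```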
