DEV Community

Super Kai (Kazuya Ito)

SGD in PyTorch

*Memos:

  • My post explains CGD(Classic Gradient Descent), Momentum and Nesterov's Momentum.
  • My post explains Module().

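SGD() can do basic gradient descent, with or without Momentum or Nesterov's Momentum, as shown below. *SGD() in PyTorch doesn't sample data by itself: it only applies the Classic(Basic) Gradient Descent(CGD) update to whatever gradients have been computed, so it works as Stochastic Gradient Descent(SGD) only when those gradients come from randomly sampled mini-batches: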

*Memos:

  • The 1st argument for initialization is params(Required-Type:iterable of parameters or dicts). *The generator returned by parameters() can be passed as it is.
  • The 2nd argument for initialization is lr(Optional-Default:0.001-Type:int or float). *It must be 0 <= x.
  • The 3rd argument for initialization is momentum(Optional-Default:0-Type:int or float). *It must be 0 <= x.
  • The 4th argument for initialization is dampening(Optional-Default:0-Type:int or float). *The current gradient is scaled by 1 - dampening before it's added to the momentum buffer.
  • The 5th argument for initialization is weight_decay(Optional-Default:0-Type:int or float). *It must be 0 <= x.
  • The 6th argument for initialization is nesterov(Optional-Default:False-Type:bool). *If it's True, Nesterov's Momentum is used, which requires momentum to be greater than 0 and dampening to be 0, while if it's False, Momentum is used.
  • There is maximize argument for initialization(Optional-Default:False-Type:bool). *It's keyword-only so maximize= must be used. *If it's True, the objective is maximized instead of minimized.
  • There is foreach argument for initialization(Optional-Default:None-Type:bool). *It's keyword-only so foreach= must be used.
  • There is differentiable argument for initialization(Optional-Default:False-Type:bool). *It's keyword-only so differentiable= must be used.
  • There is fused argument for initialization(Optional-Default:None-Type:bool). *It's keyword-only so fused= must be used.
  • foreach and fused cannot both be True.
  • differentiable and fused cannot both be True.
  • step() can update parameters.
  • zero_grad() can reset gradients.
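The roles of lr, momentum, dampening, weight_decay and nesterov in the list above can be sketched in plain Python. This is only a rough sketch that follows the update rule as described in PyTorch's documentation, applied to a single scalar parameter. sgd_update() is a hypothetical helper made up for this illustration and is not part of PyTorch, and maximize, foreach, differentiable and fused are left out:

```python
def sgd_update(param, grad, buf, lr=0.001, momentum=0.0, dampening=0.0,
               weight_decay=0.0, nesterov=False):
    """One update of a single scalar parameter (hypothetical helper).

    buf is the momentum buffer from the previous step (None on the 1st step).
    Returns (new_param, new_buf).
    """
    # Assumption: same restriction as the nesterov memo above.
    if nesterov and (momentum <= 0 or dampening != 0):
        raise ValueError("nesterov needs momentum > 0 and dampening == 0")

    # weight_decay adds an L2 penalty term to the gradient.
    if weight_decay != 0:
        grad = grad + weight_decay * param

    if momentum != 0:
        if buf is None:
            buf = grad  # The 1st step just stores the gradient.
        else:
            buf = momentum * buf + (1 - dampening) * grad
        # Nesterov's Momentum looks ahead along the buffer.
        grad = grad + momentum * buf if nesterov else buf

    return param - lr * grad, buf


# Minimize f(x) = x**2 whose gradient is 2 * x.
x, buf = 1.0, None
for _ in range(3):
    x, buf = sgd_update(x, 2 * x, buf, lr=0.1, momentum=0.9)
# x has moved from 1.0 towards the minimum at 0.
```

With momentum=0 the buffer is never used, so the helper reduces to param - lr * grad, which is the basic gradient descent update.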
from torch import nn
from torch import optim

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear_layer = nn.Linear(in_features=4, out_features=5)

    def forward(self, x):
        return self.linear_layer(x)

mymodel = MyModel()

sgd = optim.SGD(params=mymodel.parameters())
sgd
# SGD (
# Parameter Group 0
#     dampening: 0
#     differentiable: False
#     foreach: None
#     fused: None
#     lr: 0.001
#     maximize: False
#     momentum: 0
#     nesterov: False
#     weight_decay: 0
# )

sgd.state_dict()
# {'state': {},
#  'param_groups': [{'lr': 0.001,
#    'momentum': 0,
#    'dampening': 0,
#    'weight_decay': 0,
#    'nesterov': False,
#    'maximize': False,
#    'foreach': None,
#    'differentiable': False,
#    'fused': None,
#    'params': [0, 1]}]}

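sgd.step() # Updates parameters. *Nothing changes here because no gradients have been computed yet.
sgd.zero_grad() # Resets gradients.
# None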

sgd = optim.SGD(params=mymodel.parameters(), lr=0.001, momentum=0,
                dampening=0, weight_decay=0, nesterov=False, maximize=False,
                foreach=None, differentiable=False, fused=None)
sgd
# SGD (
# Parameter Group 0
#     dampening: 0
#     differentiable: False
#     foreach: None
#     fused: None
#     lr: 0.001
#     maximize: False
#     momentum: 0
#     nesterov: False
#     weight_decay: 0
# )
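
The example above calls step() and zero_grad() before any gradients exist, so the parameters don't change. As a minimal sketch of how they are typically combined with backward(), the loop below fits the same kind of linear layer to random data. The data, the MSELoss() choice, lr=0.1 and the number of iterations are assumptions picked only for illustration:

```python
import torch
from torch import nn
from torch import optim

torch.manual_seed(0)

# The same kind of layer as in MyModel above.
model = nn.Linear(in_features=4, out_features=5)
sgd = optim.SGD(params=model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

# Hypothetical random data: 64 samples.
x = torch.randn(64, 4)
y = torch.randn(64, 5)

losses = []
for _ in range(20):
    loss = loss_fn(model(x), y)  # Forward pass on the whole dataset.
    loss.backward()              # Computes gradients.
    sgd.step()                   # Updates parameters with the gradients.
    sgd.zero_grad()              # Resets gradients for the next iteration.
    losses.append(loss.item())
```

Every iteration here uses the whole dataset, so this is full-batch gradient descent. If the gradients are instead computed from randomly sampled mini-batches, for example with a shuffled DataLoader(), the same step() call performs stochastic updates.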