*Memos:
- My post explains CGD(Classic Gradient Descent), Momentum and Nesterov's Momentum.
- My post explains Module().
SGD() can do basic gradient descent with or without Momentum or Nesterov's Momentum as shown below. *SGD() in PyTorch is Classic(Basic) Gradient Descent(CGD), not Stochastic Gradient Descent(SGD):
*Memos:
- The 1st argument for initialization is params (Required-Type: generator).
- The 2nd argument for initialization is lr (Optional-Default: 0.001-Type: int or float). *It must be 0 <= x.
- The 3rd argument for initialization is momentum (Optional-Default: 0-Type: int or float). *It must be 0 <= x.
- The 4th argument for initialization is dampening (Optional-Default: 0-Type: int or float).
- The 5th argument for initialization is weight_decay (Optional-Default: 0-Type: int or float). *It must be 0 <= x.
- The 6th argument for initialization is nesterov (Optional-Default: False-Type: bool). *If it's True, Nesterov's Momentum is used, while if it's False, Momentum is used (see the update-rule sketch after this list).
- There is a maximize argument for initialization (Optional-Default: False-Type: bool). *maximize= must be used.
- There is a foreach argument for initialization (Optional-Default: None-Type: bool). *foreach= must be used.
- There is a differentiable argument for initialization (Optional-Default: False-Type: bool). *differentiable= must be used.
- There is a fused argument for initialization (Optional-Default: None-Type: bool). *fused= must be used.
- foreach and fused cannot both be True.
- differentiable and fused cannot both be True.
- step() can update parameters.
- zero_grad() can reset gradients. *A minimal training loop using step() and zero_grad() is sketched after the example code below.
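The sketch below is a rough outline of how these arguments combine in the per-parameter update, following the pseudocode in the PyTorch documentation; it is only an illustration (not PyTorch source code) and ignores foreach, fused and differentiable, which affect the implementation rather than the math:

def sgd_update(p, g, b, lr=0.001, momentum=0, dampening=0,
               weight_decay=0, nesterov=False, maximize=False):
    # p: parameter, g: its gradient, b: per-parameter momentum buffer kept between steps.
    if weight_decay != 0:
        g = g + weight_decay * p                  # L2 penalty added to the gradient
    if momentum != 0:
        b = momentum * b + (1 - dampening) * g    # momentum buffer (it starts as g on the 1st step)
        g = g + momentum * b if nesterov else b   # Nesterov looks one step ahead
    p = p + lr * g if maximize else p - lr * g    # ascend if maximize=True, otherwise descend
    return p, b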
from torch import nn
from torch import optim

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear_layer = nn.Linear(in_features=4, out_features=5)

    def forward(self, x):
        return self.linear_layer(x)

mymodel = MyModel()

sgd = optim.SGD(params=mymodel.parameters())
sgd
# SGD (
# Parameter Group 0
# dampening: 0
# differentiable: False
# foreach: None
# fused: None
# lr: 0.001
# maximize: False
# momentum: 0
# nesterov: False
# weight_decay: 0
# )
sgd.state_dict()
# {'state': {},
# 'param_groups': [{'lr': 0.001,
# 'momentum': 0,
# 'dampening': 0,
# 'weight_decay': 0,
# 'nesterov': False,
# 'maximize': False,
# 'foreach': None,
# 'differentiable': False,
# 'fused': None,
# 'params': [0, 1]}]}
sgd.step()
sgd.zero_grad()
# None
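Here is a minimal training-loop sketch showing zero_grad() and step() working together. The dummy random tensors and nn.MSELoss() are assumptions for illustration only; mymodel and sgd are reused from the example above:

import torch

x = torch.randn(3, 4)  # dummy input: 3 samples, 4 features (assumption for illustration)
y = torch.randn(3, 5)  # dummy target: 3 samples, 5 outputs (assumption for illustration)
loss_fn = nn.MSELoss()

for epoch in range(5):
    sgd.zero_grad()                # reset gradients from the previous iteration
    loss = loss_fn(mymodel(x), y)  # forward pass + loss
    loss.backward()                # compute gradients
    sgd.step()                     # update parameters using the gradients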
sgd = optim.SGD(params=mymodel.parameters(), lr=0.001, momentum=0,
dampening=0, weight_decay=0, nesterov=False, maximize=False,
foreach=None, differentiable=False, fused=None)
sgd
# SGD (
# Parameter Group 0
# dampening: 0
# differentiable: False
# foreach: None
# fused: None
# lr: 0.001
# maximize: False
# momentum: 0
# nesterov: False
# weight_decay: 0
# )
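To actually enable Momentum or Nesterov's Momentum, pass a positive momentum value; nesterov=True additionally requires momentum > 0 and dampening=0. The values 0.01 and 0.9 below are just common illustrative choices, not values taken from the output above:

sgd_momentum = optim.SGD(params=mymodel.parameters(), lr=0.01, momentum=0.9)  # Momentum
sgd_nesterov = optim.SGD(params=mymodel.parameters(), lr=0.01, momentum=0.9,
                         nesterov=True)  # Nesterov's Momentum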