`Adam()` can do gradient descent by combining Momentum and RMSProp as shown below:
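Concretely, each update keeps a Momentum-style running average of the gradients (the 1st moment) and an RMSProp-style running average of the squared gradients (the 2nd moment), then divides one by the square root of the other. Below is a minimal sketch of one such update for a single scalar parameter; it only illustrates the formula and is not PyTorch's actual implementation, and `adam_step` is a made-up helper:

```python
# Illustrative sketch of one Adam update for a single scalar parameter.
# Not PyTorch's implementation; names and structure are simplified.
def adam_step(param, grad, m, v, t, lr=0.001, betas=(0.9, 0.999), eps=1e-08):
    beta1, beta2 = betas
    m = beta1 * m + (1 - beta1) * grad          # Momentum part: running average of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # RMSProp part: running average of squared gradients
    m_hat = m / (1 - beta1 ** t)                # Bias correction for the 1st moment
    v_hat = v / (1 - beta2 ** t)                # Bias correction for the 2nd moment
    param = param - lr * m_hat / (v_hat ** 0.5 + eps)
    return param, m, v

param, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):                           # A few fake steps with a constant gradient
    param, m, v = adam_step(param, grad=0.5, m=m, v=v, t=t)
print(param)                                    # The parameter moves against the gradient
```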
*Memos:
- The 1st argument for initialization is `params` (Required-Type:`generator`).
- The 2nd argument for initialization is `lr` (Optional-Default:`0.001`-Type:`int` or `float`). *It must be `0 <= x`.
- The 3rd argument for initialization is `betas` (Optional-Default:`(0.9, 0.999)`-Type:`tuple` or `list` of `int` or `float`). *It must be `0 <= x < 1`.
- The 4th argument for initialization is `eps` (Optional-Default:`1e-08`-Type:`int` or `float`). *It must be `0 <= x`.
- The 5th argument for initialization is `weight_decay` (Optional-Default:`0`-Type:`int` or `float`). *It must be `0 <= x`.
- The 6th argument for initialization is `amsgrad` (Optional-Default:`False`-Type:`bool`). *If it's `True`, AMSGrad is used.
- There is `foreach` argument for initialization (Optional-Default:`None`-Type:`bool`). *`foreach=` must be used (keyword-only).
- There is `maximize` argument for initialization (Optional-Default:`False`-Type:`bool`). *`maximize=` must be used (keyword-only).
- There is `capturable` argument for initialization (Optional-Default:`False`-Type:`bool`). *`capturable=` must be used (keyword-only).
- There is `differentiable` argument for initialization (Optional-Default:`False`-Type:`bool`). *`differentiable=` must be used (keyword-only).
- There is `fused` argument for initialization (Optional-Default:`None`-Type:`bool`). *Memos:
  - If it's `True`, all the parameters must be the `float` tensors of `cuda`, `xpu` or `privateuseone`.
  - `fused=` must be used (keyword-only).
  - Both `foreach` and `fused` cannot be `True`.
  - Both `differentiable` and `fused` cannot be `True`.
- `step()` can update parameters.
- `zero_grad()` can reset gradients.
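As a quick illustration of the value constraints above, an out-of-range value is rejected when the optimizer is created. The exact error message may differ between PyTorch versions:

```python
from torch import nn, optim

model = nn.Linear(in_features=4, out_features=5)

try:
    optim.Adam(params=model.parameters(), lr=-0.001)  # Violates `0 <= x` for `lr`
except ValueError as err:
    print(err)  # e.g. "Invalid learning rate: -0.001" (wording may vary by version)
```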
```python
from torch import nn
from torch import optim

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear_layer = nn.Linear(in_features=4, out_features=5)

    def forward(self, x):
        return self.linear_layer(x)

mymodel = MyModel()

adam = optim.Adam(params=mymodel.parameters())
adam
# Adam (
# Parameter Group 0
#     amsgrad: False
#     betas: (0.9, 0.999)
#     capturable: False
#     differentiable: False
#     eps: 1e-08
#     foreach: None
#     fused: None
#     lr: 0.001
#     maximize: False
#     weight_decay: 0
# )

adam.state_dict()
# {'state': {},
#  'param_groups': [{'lr': 0.001,
#    'betas': (0.9, 0.999),
#    'eps': 1e-08,
#    'weight_decay': 0,
#    'amsgrad': False,
#    'maximize': False,
#    'foreach': None,
#    'capturable': False,
#    'differentiable': False,
#    'fused': None,
#    'params': [0, 1]}]}

adam.step()

adam.zero_grad()
# None
```
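`step()` above has nothing to apply yet because no gradients have been computed. Below is a minimal sketch of a full training step, reusing `mymodel` and `adam` from above; the input batch, targets and MSE loss are made up just for illustration:

```python
import torch

x = torch.randn(3, 4)      # Made-up input batch (matching in_features=4)
y = torch.randn(3, 5)      # Made-up targets (matching out_features=5)
loss_fn = nn.MSELoss()

pred = mymodel(x)          # Forward pass
loss = loss_fn(pred, y)    # Compute the loss
loss.backward()            # Populate .grad on the parameters
adam.step()                # step() updates the parameters from the gradients
adam.zero_grad()           # zero_grad() resets the gradients for the next iteration
```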
```python
adam = optim.Adam(params=mymodel.parameters(), lr=0.001,
                  betas=(0.9, 0.999), eps=1e-08, weight_decay=0,
                  amsgrad=False, foreach=None, maximize=False,
                  capturable=False, differentiable=False, fused=None)
adam
# Adam (
# Parameter Group 0
#     amsgrad: False
#     betas: (0.9, 0.999)
#     capturable: False
#     differentiable: False
#     eps: 1e-08
#     foreach: None
#     fused: None
#     lr: 0.001
#     maximize: False
#     weight_decay: 0
# )
```
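`fused=True` only applies when the parameters are floating-point tensors on a supported device, so a sketch like the one below guards on CUDA availability; also remember that `fused` cannot be combined with `foreach=True` or `differentiable=True`:

```python
import torch
from torch import nn, optim

if torch.cuda.is_available():
    model = nn.Linear(in_features=4, out_features=5).cuda()  # float parameters on cuda
    adam_fused = optim.Adam(params=model.parameters(), fused=True)
    print(adam_fused)
else:
    print("fused=True needs float parameters on cuda, xpu or privateuseone")
```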