
Super Kai (Kazuya Ito)


Adam in PyTorch



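Adam() is an optimizer that does gradient descent by combining Momentum (a running average of the gradients) with RMSProp (a running average of the squared gradients). Its arguments, methods and usage are shown below.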
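
To make the Momentum and RMSProp combination concrete, here is a minimal pure-Python sketch of the standard Adam update for a single scalar parameter. The helper name `adam_step` and the toy objective f(x) = (x - 3)^2 are assumptions made for illustration. This is not PyTorch's actual implementation, which also handles tensors, weight_decay, amsgrad and more.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, betas=(0.9, 0.999), eps=1e-08):
    """One Adam update for a scalar parameter (hypothetical helper, not PyTorch API)."""
    beta1, beta2 = betas
    # Momentum part: exponential moving average of the gradients.
    m = beta1 * m + (1 - beta1) * grad
    # RMSProp part: exponential moving average of the squared gradients.
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction, because m and v start at 0 (t is the 1-based step count).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update; eps keeps the division numerically stable.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Toy problem (assumption): minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (x - 3)
    x, m, v = adam_step(x, grad, m, v, t, lr=0.01)

print(x)  # Ends up close to the minimizer 3.
```

On the first step the update has a size of roughly lr regardless of the gradient's scale, because the bias-corrected m and the square root of v have the same magnitude. That is why lr behaves like a per-step distance in Adam.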

*Memos:

  • The 1st argument for initialization is params(Required-Type:generator). *It's typically the generator returned by a model's parameters().
  • The 2nd argument for initialization is lr(Optional-Default:0.001-Type:int or float). *It must be 0 <= x.
  • The 3rd argument for initialization is betas(Optional-Default:(0.9, 0.999)-Type:tuple or list of int or float). *Each value must be 0 <= x < 1.
  • The 4th argument for initialization is eps(Optional-Default:1e-08-Type:int or float). *It must be 0 <= x.
  • The 5th argument for initialization is weight_decay(Optional-Default:0-Type:int or float). *It must be 0 <= x.
  • The 6th argument for initialization is amsgrad(Optional-Default:False-Type:bool). *If it's True, AMSGrad is used.
  • There is foreach argument for initialization(Optional-Default:None-Type:bool). *It's keyword-only, so foreach= must be used.
  • There is maximize argument for initialization(Optional-Default:False-Type:bool). *It's keyword-only, so maximize= must be used.
  • There is capturable argument for initialization(Optional-Default:False-Type:bool). *It's keyword-only, so capturable= must be used.
  • There is differentiable argument for initialization(Optional-Default:False-Type:bool). *It's keyword-only, so differentiable= must be used.
  • There is fused argument for initialization(Optional-Default:None-Type:bool): *Memos:
    • If it's True, all the parameters must be floating-point tensors on a cuda, xpu or privateuseone device.
    • It's keyword-only, so fused= must be used.
  • foreach and fused cannot both be True.
  • differentiable and fused cannot both be True.
  • step() updates the parameters from their current gradients.
  • zero_grad() resets the gradients.
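
The value ranges listed above can be checked with a small sketch like the following, assuming PyTorch is installed. It passes one out-of-range value at a time and relies on Adam() raising ValueError for it. The variable names `invalid_kwargs` and `rejected` are made up for this example.

```python
from torch import nn, optim

model = nn.Linear(in_features=4, out_features=5)

# Each dict breaks exactly one of the documented ranges.
invalid_kwargs = [
    {"lr": -0.1},              # lr must be 0 <= x
    {"betas": (1.0, 0.999)},   # betas[0] must be 0 <= x < 1
    {"betas": (0.9, 1.0)},     # betas[1] must be 0 <= x < 1
    {"eps": -1e-08},           # eps must be 0 <= x
    {"weight_decay": -0.01},   # weight_decay must be 0 <= x
]

rejected = []
for kwargs in invalid_kwargs:
    try:
        optim.Adam(params=model.parameters(), **kwargs)
    except ValueError as error:
        # The exact message text may differ between PyTorch versions.
        rejected.append(kwargs)
        print(kwargs, "->", error)
```
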
from torch import nn
from torch import optim

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear_layer = nn.Linear(in_features=4, out_features=5)

    def forward(self, x):
        return self.linear_layer(x)

mymodel = MyModel()

adam = optim.Adam(params=mymodel.parameters())
adam
# Adam (
# Parameter Group 0
#     amsgrad: False
#     betas: (0.9, 0.999)
#     capturable: False
#     differentiable: False
#     eps: 1e-08
#     foreach: None
#     fused: None
#     lr: 0.001
#     maximize: False
#     weight_decay: 0
# )

adam.state_dict()
# {'state': {},
#  'param_groups': [{'lr': 0.001,
#    'betas': (0.9, 0.999),
#    'eps': 1e-08,
#    'weight_decay': 0,
#    'amsgrad': False,
#    'maximize': False,
#    'foreach': None,
#    'capturable': False,
#    'differentiable': False,
#    'fused': None,
#    'params': [0, 1]}]}

adam.step() # No gradients have been computed yet, so no parameter changes.
adam.zero_grad()
# None

adam = optim.Adam(params=mymodel.parameters(), lr=0.001,
                  betas=(0.9, 0.999), eps=1e-08, weight_decay=0,
                  amsgrad=False, foreach=None, maximize=False,
                  capturable=False, differentiable=False, fused=None)
adam
# Adam (
# Parameter Group 0
#     amsgrad: False
#     betas: (0.9, 0.999)
#     capturable: False
#     differentiable: False
#     eps: 1e-08
#     foreach: None
#     fused: None
#     lr: 0.001
#     maximize: False
#     weight_decay: 0
# )
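
The calls above run step() before any gradient exists, so nothing is updated. As a minimal sketch of how step() and zero_grad() are usually combined, the loop below trains a small linear model for a few iterations. The random input `x`, the random target `y`, the MSELoss() choice and the iteration count are assumptions made only for this illustration.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

model = nn.Linear(in_features=4, out_features=5)
adam = optim.Adam(params=model.parameters(), lr=0.001)
loss_fn = nn.MSELoss()

# Made-up data: 8 samples with 4 features each, and 5 target values per sample.
x = torch.randn(8, 4)
y = torch.randn(8, 5)

weight_before = model.weight.detach().clone()

for _ in range(3):
    adam.zero_grad()             # Reset gradients left over from the previous iteration.
    loss = loss_fn(model(x), y)  # Forward pass.
    loss.backward()              # Compute gradients for weight and bias.
    adam.step()                  # Update weight and bias with Adam.

weight_changed = not torch.equal(weight_before, model.weight.detach())
state_entries = len(adam.state_dict()["state"])

print(weight_changed)
# True

print(state_entries)
# 2
```

After the first step(), 'state' in state_dict() is no longer empty: it holds one entry per parameter (here weight and bias) with the running averages that Adam keeps between steps.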