A learning rate scheduler adjusts the optimizer's learning rate according to the training epoch; this strategy helps the model converge more efficiently. As the examples below show, the scheduler is always used in close combination with the optimizer.
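To make that combination concrete, here is a minimal sketch of the canonical training-loop pattern (the model and the scheduler choice are placeholders; any scheduler from this article can be substituted):

import torch

net = torch.nn.Linear(10, 1)  # placeholder model
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(100):
    # ... forward pass, loss.backward() and optimizer.step() per batch ...
    optimizer.step()
    scheduler.step()  # update the learning rate once per epoch, after optimizer.step()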
1. Defining the optimizer

We take the most common Adam optimizer as an example (all optimizers inherit from the torch.optim.Optimizer class):
import torch

net = model()  # model is a torch.nn.Module defined elsewhere
initial_lr = 0.1
optimizer = torch.optim.Adam(params=net.parameters(), lr=initial_lr)
2. Defining the learning rate scheduler

The torch.optim.lr_scheduler module provides several methods for adjusting the learning rate based on the number of training epochs.
The most common adjustment strategies include the following:
import torch.optim.lr_scheduler as lr_scheduler

# LambdaLR: multiply the initial lr by a user-defined function of the epoch
scheduler = lr_scheduler.LambdaLR(optimizer,
                                  lr_lambda=lambda epoch: 0.95 ** epoch)

# StepLR: decay the lr by a factor of gamma every step_size epochs
scheduler = lr_scheduler.StepLR(optimizer,
                                step_size=30,
                                gamma=0.1)

# ReduceLROnPlateau: shrink the lr when a monitored metric stops improving
scheduler = lr_scheduler.ReduceLROnPlateau(optimizer,
                                           mode='min',
                                           factor=0.1,
                                           patience=10,
                                           min_lr=1e-7)
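Note that ReduceLROnPlateau is driven by a monitored metric rather than by the epoch counter, so its step() call must be passed that metric. A minimal sketch, reusing the optimizer from section 1 (the constant val_loss stands in for a real validation loop):

scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1,
                                           patience=10, min_lr=1e-7)
for epoch in range(30):
    optimizer.step()
    val_loss = 1.0             # stagnant metric; replace with the real validation loss
    scheduler.step(val_loss)   # after `patience` epochs without improvement, lr *= factor
    print(epoch, optimizer.param_groups[0]['lr'])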
We can also subclass lr_scheduler._LRScheduler (or one of its subclasses) to customize how the learning rate changes:
class LinearDecay(lr_scheduler._LRScheduler):
    """This class implements LinearDecay."""

    def __init__(self, optimizer, num_epochs, start_epoch=0, min_lr=0, last_epoch=-1):
        """Implements LinearDecay.

        Parameters
        ----------
        num_epochs : int
            Number of epochs over which the learning rate decays to min_lr.
        start_epoch : int
            Epoch at which the decay starts. Default: 0.
        min_lr : float
            Final learning rate. Default: 0.
        """
        # These attributes must be set before calling super().__init__,
        # because _LRScheduler.__init__ invokes step(), which calls get_lr().
        self.num_epochs = num_epochs
        self.start_epoch = start_epoch
        self.min_lr = min_lr
        super().__init__(optimizer, last_epoch)

    def get_lr(self):
        if self.last_epoch < self.start_epoch:
            return self.base_lrs
        return [base_lr - ((base_lr - self.min_lr) / self.num_epochs) * (self.last_epoch - self.start_epoch)
                for base_lr in self.base_lrs]
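For example, with a base learning rate of 0.1, min_lr ≈ 0, num_epochs=10, and start_epoch=5, the rate stays at 0.1 for the first six epochs and then drops by (0.1 − 0)/10 = 0.01 per epoch, which is exactly the behavior of the test run in section 4 below.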
3. Learning rate warmup (Warmup)

1) What is Warmup?

Warmup is a learning-rate warm-up method mentioned in the ResNet paper: training begins with a small learning rate that is increased a little at every step until it reaches the initially configured (larger) learning rate, at which point the warmup phase is complete; training then continues at the configured rate (after warmup, the learning rate is typically decayed). This helps the model converge faster and reach a better result.
2) Why use Warmup?

At the start of training the model's weights are randomly initialized, so choosing a large learning rate right away can make the model unstable (oscillate). With warmup, the learning rate stays small for the first few epochs or steps; under this small rate the model can gradually stabilize, and once it is relatively stable, training switches to the preset learning rate. This makes convergence faster and the final model better.
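A linear warmup is easy to express with LambdaLR, since the lambda simply scales the initial learning rate. A minimal sketch (warmup_epochs=5 is an arbitrary choice):

warmup_epochs = 5

def warmup_lambda(epoch):
    # Ramp the multiplier linearly from 1/warmup_epochs up to 1.0, then hold it.
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    return 1.0

scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_lambda)

A related idea is SGDR (cosine annealing with warm restarts), where the learning rate is periodically reset to its initial value; the class below implements it: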
import math

class WarmRestart(lr_scheduler.CosineAnnealingLR):
    """This class implements Stochastic Gradient Descent with Warm Restarts (SGDR):
    https://arxiv.org/abs/1608.03983.

    Sets the learning rate of each parameter group using a cosine annealing
    schedule. When last_epoch=-1, sets the initial lr as lr. This scheduler
    does not support scheduler.step(epoch); please keep epoch=None.
    """

    def __init__(self, optimizer, T_max=30, T_mult=1, eta_min=0, last_epoch=-1):
        """Implements SGDR.

        Parameters
        ----------
        T_max : int
            Maximum number of epochs per annealing cycle.
        T_mult : int
            Multiplicative factor applied to T_max at each restart.
        eta_min : float
            Minimum learning rate. Default: 0.
        last_epoch : int
            The index of the last epoch. Default: -1.
        """
        self.T_mult = T_mult
        super().__init__(optimizer, T_max, eta_min, last_epoch)

    def get_lr(self):
        # At the end of a cycle, restart: reset the epoch counter and
        # stretch the next cycle by a factor of T_mult.
        if self.last_epoch == self.T_max:
            self.last_epoch = 0
            self.T_max *= self.T_mult
        return [self.eta_min + (base_lr - self.eta_min) *
                (1 + math.cos(math.pi * self.last_epoch / self.T_max)) / 2
                for base_lr in self.base_lrs]
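A quick way to see the restarts (a throwaway sketch, reusing the optimizer from section 1):

scheduler = WarmRestart(optimizer, T_max=5, T_mult=2)
for epoch in range(20):
    optimizer.step()
    scheduler.step()
    print(epoch, optimizer.param_groups[0]['lr'])

With T_max=5 and T_mult=2, the learning rate anneals from 0.1 toward eta_min over the first 5 epochs, then jumps back to 0.1 and anneals again over a 10-epoch cycle.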
4. A wrapped learning-rate scheduler

The strategies above can be combined into a single factory class that builds the requested scheduler by name:

import torch
import torch.optim.lr_scheduler as lr_scheduler


class Scheduler():
    def __init__(self, name):
        self.name = name
        self.min_lr = 0.0000001

    def get_scheduler(self, optimizer):
        if self.name == 'lambdaLR':
            scheduler = lr_scheduler.LambdaLR(optimizer,
                                              lr_lambda=lambda epoch: 0.95 ** epoch)
        elif self.name == 'stepLR':
            scheduler = lr_scheduler.StepLR(optimizer,
                                            step_size=30,
                                            gamma=0.1)
        elif self.name == 'plateau':
            scheduler = lr_scheduler.ReduceLROnPlateau(optimizer,
                                                       mode='min',
                                                       factor=0.1,
                                                       patience=10,
                                                       min_lr=self.min_lr)
        elif self.name == 'sgdr':
            scheduler = WarmRestart(optimizer)
        elif self.name == 'linear':
            scheduler = LinearDecay(optimizer,
                                    min_lr=self.min_lr,
                                    num_epochs=10,
                                    start_epoch=5)
        else:
            raise ValueError("Scheduler [%s] cannot be initialized." % self.name)
        return scheduler


def get_test_optimizer():
    class model(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = torch.nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3)

        def forward(self, x):
            return self.conv1(x)

    net = model()
    initial_lr = 0.1
    optimizer = torch.optim.Adam(params=net.parameters(), lr=initial_lr)
    return optimizer


if __name__ == "__main__":
    optimizer = get_test_optimizer()
    scheduler = Scheduler('linear').get_scheduler(optimizer)
    print("Initial learning rate:", optimizer.defaults['lr'])
    for epoch in range(1, 11):
        optimizer.zero_grad()
        optimizer.step()
        print("Learning rate at epoch %d: %f" % (epoch, optimizer.param_groups[0]['lr']))
        scheduler.step()
Test output:
Initial learning rate: 0.1
Learning rate at epoch 1: 0.100000
Learning rate at epoch 2: 0.100000
Learning rate at epoch 3: 0.100000
Learning rate at epoch 4: 0.100000
Learning rate at epoch 5: 0.100000
Learning rate at epoch 6: 0.100000
Learning rate at epoch 7: 0.090000
Learning rate at epoch 8: 0.080000
Learning rate at epoch 9: 0.070000
Learning rate at epoch 10: 0.060000