In transfer learning (finetuning) we often want to freeze the learning rate of the pretrained layers, or set it lower than that of the newly added layers behind them. That means assigning a different learning rate to each layer; this post summarizes how to do that.

Using the net.collect_params(regex).setattr('lr_mult', ratio) method

  net.collect_params() returns a ParameterDict containing all of the network's parameters. Its signature is shown below, along with two example calls:

def collect_params(self, select=None)

model.collect_params('conv1_weight|conv1_bias|fc_weight|fc_bias')
model.collect_params('.*weight|.*bias')

  The select argument accepts a regular expression, and collect_params then returns only the parameters whose names match it. So the first step is to match out, with a regex, the parameters whose learning rate we want to set separately. Take the ResNet50 below as an example (a sketch of how such a net might be built follows); its full parameter list is printed after that, with some middle layers omitted:
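  (One way such a net might have been created; the classes=10 head and the prefix='resnet50v1' are assumptions inferred from the listing below, and the exact backbone here may differ slightly from the stock model-zoo ResNet50.)

import mxnet as mx
from mxnet.gluon.model_zoo import vision

# Hypothetical construction matching the listing: a ResNet50 v1 with a
# 10-class head. Parameter shapes stay (0,) / (N, 0, k, k) until the net
# is initialized and sees its first input (deferred shape inference).
net = vision.resnet50_v1(classes=10, prefix='resnet50v1')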

print(net.collect_params())

resnet50v1 (
Parameter resnet50v1batchnorm0_gamma (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1batchnorm0_beta (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1batchnorm0_running_mean (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1batchnorm0_running_var (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1conv0_weight (shape=(64, 0, 5, 5), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm0_gamma (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm0_beta (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm0_running_mean (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm0_running_var (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_conv0_weight (shape=(64, 0, 1, 1), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm1_gamma (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm1_beta (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm1_running_mean (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm1_running_var (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_conv1_weight (shape=(64, 0, 3, 3), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm2_gamma (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm2_beta (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm2_running_mean (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm2_running_var (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_conv2_weight (shape=(256, 0, 1, 1), dtype=<class 'numpy.float32'>)
Parameter resnet50v1conv1_weight (shape=(256, 0, 1, 1), dtype=<class 'numpy.float32'>)
Parameter resnet50v1batchnorm1_gamma (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1batchnorm1_beta (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1batchnorm1_running_mean (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1batchnorm1_running_var (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm3_gamma (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm3_beta (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm3_running_mean (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm3_running_var (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_conv3_weight (shape=(64, 0, 1, 1), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm4_gamma (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm4_beta (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm4_running_mean (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm4_running_var (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_conv4_weight (shape=(64, 0, 3, 3), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm5_gamma (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm5_beta (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm5_running_mean (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm5_running_var (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_conv5_weight (shape=(256, 0, 1, 1), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm6_gamma (shape=(0,), dtype=<class 'numpy.float32'>)
.....
.....
.....
.....
.....

Parameter resnet50v1layer4_batchnorm7_running_var (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer4_conv7_weight (shape=(512, 0, 3, 3), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer4_batchnorm8_gamma (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer4_batchnorm8_beta (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer4_batchnorm8_running_mean (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer4_batchnorm8_running_var (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer4_conv8_weight (shape=(2048, 0, 1, 1), dtype=<class 'numpy.float32'>)
Parameter resnet50v1dense0_weight (shape=(10, 2048), dtype=float32)
Parameter resnet50v1dense0_bias (shape=(10,), dtype=float32)
)

  Suppose we want to give the final fully connected layer its own learning rate. We can select its parameters with the regex net.collect_params('.*dense'), which gives:

print(net.collect_params('.*dense'))

resnet50v1 (
Parameter resnet50v1dense0_weight (shape=(10, 2048), dtype=float32)
Parameter resnet50v1dense0_bias (shape=(10,), dtype=float32)
)

  Once the parameters are selected, all that remains is to set their lr_mult attribute. The effective learning rate of those parameters is lr * lr_mult, so lr_mult = 0 means the parameters are not updated at all.

trainer = mx.gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
net.collect_params('.*dense').setattr('lr_mult', 0.1)
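  After this, the dense parameters train at 0.1 × 0.1 = 0.01 while everything else trains at 0.1. The same select-then-setattr pattern also freezes the entire pretrained backbone. A hedged sketch (the negative-lookahead regex '^(?!.*dense)' and the grad_req trick are my own illustrations, not from the setup above; lr_mult=0 still computes gradients but applies no update, while grad_req='null' skips gradient computation entirely):

# Zero the LR multiplier of every parameter whose name does not contain
# 'dense', i.e. the whole pretrained backbone (hypothetical regex).
net.collect_params('^(?!.*dense).*$').setattr('lr_mult', 0)

# Or skip computing their gradients altogether:
net.collect_params('^(?!.*dense).*$').setattr('grad_req', 'null')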

Summary

  With net.collect_params() we use a regular expression to match out the parameters that need separate treatment, then call setattr() to set their learning-rate multiplier lr_mult, which effectively sets that layer's learning rate. This is also why, when designing a network, it pays to give each layer a distinctive prefix_name: any layer's parameters can then be matched conveniently with a regex. The same pattern works for initializing different layers separately, as sketched below.
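  For example, a minimal sketch of that initialization idea, assuming the same net as above (force_reinit is required when the selected parameters were already initialized, e.g. loaded from pretrained weights):

import mxnet as mx

# Re-initialize only the classification head with Xavier, leaving the
# pretrained backbone untouched.
net.collect_params('.*dense').initialize(mx.init.Xavier(), force_reinit=True)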