ResNet

Concept and Principle

  • Adding more layers does not always improve accuracy

    • New layers may shift the model's hypothesis space into a region that no longer matches what we want
    • ResNet makes it easy for each layer to learn the identity mapping, so deeper models form nested function classes and converge more reliably
  • Residual block

    • The basic ResBlock adds a skip connection, f(x)+x, which guarantees the block can still represent everything the original layers could

    • Design details of the ResBlock when used in practice

  • ResNet architecture
    A common modern design pattern is to start with a stem stage (7x7 Conv followed by 3x3 MaxPool) and then attach the desired network body; the ResNet architecture follows this same idea

  • Tricks
    • In practice, ResNet-34 is used most often; if it is not accurate enough, move up to ResNet-50
    • ResNet-152 and ResNet-101 are mostly used for leaderboard chasing
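The nested-function-classes argument above can be sketched in a few lines of plain Python (the scalar "layer" here is hypothetical, standing in for the conv branch): if the residual branch learns zero, the block reduces to the identity, so stacking more blocks can only enlarge the function class, never shrink it.

```python
# Minimal sketch of why a residual block can always fall back to identity.
# `weight * x` is a hypothetical stand-in for the conv branch f(x).

def residual_block(x, weight):
    # The block returns f(x) + x, the residual connection.
    return weight * x + x

# With the residual branch zeroed out, the block is exactly the identity,
# so a deeper stack still contains every function the shallower stack had.
assert residual_block(3.0, 0.0) == 3.0
```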

Implementation

from torch import nn, optim
import torch
import d2l

class Residual(nn.Module):
    def __init__(
        self, in_channels, out_channels,
        use_1x1conv=False, stride=1
    ):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride, 1)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # inplace saves memory (GPU memory)
        self.relu = nn.ReLU(inplace=True)
        self.conv3 = None

        if use_1x1conv:
            self.conv3 = nn.Conv2d(in_channels, out_channels, 1, stride)

    def forward(self, X):
        Y = self.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))

        if self.conv3 is not None:
            X = self.conv3(X)
        return self.relu(Y + X)

s1 = nn.Sequential(
    nn.Conv2d(1, 64, 7, 2, 3), nn.BatchNorm2d(64),
    nn.ReLU(), nn.MaxPool2d(3, 2, 1)
)

def resnet_block(input_channels, num_channels, num_residuals,
                 first_block=False):
    blk = []
    for i in range(num_residuals):
        if i == 0 and not first_block:
            blk.append(
                Residual(input_channels, num_channels, use_1x1conv=True,
                         stride=2))
        else:
            blk.append(Residual(num_channels, num_channels))
    return blk

s2 = nn.Sequential(*resnet_block(64, 64, 2, True))
s3 = nn.Sequential(*resnet_block(64, 128, 2))
s4 = nn.Sequential(*resnet_block(128, 256, 2))
s5 = nn.Sequential(*resnet_block(256, 512, 2))
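As a quick check of the shapes these stages produce, a small helper (hypothetical, pure Python) can track how a 224x224 input shrinks through the stem and the stride-2 stages, using the standard conv output formula floor((n + 2p - k) / s) + 1:

```python
def conv_out(n, k, s, p):
    # Output size of a conv/pool layer: floor((n + 2p - k) / s) + 1
    return (n + 2 * p - k) // s + 1

n = conv_out(224, 7, 2, 3)  # stem 7x7 conv, stride 2 -> 112
n = conv_out(n, 3, 2, 1)    # stem 3x3 max pool, stride 2 -> 56
for _ in range(3):          # s3, s4, s5 each start with a stride-2 block
    n = conv_out(n, 3, 2, 1)
print(n)                    # -> 7, the map fed into global average pooling
```

s2 keeps the resolution at 56x56 (its first block has `first_block=True`, so no downsampling), which is why only three stride-2 stages follow the stem.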

device = torch.device("cuda:0")
res_net = nn.Sequential(
    s1, s2, s3, s4, s5,
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(),
    nn.Linear(512, 10)
)

# Sanity-check the output shape with a dummy batch
x = torch.rand((20, 1, 224, 224))
print(res_net(x).shape)

opt = optim.Adam(res_net.parameters())
train_iter, val_iter = d2l.load_data_fashion_mnist(128, (224, 224))

# d2l.train(
#     10, nn.CrossEntropyLoss(), opt,
#     res_net, train_iter, save_name="res_net"
# )

d2l.evaluate(
    res_net, val_iter, nn.CrossEntropyLoss(),
    "./params/res_net_2", device=device
)