NiN

Concept and Principle

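Problems with fully connected layers:
- Fully connected layers have far more parameters than convolutional layers, so they take up a lot of memory (GPU memory) and compute bandwidth
- Fully connected layers are prone to overfitting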
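
To make the parameter gap concrete, here is a minimal sketch that counts weights and biases for one fully connected layer and one convolutional layer. The layer sizes are assumptions chosen for illustration (an AlexNet-like 256x6x6 feature map feeding 4096 units, and a 3x3 convolution from 256 to 384 channels); they are not taken from these notes.

```python
def linear_params(in_features, out_features):
    # Weight matrix plus one bias per output unit
    return in_features * out_features + out_features

def conv_params(in_channels, out_channels, kernel_size):
    # One kernel_size x kernel_size filter per (input, output) channel pair,
    # plus one bias per output channel
    return out_channels * in_channels * kernel_size * kernel_size + out_channels

# Illustrative sizes only (assumed, not from the notes)
fc = linear_params(256 * 6 * 6, 4096)  # 37,752,832 parameters
conv = conv_params(256, 384, 3)        # 885,120 parameters
print(fc, conv)
```

Under these assumed sizes, the single fully connected layer holds roughly 40 times as many parameters as the convolutional layer.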
The NiN idea: drop fully connected layers entirely
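NiN block:
- One convolutional layer followed by two convolutional layers that act as fully connected layers
- The convolutional layers that act as fully connected layers are 1x1 convolutions with stride 1 and no padding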
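
The claim that a 1x1 convolution acts as a fully connected layer can be checked numerically. The sketch below copies the weights of a 1x1 `nn.Conv2d` into an `nn.Linear` and applies the linear layer to the channel vector at every pixel. The tensor sizes and variable names are arbitrary choices made for this illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Arbitrary small sizes, chosen only for this illustration
conv = nn.Conv2d(4, 3, kernel_size=1)  # stride 1, no padding by default
fc = nn.Linear(4, 3)

# Reuse the convolution's parameters in the linear layer
with torch.no_grad():
    fc.weight.copy_(conv.weight.reshape(3, 4))
    fc.bias.copy_(conv.bias)

x = torch.randn(2, 4, 5, 5)  # (batch, channels, height, width)
y_conv = conv(x)

# Move channels last so the linear layer sees one channel vector per pixel,
# then move channels back to dimension 1
y_fc = fc(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

same = torch.allclose(y_conv, y_fc, atol=1e-6)
print(same)
```

The two outputs agree up to floating-point tolerance: the 1x1 convolution applies the same fully connected layer to every pixel, mixing information across channels but not across spatial positions.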
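NiN architecture:
- No fully connected layers
- Alternates NiN blocks with max-pooling layers of stride 2
- Ends with a global average pooling layer that produces the output (the number of channels entering it equals the number of classes)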
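
As a small illustration of the last point, the sketch below shows that global average pooling followed by flattening turns a feature map with one channel per class into one score per class, and that this is the same as averaging over the two spatial dimensions. The batch size and the 5x5 spatial size are assumed values for the example.

```python
import torch
from torch import nn

torch.manual_seed(0)

num_classes = 10
# Assumed shape: a batch of 4 feature maps with one channel per class
feature_maps = torch.randn(4, num_classes, 5, 5)

# Global average pooling (target output size 1x1), then flatten to (batch, classes)
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten())
logits = head(feature_maps)

# Equivalent manual computation: mean over height and width
manual = feature_maps.mean(dim=(2, 3))

print(logits.shape)
print(torch.allclose(logits, manual, atol=1e-6))
```

Because each channel is reduced to a single number, this output head needs no learned parameters, unlike a fully connected classifier.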

Implementation

import torch
from torch import nn,optim

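class NiNBlock(nn.Module):
    """One spatial convolution followed by two 1x1 convolutions, each with ReLU."""
    def __init__(
        self,in_channels,out_channels,
        kernel_size,stride,padding
    ):
        super().__init__()
        # Ordinary convolution that extracts spatial features
        self.conv=nn.Conv2d(
            in_channels,out_channels,kernel_size,
            stride=stride,padding=padding
        )
        # Two 1x1 convolutions (stride 1, no padding) that act as
        # per-pixel fully connected layers across channels
        self.f1=nn.Conv2d(out_channels,out_channels,1)
        self.f2=nn.Conv2d(out_channels,out_channels,1)
        self.relu=nn.ReLU()

    def forward(self,x):
        x=self.relu(self.conv(x))
        x=self.relu(self.f1(x))
        x=self.relu(self.f2(x))
        return x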

nin_net=nn.Sequential(
    NiNBlock(1,96,11,4,0),
    nn.MaxPool2d(3,stride=2),
    NiNBlock(96,256,5,1,2),
    nn.MaxPool2d(3,stride=2),
    NiNBlock(256,384,3,1,1),
    nn.MaxPool2d(3,stride=2),nn.Dropout(),
    NiNBlock(384,10,3,1,1),
    # Target output size is 1x1, i.e. global average pooling
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten()
)
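
As a quick sanity check on the network above, the spatial size after each stage can be traced by hand with the usual output-size formula floor((n + 2p - k) / s) + 1. This is a sketch that assumes a 224x224 input (matching the resize=224 used when loading the data) and the default floor rounding of nn.MaxPool2d. The helper `out_size` is a hypothetical name introduced here. The 1x1 convolutions inside each block do not change the spatial size, so only the first convolution of each block is listed.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output side length of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

# (label, kernel, stride, padding), copied from the nin_net definition
layers = [
    ("NiNBlock(1,96,11,4,0)", 11, 4, 0),
    ("MaxPool2d(3,stride=2)", 3, 2, 0),
    ("NiNBlock(96,256,5,1,2)", 5, 1, 2),
    ("MaxPool2d(3,stride=2)", 3, 2, 0),
    ("NiNBlock(256,384,3,1,1)", 3, 1, 1),
    ("MaxPool2d(3,stride=2)", 3, 2, 0),
    ("NiNBlock(384,10,3,1,1)", 3, 1, 1),
]

size = 224  # assumed input side length
sizes = []
for label, k, s, p in layers:
    size = out_size(size, k, s, p)
    sizes.append(size)
    print(f"{label:26s} -> {size}x{size}")
# sizes == [54, 26, 26, 12, 12, 5, 5]
```

The last NiN block therefore outputs a 10x5x5 feature map, which the global average pooling layer reduces to 10 values, one per class.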

loss_f=nn.CrossEntropyLoss()
opt=optim.Adam(nin_net.parameters())

import d2l

train_iter,test_iter=d2l.load_data_fashion_mnist(128,resize=224)

# d2l.train(
#     10,loss_f,opt,nin_net,train_iter,
#     device=torch.device("cuda:0"),
#     save_name="NIN"
# )
d2l.evaluate(
    nin_net,test_iter,loss_f,
    "D:/code/machine_learning/limu_d2l/params/NIN_5",
    device=torch.device("cuda:0")
)