
Convolutional Neural Networks with PyTorch


1. Source of the Raw Data

The data for this case study is the cats-and-dogs dataset from the official Kaggle site, 25,000 images of cats and dogs in total. Since the PyTorch framework reads files laid out in the following directory structure more easily, we first need to reorganize them using common Python methods.

// Directory layout as downloaded from the official site
dogsandcats/
    train/
        dog.183.jpg
        cat.2.jpg
        cat.17.jpg
        dog.186.jpg
        cat.27.jpg
        dog.193.jpg


// Directory layout after reorganization
dogsandcats/
    train/
        dog/
            dog.183.jpg
            dog.186.jpg
            dog.193.jpg
        cat/
            cat.17.jpg
            cat.2.jpg
            cat.27.jpg
    valid/
         dog/
             dog.173.jpg
             dog.156.jpg
             dog.123.jpg
         cat/
             cat.172.jpg
             cat.20.jpg
             cat.21.jpg

【ps】Link: https://pan.baidu.com/s/1l1AnBgkAAEhh0vI5_loWKw  Extraction code: 2xq4

2. Loading the Data into PyTorch Tensors

2.1 Common Loading Approaches

Generally speaking, when you work with image, text, audio, or video data, you can load the data into a numpy array with standard Python packages and then convert that array into a torch.Tensor.

  • Images: Pillow, OpenCV
  • Audio: scipy, librosa
  • Text: plain Python or Cython based loading, or NLTK and SpaCy

For vision in particular there is a dedicated package called torchvision, which contains the data-loading module torchvision.datasets with loaders for common public datasets such as ImageNet, CIFAR10, and MNIST, image transforms in torchvision.transforms, and the batching utility torch.utils.data.DataLoader. This is a great convenience and avoids writing boilerplate code.
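For example, the generic route just described looks like this in a minimal sketch of my own (assuming the reorganized directory from section 1; the printed size will vary per image):

from PIL import Image
import numpy as np
import torch

img = Image.open('dogsandcats/train/cat/cat.2.jpg')   # load the image with Pillow
arr = np.asarray(img)                                 # H x W x C uint8 numpy array
tensor = torch.from_numpy(arr)                        # convert the array into a torch.Tensor
print(tensor.shape, tensor.dtype)                     # e.g. torch.Size([374, 500, 3]) torch.uint8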

2.2 Loading the Cat and Dog Images into PyTorch Tensors

Compose(transforms), from torchvision.transforms

  • transforms: a list of transform objects

ImageFolder(root="root folder path", [transform, target_transform]), from torchvision.datasets

  • self.classes – a list of the class names
  • self.class_to_idx – a dict mapping each class name to its index
  • self.imgs – a list of (img-path, class) tuples
from torchvision.datasets import ImageFolder
from torchvision import transforms

# Data transforms and dataset loaders
simple_transform = transforms.Compose([transforms.Resize((224,224))
                                       ,transforms.ToTensor()            # convert the image to a Tensor, scaled to [0, 1]
                                       ,transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
                                      ])
# Training set: 1500 cats and 1500 dogs
train = ImageFolder('dogsandcats/train/',simple_transform)
# Validation set: 500 cats and 500 dogs
valid = ImageFolder('dogsandcats/valid/',simple_transform)
2.3 Visualizing a Tensor ---> Displaying the Image
import numpy as np
import matplotlib.pyplot as plt

# Helper function: visualize a tensor ---> display the image
def imshow(inp,cmap=None):
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp,cmap)
	
# Call it on one validation image
imshow(valid[999][0])

[figure: the validation image displayed by imshow]

【ps】Our validation set contains 1000 images in total; since indexing starts at 0, the largest valid index is 999.

# View several random validation images
n_rows = 4
n_cols = 10
plt.figure(figsize=(n_cols * 2, n_rows * 2))
for row in range(n_rows):
    for col in range(n_cols):
        index = np.random.randint(0,1000)    # random index covering all 1000 images (0-999)
        plt.subplot(n_rows, n_cols, n_cols * row + col + 1)    # position of this subplot
        imshow(valid[index][0])
        plt.axis('off')
        if valid[index][1] == 0:
            plt.title('cat', fontsize=12)
        else:
            plt.title('dog', fontsize=12)
plt.subplots_adjust(wspace=0.2, hspace=0.5)
plt.show()

[figure: a 4x10 grid of random validation images, each titled cat or dog]

2.4 Inspecting Other Attributes
print('-----------------------------------------------------------------------------')
print(type(train))
print('-----------------------------------------------------------------------------')
print(train)
print('-----------------------------------------------------------------------------')
print(type(train.class_to_idx))
print('-----------------------------------------------------------------------------')
print(type(train.classes))
print('-----------------------------------------------------------------------------')
print(type(train[0][0]))
print('-----------------------------------------------------------------------------')
print(type(train[0][1]))

【Output】:

-----------------------------------------------------------------------------
<class 'torchvision.datasets.folder.ImageFolder'>
-----------------------------------------------------------------------------
Dataset ImageFolder
    Number of datapoints: 3000
    Root location: dogsandcats/train/
    StandardTransform
Transform: Compose(
               Resize(size=(224, 224), interpolation=PIL.Image.BILINEAR)
               ToTensor()
               Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
           )
-----------------------------------------------------------------------------
<class 'dict'>
-----------------------------------------------------------------------------
<class 'list'>
-----------------------------------------------------------------------------
<class 'torch.Tensor'>
-----------------------------------------------------------------------------
<class 'int'>
print(train.class_to_idx)
print(train.classes) 

print('Data of the 1st training image:\n', train[0][0])     # second index 0 -> the image data
print('Label of the 1st training image (as a number):', train[0][1]) # second index 1 -> the label
print('Size of the 1st training image:', train[0][0].size())    # size(): the tensor dimensions

【Output】:

{'cat': 0, 'dog': 1}
['cat', 'dog']

Data of the 1st training image:
 tensor([[[-1.4500, -1.4329, -1.4158,  ...,  1.4098,  1.3755,  1.3070],
         [-1.4329, -1.4329, -1.4158,  ...,  1.3413,  1.3070,  1.2728],
         [-1.4500, -1.4500, -1.4672,  ...,  1.2557,  1.2899,  1.3242],
         ...,
         [-1.6213, -1.6555, -1.7240,  ..., -1.3815, -1.4329, -1.2788],
         [-1.5870, -1.6384, -1.7240,  ..., -1.4500, -1.3987, -1.2274],
         [-1.5870, -1.6555, -1.7412,  ..., -1.2103, -1.3130, -1.3644]],

        [[-1.2654, -1.2654, -1.2479,  ...,  1.5357,  1.4832,  1.4307],
         [-1.2479, -1.2654, -1.2479,  ...,  1.4307,  1.4132,  1.3782],
         [-1.2654, -1.2829, -1.3004,  ...,  1.3431,  1.3782,  1.4132],
         ...,
         [-1.5630, -1.5980, -1.6681,  ..., -1.5280, -1.5630, -1.3704],
         [-1.5280, -1.5805, -1.6681,  ..., -1.5805, -1.5280, -1.3179],
         [-1.5280, -1.5980, -1.6856,  ..., -1.3354, -1.4405, -1.4580]],

        [[-1.1073, -1.0724, -1.0201,  ...,  1.2631,  1.1062,  1.0017],
         [-1.0898, -1.0724, -1.0201,  ...,  1.1759,  1.0714,  0.9668],
         [-1.1073, -1.0898, -1.0550,  ...,  1.0888,  1.0539,  1.0539],
         ...,
         [-1.3164, -1.3513, -1.4210,  ..., -1.4036, -1.4384, -1.2467],
         [-1.2816, -1.3339, -1.4210,  ..., -1.4733, -1.4210, -1.2119],
         [-1.2816, -1.3513, -1.4384,  ..., -1.2467, -1.3513, -1.3687]]])
Label of the 1st training image (as a number): 0
Size of the 1st training image: torch.Size([3, 224, 224])

3. Loading Data in Batches

The method: torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False, num_workers=0)

  • dataset (Dataset) – the dataset from which to load the data.
  • batch_size (int, optional) – how many samples to load per batch (default: 1).
  • shuffle (bool, optional) – if True, the data is reshuffled at every epoch (default: False).
  • num_workers (int, optional) – how many subprocesses to use for data loading; 0 means the data is loaded in the main process (default: 0).
import torch

# Batch loaders
train_data_loader = torch.utils.data.DataLoader(train, batch_size=32, num_workers=3, shuffle=True)
valid_data_loader = torch.utils.data.DataLoader(valid, batch_size=32, num_workers=3, shuffle=True)
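As a quick sanity check (a small sketch of my own, not part of the original code), we can pull a single batch from the loader and inspect its shape:

# Fetch one batch and check its dimensions
images, labels = next(iter(train_data_loader))
print(images.size())   # torch.Size([32, 3, 224, 224]): a batch of 32 normalized images
print(labels.size())   # torch.Size([32]): one class index per image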

4. Initializing the Neural Network

4.1 Common Layer APIs

【Note】class torch.nn.Module is the base class for all networks; your model should subclass it as well.

1. torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)

  • in_channels (int) – number of channels in the input signal
  • out_channels (int) – number of channels produced by the convolution
  • kernel_size (int or tuple) – size of the convolving kernel
  • stride (int or tuple, optional) – stride of the convolution
  • padding (int or tuple, optional) – number of layers of zeros added to each side of the input
  • dilation (int or tuple, optional) – spacing between kernel elements
  • groups (int, optional) – number of blocked connections from input channels to output channels
  • bias (bool, optional) – if bias=True, adds a learnable bias

Attributes

  • weight (tensor) – the convolution weights, of size (out_channels, in_channels, kernel_size)
  • bias (tensor) – the convolution bias, of size (out_channels)
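A small shape check (my own sketch) makes these parameters and attributes concrete:

import torch
import torch.nn as nn

m = nn.Conv2d(3, 10, kernel_size=5)
input = torch.randn(1, 3, 224, 224)
print(m(input).size())   # torch.Size([1, 10, 220, 220]): 224 - 5 + 1 = 220
print(m.weight.size())   # torch.Size([10, 3, 5, 5]) = (out_channels, in_channels, kH, kW)
print(m.bias.size())     # torch.Size([10]) = (out_channels,)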

2. torch.nn.Dropout2d(p=0.5, inplace=False) randomly zeroes out entire channels of the input tensor; the channels that get zeroed are chosen at random on every forward call.

  • p (float, optional) – probability of a channel being zeroed. Default: 0.5
  • inplace (bool, optional) – if set to True, the operation is performed in place.

【Notes】:

  1. The output has the same shape as the input.
  2. As argued in the paper Efficient Object Localization Using Convolutional Networks, if adjacent pixels within a feature map are strongly correlated (as is common in early convolutional layers), ordinary torch.nn.Dropout will not regularize the activations and will merely lower the effective learning rate. In that situation, nn.Dropout2d() promotes independence between feature maps and should be used instead.
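The channel-wise behaviour is easy to see on a tiny input (my own sketch):

import torch
import torch.nn as nn

m = nn.Dropout2d(p=0.5)
input = torch.randn(1, 4, 2, 2)
print(m(input))   # roughly half of the 4 channels come out as all zeros;
                  # the surviving channels are scaled by 1/(1-p) = 2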

3. torch.nn.Linear(in_features, out_features, bias=True) applies a linear transformation to the input data

  • in_features – size of each input sample
  • out_features – size of each output sample
  • bias – if set to False, the layer will not learn an additive bias. Default: True

Attributes

  • weight – the learnable weights of the module, of shape (out_features x in_features)
  • bias – the learnable bias of the module, of shape (out_features)
import torch
import torch.nn as nn
from torch import autograd

# Example
m = nn.Linear(20, 30)
input = autograd.Variable(torch.randn(128, 20))
output = m(input)
print(output.size())

print(m.weight)
print(m.bias)

【Output】:

torch.Size([128, 30])
------omitted------
------omitted------

4. torch.nn.functional.max_pool2d(input, kernel_size, stride=None, padding=0, dilation=1, ceil_mode=False, return_indices=False)

  • input – the input tensor
  • kernel_size – size of the pooling region
  • stride – stride of the pooling operation
  • padding – implicit zero padding on the input. Default: 0
  • ceil_mode – when True, uses ceil instead of floor in the output-shape formula. Default: False
  • return_indices – when True, also returns the indices of the max locations (useful for max_unpool2d). Default: False
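A quick shape check (my own sketch; the sizes match the first conv layer of the network built in 4.2):

import torch
import torch.nn.functional as F

input = torch.randn(1, 10, 220, 220)
output = F.max_pool2d(input, kernel_size=2)
print(output.size())   # torch.Size([1, 10, 110, 110]): each spatial dimension is halved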

【Possible question】Why call the pooling function from torch.nn.functional instead of using the pooling layer from torch.nn?

【My take】: There is no essential difference in functionality between the two; in fact nn.Dropout is implemented by calling the function F.dropout. The difference lies in how they are used. nn.Dropout derives from nn.Module, which lets us define nn.Dropout as a layer of the model: nn.Dropout is declared as a layer in the model class's __init__(), while F.dropout is called directly inside forward(). When building a network you can use either according to taste, but some argue that nn.Dropout is better, for the following reasons (both styles are sketched after this list):

  • Dropout is designed to be active only during training, so you need to switch it off when predicting or evaluating the model. nn.Dropout handles this conveniently, shutting dropout off as soon as the model enters eval mode, whereas F.dropout does not care what mode you are in.
  • Every module assigned to a model is registered with it, so the model class keeps track of them; that is why calling eval() can switch the dropout module off. When you use F.dropout, the model knows nothing about it, so the dropout will not show up in the model's summary either.
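Here is a minimal sketch (hypothetical class names) of the two styles side by side:

import torch.nn as nn
import torch.nn.functional as F

class NetA(nn.Module):                  # module style: the dropout is registered,
    def __init__(self):                 # so model.eval() switches it off automatically
        super().__init__()
        self.drop = nn.Dropout(p=0.5)
    def forward(self, x):
        return self.drop(x)

class NetB(nn.Module):                  # functional style: you must pass the
    def forward(self, x):               # training flag yourself
        return F.dropout(x, p=0.5, training=self.training)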

5. The nonlinear activation torch.nn.functional.relu(input, inplace=False)

6. Understanding view(x.size(0), -1): it flattens the preceding multi-dimensional tensor, keeping the batch dimension. x.size(0) is the batch size, so x = x.view(x.size(0), -1) is shorthand for x = x.view(batchsize, -1). view() works much like reshape and is used to change a tensor's size. The -1 tells the function to infer the number of columns automatically from the original tensor data and the batch size, as in the following sketch.
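For instance (my own sketch, using the feature-map size that will appear in section 4.2):

import torch

x = torch.randn(32, 20, 53, 53)   # a batch of 32 feature maps: 20 channels of 53x53
x = x.view(x.size(0), -1)         # flatten everything except the batch dimension
print(x.size())                   # torch.Size([32, 56180]): 20 * 53 * 53 = 56180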

7. torch.nn.functional.log_softmax(input, dim)

  • dim: the dimension along which the computation runs; dim=0 means column-wise, dim=1 means row-wise. Relying on the implicit default dim is deprecated, so always pass dim explicitly or you will get a warning.
import torch
from torch import nn
from torch import autograd

m = nn.Softmax(dim=1)
input = autograd.Variable(torch.randn(2, 3))
print(input)
print(m(input))

【Output】:

tensor([[-0.2152,  0.1656,  0.0704],
        [-0.9096,  0.8762, -0.6123]])

tensor([[0.2636, 0.3857, 0.3507],
        [0.1203, 0.7177, 0.1620]])
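log_softmax is simply the (numerically more stable) logarithm of these softmax values, as this small sketch of my own shows:

import torch
import torch.nn.functional as F

input = torch.randn(2, 3)
print(F.log_softmax(input, dim=1))            # row-wise log-probabilities
print(torch.log(F.softmax(input, dim=1)))     # same values, computed less stably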
4.2 Building the CNN for This Case
import torch.nn as nn
import torch.nn.functional as F    # forward() below uses F.relu, F.max_pool2d, F.dropout and F.log_softmax

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(56180, 500)
        self.fc2 = nn.Linear(500,50)
        self.fc3 = nn.Linear(50, 2)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = F.relu(self.fc2(x))
        x = F.dropout(x,training=self.training)
        x = self.fc3(x)
        return F.log_softmax(x,dim=1)
		
# Instantiate the network
model = Net()

【ps】: To set each layer's input size, you can either compute it by hand or print every layer's output size and fill the sizes in layer by layer, as in the sketch below.
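For this network the trick looks roughly like this (my own sketch, reusing model and F from the block above; it reproduces fc1's input size of 56180):

import torch

x = torch.randn(1, 3, 224, 224)        # one dummy input image
x = F.max_pool2d(model.conv1(x), 2)
print(x.size())                        # torch.Size([1, 10, 110, 110])
x = F.max_pool2d(model.conv2(x), 2)
print(x.size())                        # torch.Size([1, 20, 53, 53]) -> 20*53*53 = 56180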


5. Model Training

5.1 Common Methods Used During Training

1. Stochastic gradient descent: torch.optim.SGD(params, lr=, momentum=0, dampening=0, weight_decay=0, nesterov=False)

  • params (iterable) – an iterable of parameters to optimize, or a dict defining parameter groups
  • lr (float) – learning rate
  • momentum (float, optional) – momentum factor (default: 0)
  • weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)
  • dampening (float, optional) – dampening for momentum (default: 0)
  • nesterov (bool, optional) – enables Nesterov momentum (default: False)

2. model.train() and model.eval() from torch.nn

  • When training a network built with PyTorch, you add model.train() at the top of the training code; it enables batch normalization and dropout.
  • During testing you call model.eval(); the network then uses the running batch-normalization statistics and does not apply dropout. A tiny demonstration follows below.
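Here is what train()/eval() do to a dropout layer (my own sketch):

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(5)
drop.train()
print(drop(x))   # e.g. tensor([2., 0., 2., 0., 2.]): random zeros, the rest scaled by 1/(1-p)
drop.eval()
print(drop(x))   # tensor([1., 1., 1., 1., 1.]): dropout is a no-op in eval mode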

3. About volatile=True

【Note】: it is not the same thing as requires_grad; we will not go into it here.

5.2 Training for This Case
from torch import optim

# Optimizer: stochastic gradient descent
optimizer = optim.SGD(model.parameters(),lr=0.01,momentum=0.5)

from torch.autograd import Variable
import torch.nn.functional as F

# Define the training routine:
#     epoch: the current epoch number
#     model: the neural-network object
#     data_loader: the batched data set
#     phase: mode selection, training / validation
#     volatile: whether to disable gradient tracking (legacy Variable flag)
#     returns: the loss and the accuracy

def fit(epoch,model,data_loader,phase='training',volatile=False):
    
    # Choose the mode: training   -> enable BatchNormalization and Dropout (set them to True)
    #                  validation -> disable BatchNormalization and Dropout (set them to False)
    #【Note】: in eval() mode the framework freezes BN and Dropout, using the values learned in
    #          training instead of per-batch statistics; otherwise, if the test batch_size is
    #          small, the BN layers can badly distort the output!
    if phase == 'training':
        model.train()
    if phase == 'validation':
        model.eval()
        volatile=True
        
    # Initialize the running loss and correct-prediction counters
    running_loss = 0.0
    running_correct = 0
    
    for batch_idx , (data,target) in enumerate(data_loader):
        
        # Wrap the data in Variables
        data , target = Variable(data,volatile),Variable(target)
    
        # In training mode: zero the parameter gradients
        if phase == 'training':
            optimizer.zero_grad()
        
        # The model's output
        output = model(data)
        # Compute the loss from the model output and the labels
        loss = F.nll_loss(output,target)
        
        running_loss += F.nll_loss(output,target,reduction='sum').item()    # sum (not average) over the batch
        preds = output.data.max(dim=1,keepdim=True)[1]
        running_correct += preds.eq(target.data.view_as(preds)).cpu().sum()
        
        # In training mode: back-propagate and step the optimizer
        if phase == 'training':
            loss.backward()
            optimizer.step()
    
    # Average loss over the whole dataset
    loss = running_loss/len(data_loader.dataset)
    # Accuracy over the whole dataset
    accuracy = 100. * running_correct/len(data_loader.dataset)
    
    # Report
    print(f'{phase} loss is {loss:{5}.{2}} and {phase} accuracy is {running_correct}/{len(data_loader.dataset)}{accuracy:{10}.{4}}')
    return loss,accuracy
	
	
# ---------------------------------Train the model------------------------------------
train_losses , train_accuracy = [],[]
val_losses , val_accuracy = [],[]

# Run 19 training epochs (range(1, 20))
for epoch in range(1,20):
    # Training pass
    epoch_loss, epoch_accuracy = fit(epoch,model,train_data_loader,phase='training')
    # Validation pass
    val_epoch_loss , val_epoch_accuracy = fit(epoch,model,valid_data_loader,phase='validation')
    # Record the losses and accuracies for the training and validation sets
    train_losses.append(epoch_loss)
    train_accuracy.append(epoch_accuracy)
    val_losses.append(val_epoch_loss)
    val_accuracy.append(val_epoch_accuracy)

【Output】:

training loss is  0.69 and training accuracy is 1520/3000     50.67
validation loss is  0.69 and validation accuracy is 500/1000      50.0
training loss is  0.69 and training accuracy is 1607/3000     53.57
validation loss is  0.68 and validation accuracy is 573/1000      57.3
training loss is  0.68 and training accuracy is 1684/3000     56.13
validation loss is  0.67 and validation accuracy is 554/1000      55.4
training loss is  0.67 and training accuracy is 1756/3000     58.53
validation loss is  0.66 and validation accuracy is 637/1000      63.7
training loss is  0.66 and training accuracy is 1819/3000     60.63
validation loss is  0.65 and validation accuracy is 639/1000      63.9
training loss is  0.65 and training accuracy is 1868/3000     62.27
validation loss is  0.65 and validation accuracy is 641/1000      64.1
training loss is  0.63 and training accuracy is 1916/3000     63.87
validation loss is  0.63 and validation accuracy is 671/1000      67.1
training loss is  0.62 and training accuracy is 1944/3000      64.8
validation loss is  0.63 and validation accuracy is 669/1000      66.9
training loss is  0.61 and training accuracy is 2038/3000     67.93
validation loss is  0.61 and validation accuracy is 684/1000      68.4
training loss is   0.6 and training accuracy is 2031/3000      67.7
validation loss is  0.61 and validation accuracy is 685/1000      68.5
training loss is  0.59 and training accuracy is 2058/3000      68.6
validation loss is   0.6 and validation accuracy is 697/1000      69.7
training loss is  0.57 and training accuracy is 2125/3000     70.83
validation loss is  0.59 and validation accuracy is 696/1000      69.6
training loss is  0.54 and training accuracy is 2178/3000      72.6
validation loss is  0.58 and validation accuracy is 703/1000      70.3
training loss is  0.52 and training accuracy is 2238/3000      74.6
validation loss is  0.58 and validation accuracy is 708/1000      70.8
training loss is  0.47 and training accuracy is 2362/3000     78.73
validation loss is   0.6 and validation accuracy is 688/1000      68.8
training loss is  0.45 and training accuracy is 2396/3000     79.87
validation loss is  0.59 and validation accuracy is 682/1000      68.2
training loss is   0.4 and training accuracy is 2450/3000     81.67
validation loss is   0.6 and validation accuracy is 702/1000      70.2
training loss is  0.36 and training accuracy is 2524/3000     84.13
validation loss is  0.66 and validation accuracy is 697/1000      69.7
training loss is  0.34 and training accuracy is 2571/3000      85.7
validation loss is  0.63 and validation accuracy is 703/1000      70.3

6. Visualizing the Results

# Plot the loss curves
plt.plot(range(1,len(train_losses)+1),train_losses,'bo',label = 'training loss')
plt.plot(range(1,len(val_losses)+1),val_losses,'r',label = 'validation loss')
plt.legend()

[figure: training vs. validation loss curves]

# Plot the accuracy curves
plt.plot(range(1,len(train_accuracy)+1),train_accuracy,'bo',label = 'train accuracy')
plt.plot(range(1,len(val_accuracy)+1),val_accuracy,'r',label = 'val accuracy')
plt.legend()

[figure: training vs. validation accuracy curves]

【ps】: The plots show that the training loss falls with every epoch while the validation loss keeps getting worse, and although accuracy rises during training, it all but saturates around 70%. This is clearly a model that does not generalize. To get a better one we could further tune the network's parameters and layer design, but another technique, transfer learning, lets us obtain an accurate model far more easily!
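As a preview, the transfer-learning idea looks roughly like this (a sketch using torchvision's pretrained ResNet-18, which is not the network trained above):

import torch.nn as nn
from torchvision import models

transfer_model = models.resnet18(pretrained=True)   # convolutional features learned on ImageNet
for param in transfer_model.parameters():
    param.requires_grad = False                     # freeze the pretrained features
transfer_model.fc = nn.Linear(transfer_model.fc.in_features, 2)   # new 2-class head to train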

7. Conclusion

The results of this exercise are not particularly good, and I have not refined them further. That is because this post is only meant to give a broad overview of the whole workflow of building a convolutional neural network on the PyTorch framework, together with the common API parameters and how to call them; the fundamentals of PyTorch will be unpacked step by step in later posts!

Python
Deep Learning
Neural Networks
  • Author: 李延松
  • Published: 2020-10-05 21:25
  • License: free to repost - non-commercial - no derivatives - keep attribution (Creative Commons 3.0 license)
  • Reposting on WeChat official accounts: please add the author's official-account QR code at the end of the article
