The data for this case study comes from the Kaggle dogs-vs-cats dataset, which contains 25,000 images of cats and dogs. Since the PyTorch framework reads data most easily from the directory layout shown below, we first need to reorganize the files using some common Python utilities.
// Data layout as downloaded from the official site
dogsandcats/
    train/
        dog.183.jpg
        cat.2.jpg
        cat.17.jpg
        dog.186.jpg
        cat.27.jpg
        dog.193.jpg
// Layout after reorganizing
dogsandcats/
    train/
        dog/
            dog.183.jpg
            dog.186.jpg
            dog.193.jpg
        cat/
            cat.17.jpg
            cat.2.jpg
            cat.27.jpg
    valid/
        dog/
            dog.173.jpg
            dog.156.jpg
            dog.123.jpg
        cat/
            cat.172.jpg
            cat.20.jpg
            cat.21.jpg
【ps】Link: https://pan.baidu.com/s/1l1AnBgkAAEhh0vI5_loWKw  Extraction code: 2xq4
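A minimal sketch of that reorganization (my own, assuming the split used in this post: 1500 dogs and 1500 cats for training, 500 of each for validation; adjust paths and counts as needed):

import os
import shutil

base = 'dogsandcats'
train_dir = os.path.join(base, 'train')
files = [f for f in os.listdir(train_dir) if f.endswith('.jpg')]

# Create the class subfolders for both splits
for split in ('train', 'valid'):
    for cls in ('dog', 'cat'):
        os.makedirs(os.path.join(base, split, cls), exist_ok=True)

# Per class: the first 500 images go to valid/<cls>/, the next 1500 to train/<cls>/
for cls in ('dog', 'cat'):
    cls_files = sorted(f for f in files if f.startswith(cls))
    for i, fname in enumerate(cls_files[:2000]):
        split = 'valid' if i < 500 else 'train'
        shutil.move(os.path.join(train_dir, fname),
                    os.path.join(base, split, cls, fname))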
Generally speaking, when you work with image, text, audio, or video data, you can load the data into a numpy array with standard Python packages and then convert that array into a torch.Tensor.
For vision in particular, there is a dedicated package called torchvision, which provides the torchvision.datasets module for loading common datasets such as ImageNet, CIFAR10, and MNIST, image transformation utilities, and the torch.utils.data.DataLoader class for batched loading. This is a great convenience and avoids writing boilerplate code.
From torchvision.transforms: Compose(transforms)
From torchvision.datasets: ImageFolder(root="root folder path", [transform, target_transform])
from torchvision.datasets import ImageFolder
from torchvision import transforms

# Data transforms
simple_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),      # convert the image to a Tensor, scaled to [0, 1]
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
# Training set: 1500 cats and 1500 dogs
train = ImageFolder('dogsandcats/train/', simple_transform)
# Validation set: 500 cats and 500 dogs
valid = ImageFolder('dogsandcats/valid/', simple_transform)
import numpy as np
import matplotlib.pyplot as plt

# Helper function: undo the normalization on a tensor and display it as an image
def imshow(inp, cmap=None):
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp, cmap)

# Usage
imshow(valid[999][0])
【ps】Our validation set contains 1000 images; since indexing starts at 0, the largest valid index is 999.
# View a grid of validation images
n_rows = 4
n_cols = 10
plt.figure(figsize=(n_cols * 2, n_rows * 2))
for row in range(n_rows):
    for col in range(n_cols):
        index = np.random.randint(0, 1000)   # random index into the 1000 validation images
        plt.subplot(n_rows, n_cols, n_cols * row + col + 1)  # position in the grid
        imshow(valid[index][0])
        plt.axis('off')
        if valid[index][1] == 0:
            plt.title('cat', fontsize=12)
        else:
            plt.title('dog', fontsize=12)
plt.subplots_adjust(wspace=0.2, hspace=0.5)
plt.show()
print('-----------------------------------------------------------------------------')
print(type(train))
print('-----------------------------------------------------------------------------')
print(train)
print('-----------------------------------------------------------------------------')
print(type(train.class_to_idx))
print('-----------------------------------------------------------------------------')
print(type(train.classes))
print('-----------------------------------------------------------------------------')
print(type(train[0][0]))
print('-----------------------------------------------------------------------------')
print(type(train[0][1]))
【Output】:
-----------------------------------------------------------------------------
<class 'torchvision.datasets.folder.ImageFolder'>
-----------------------------------------------------------------------------
Dataset ImageFolder
    Number of datapoints: 3000
    Root location: dogsandcats/train/
    StandardTransform
Transform: Compose(
               Resize(size=(224, 224), interpolation=PIL.Image.BILINEAR)
               ToTensor()
               Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
           )
-----------------------------------------------------------------------------
<class 'dict'>
-----------------------------------------------------------------------------
<class 'list'>
-----------------------------------------------------------------------------
<class 'torch.Tensor'>
-----------------------------------------------------------------------------
<class 'int'>
print(train.class_to_idx)
print(train.classes)
print('Data of the 1st training image:\n', train[0][0])  # index 0 in the second dimension is the image data
print('Label (as a number) of the 1st training image:', train[0][1])  # index 1 is the label
print('Size of the 1st training image:', train[0][0].size())  # size(): inspect the tensor shape
【Output】:
{'cat': 0, 'dog': 1}
['cat', 'dog']
Data of the 1st training image:
tensor([[[-1.4500, -1.4329, -1.4158,  ...,  1.4098,  1.3755,  1.3070],
         [-1.4329, -1.4329, -1.4158,  ...,  1.3413,  1.3070,  1.2728],
         [-1.4500, -1.4500, -1.4672,  ...,  1.2557,  1.2899,  1.3242],
         ...,
         [-1.6213, -1.6555, -1.7240,  ..., -1.3815, -1.4329, -1.2788],
         [-1.5870, -1.6384, -1.7240,  ..., -1.4500, -1.3987, -1.2274],
         [-1.5870, -1.6555, -1.7412,  ..., -1.2103, -1.3130, -1.3644]],

        [[-1.2654, -1.2654, -1.2479,  ...,  1.5357,  1.4832,  1.4307],
         [-1.2479, -1.2654, -1.2479,  ...,  1.4307,  1.4132,  1.3782],
         [-1.2654, -1.2829, -1.3004,  ...,  1.3431,  1.3782,  1.4132],
         ...,
         [-1.5630, -1.5980, -1.6681,  ..., -1.5280, -1.5630, -1.3704],
         [-1.5280, -1.5805, -1.6681,  ..., -1.5805, -1.5280, -1.3179],
         [-1.5280, -1.5980, -1.6856,  ..., -1.3354, -1.4405, -1.4580]],

        [[-1.1073, -1.0724, -1.0201,  ...,  1.2631,  1.1062,  1.0017],
         [-1.0898, -1.0724, -1.0201,  ...,  1.1759,  1.0714,  0.9668],
         [-1.1073, -1.0898, -1.0550,  ...,  1.0888,  1.0539,  1.0539],
         ...,
         [-1.3164, -1.3513, -1.4210,  ..., -1.4036, -1.4384, -1.2467],
         [-1.2816, -1.3339, -1.4210,  ..., -1.4733, -1.4210, -1.2119],
         [-1.2816, -1.3513, -1.4384,  ..., -1.2467, -1.3513, -1.3687]]])
Label (as a number) of the 1st training image: 0
Size of the 1st training image: torch.Size([3, 224, 224])
Method: torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False, num_workers=0)
import torch
# Batch the datasets
train_data_loader = torch.utils.data.DataLoader(train, batch_size=32, num_workers=3, shuffle=True)
valid_data_loader = torch.utils.data.DataLoader(valid, batch_size=32, num_workers=3, shuffle=True)
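As a quick sanity check (my own addition, not from the original walkthrough), we can pull one batch from the loader and inspect its shape:

# One batch: images have shape [batch_size, channels, height, width]
images, labels = next(iter(train_data_loader))
print(images.size())   # torch.Size([32, 3, 224, 224])
print(labels.size())   # torch.Size([32])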
【Note】class torch.nn.Module is the base class for all networks; your model should inherit from it as well.
1.torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)
View its attributes:
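For instance (my own toy example, not from the original post), the weight and bias of a freshly created convolution layer can be read directly:

import torch.nn as nn

conv = nn.Conv2d(3, 10, kernel_size=5)
print(conv.weight.shape)   # torch.Size([10, 3, 5, 5]): out_channels x in_channels x kernel H x W
print(conv.bias.shape)     # torch.Size([10]): one bias per output channel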
2.torch.nn.Dropout2d(p=0.5, inplace=False): randomly zeroes entire channels of the input tensor; on every forward call, the channels that get zeroed are chosen at random.
【Explanation】:
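A small demonstration of my own (the sizes are arbitrary); note that whole channels are zeroed and the survivors are rescaled by 1/(1-p):

import torch
import torch.nn as nn

m = nn.Dropout2d(p=0.5)          # a fresh module is in training mode by default
x = torch.ones(1, 4, 2, 2)       # 1 sample with 4 channels
print(m(x))                      # roughly half the channels become all-zero; the rest are scaled to 2.0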
3. torch.nn.Linear(in_features, out_features, bias=True): applies a linear transformation to the input data.
View its attributes:
import torch
import torch.nn as nn
from torch import autograd

# Example
m = nn.Linear(20, 30)
input = autograd.Variable(torch.randn(128, 20))
output = m(input)
print(output.size())
print(m.weight)
print(m.bias)
【Output】:
torch.Size([128, 30])
------omitted------
------omitted------
4.torch.nn.functional.max_pool2d(input,kernel_size,stride=None,padding=0,dilation=1,ceil_mode=False,return_indices=False)
【A possible question】Why call the pooling layer from torch.nn.functional instead of using the one in torch.nn?
【My take】The two are functionally equivalent; in fact, nn.Dropout is implemented by calling F.dropout. The difference lies in how they are used: nn.Dropout derives from nn.Module, so it can be defined as a layer of the model. That is why nn.Dropout is declared as a layer in the model class's __init__(), while F.dropout is called directly inside forward(). When building a network you can use either according to personal preference, although some argue that nn.Dropout is preferable: being a module, it appears in the printed model structure, and it is switched off automatically by model.eval(), whereas F.dropout needs the training flag passed in by hand, as the sketch below shows.
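A small sketch (my own) of the practical difference:

import torch
import torch.nn as nn
import torch.nn.functional as F

drop = nn.Dropout(p=0.5)
x = torch.ones(5)

drop.eval()                                   # the module variant is disabled by eval() automatically
print(drop(x))                                # unchanged: tensor([1., 1., 1., 1., 1.])

# The functional variant must be told explicitly whether we are training
print(F.dropout(x, p=0.5, training=False))    # also unchanged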
5.Non-linear activation function: torch.nn.functional.relu(input, inplace=False)
6.Understanding view(x.size(0), -1): it flattens the preceding multi-dimensional tensor into one dimension per sample. x.size(0) is the batch size, so x = x.view(x.size(0), -1) is shorthand for x = x.view(batchsize, -1). view() works much like reshape and changes the tensor's shape; the -1 tells the function to infer the number of columns from the original tensor and the batch size.
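For example, with the batch size and feature-map shape that appear later in this post:

import torch

x = torch.randn(32, 20, 53, 53)      # a batch of 32 feature maps
flat = x.view(x.size(0), -1)         # -1 infers 20 * 53 * 53 = 56180 columns
print(flat.size())                   # torch.Size([32, 56180])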
7.torch.nn.functional.log_softmax(input, dim)
import torch
from torch import nn
from torch import autograd
m = nn.Softmax(dim=1)
input = autograd.Variable(torch.randn(2, 3))
print(input)
print(m(input))
【Output】:
tensor([[-0.2152,  0.1656,  0.0704],
        [-0.9096,  0.8762, -0.6123]])
tensor([[0.2636, 0.3857, 0.3507],
        [0.1203, 0.7177, 0.1620]])
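log_softmax is simply the logarithm of softmax, computed in a numerically more stable way; a quick check (my own):

import torch
import torch.nn.functional as F

input = torch.randn(2, 3)
print(F.log_softmax(input, dim=1))
print(torch.log(F.softmax(input, dim=1)))   # same values, but the naive log(softmax) is less stable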
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(56180, 500)
        self.fc2 = nn.Linear(500, 50)
        self.fc3 = nn.Linear(50, 2)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = F.relu(self.fc2(x))
        x = F.dropout(x, training=self.training)
        x = self.fc3(x)
        return F.log_softmax(x, dim=1)

# Create the network object
model = Net()
【ps】The input sizes of the layers can be worked out by calculation, or set layer by layer by printing the output size of each layer.
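A minimal sketch of the "print it layer by layer" approach (my own illustration): push a dummy batch through the convolutional part and read off the flattened size, which is where the 56180 for fc1 comes from:

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 224, 224)              # dummy batch with a single image
x = F.relu(F.max_pool2d(model.conv1(x), 2))
print(x.shape)                               # torch.Size([1, 10, 110, 110])
x = F.relu(F.max_pool2d(model.conv2(x), 2))
print(x.shape)                               # torch.Size([1, 20, 53, 53])
print(x.view(1, -1).shape)                   # torch.Size([1, 56180]) -> input size of fc1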
1.Stochastic gradient descent: torch.optim.SGD(params, lr=, momentum=0, dampening=0, weight_decay=0, nesterov=False)
2.model.train() and model.eval() from torch.nn
3.Understanding volatile=True
【Explanation】It is different from requires_grad; we will not go into the details here.
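For reference (my own note, not part of the original post): volatile was removed in PyTorch 0.4, and the modern equivalent is the torch.no_grad() context:

import torch

# Inside this context no computation graph is built, just as volatile=True used to ensure
with torch.no_grad():
    output = model(images)   # assumes `model` and a batch `images` from earlier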
from torch import optim
# Optimizer: stochastic gradient descent
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
from torch.autograd import Variable
import torch.nn.functional as F

# Training/validation routine
# epoch:       current epoch number
# model:       the network object
# data_loader: the batched dataset
# phase:       mode selection: 'training' or 'validation'
# volatile:    whether to exclude the data from gradient computation
# Returns:     loss and accuracy for this epoch
def fit(epoch, model, data_loader, phase='training', volatile=False):
    # Select the mode: in training mode, BatchNormalization and Dropout are enabled;
    # in validation mode they are disabled.
    # [Note] eval() freezes BN and Dropout to the values learned during training
    # instead of batch statistics; otherwise, a small test batch_size can easily
    # let the BN layers distort the results badly.
    if phase == 'training':
        model.train()
    if phase == 'validation':
        model.eval()
        volatile = True
    # Accumulators for the loss and the number of correct predictions
    running_loss = 0.0
    running_correct = 0
    for batch_idx, (data, target) in enumerate(data_loader):
        # Wrap the data
        data, target = Variable(data, volatile=volatile), Variable(target)
        # In training mode: zero the gradients first
        if phase == 'training':
            optimizer.zero_grad()
        # Forward pass: model output
        output = model(data)
        # Compute the loss from the output and the labels
        loss = F.nll_loss(output, target)
        running_loss += F.nll_loss(output, target, size_average=False).item()
        preds = output.data.max(dim=1, keepdim=True)[1]
        running_correct += preds.eq(target.data.view_as(preds)).cpu().sum()
        # In training mode: backpropagate and update the weights
        if phase == 'training':
            loss.backward()
            optimizer.step()
    # Average loss over the dataset
    loss = running_loss / len(data_loader.dataset)
    # Accuracy over the dataset
    accuracy = 100. * running_correct / len(data_loader.dataset)
    print(f'{phase} loss is {loss:{5}.{2}} and {phase} accuracy is {running_correct}/{len(data_loader.dataset)}{accuracy:{10}.{4}}')
    return loss, accuracy
# --------------------------------- Train the model ------------------------------------
train_losses, train_accuracy = [], []
val_losses, val_accuracy = [], []
# Loop over 19 epochs (range(1, 20))
for epoch in range(1, 20):
    # Training
    epoch_loss, epoch_accuracy = fit(epoch, model, train_data_loader, phase='training')
    # Validation
    val_epoch_loss, val_epoch_accuracy = fit(epoch, model, valid_data_loader, phase='validation')
    # Record losses and accuracies for both the training and validation sets
    train_losses.append(epoch_loss)
    train_accuracy.append(epoch_accuracy)
    val_losses.append(val_epoch_loss)
    val_accuracy.append(val_epoch_accuracy)
【Output】:
training loss is 0.69 and training accuracy is 1520/3000 50.67
validation loss is 0.69 and validation accuracy is 500/1000 50.0
training loss is 0.69 and training accuracy is 1607/3000 53.57
validation loss is 0.68 and validation accuracy is 573/1000 57.3
training loss is 0.68 and training accuracy is 1684/3000 56.13
validation loss is 0.67 and validation accuracy is 554/1000 55.4
training loss is 0.67 and training accuracy is 1756/3000 58.53
validation loss is 0.66 and validation accuracy is 637/1000 63.7
training loss is 0.66 and training accuracy is 1819/3000 60.63
validation loss is 0.65 and validation accuracy is 639/1000 63.9
training loss is 0.65 and training accuracy is 1868/3000 62.27
validation loss is 0.65 and validation accuracy is 641/1000 64.1
training loss is 0.63 and training accuracy is 1916/3000 63.87
validation loss is 0.63 and validation accuracy is 671/1000 67.1
training loss is 0.62 and training accuracy is 1944/3000 64.8
validation loss is 0.63 and validation accuracy is 669/1000 66.9
training loss is 0.61 and training accuracy is 2038/3000 67.93
validation loss is 0.61 and validation accuracy is 684/1000 68.4
training loss is 0.6 and training accuracy is 2031/3000 67.7
validation loss is 0.61 and validation accuracy is 685/1000 68.5
training loss is 0.59 and training accuracy is 2058/3000 68.6
validation loss is 0.6 and validation accuracy is 697/1000 69.7
training loss is 0.57 and training accuracy is 2125/3000 70.83
validation loss is 0.59 and validation accuracy is 696/1000 69.6
training loss is 0.54 and training accuracy is 2178/3000 72.6
validation loss is 0.58 and validation accuracy is 703/1000 70.3
training loss is 0.52 and training accuracy is 2238/3000 74.6
validation loss is 0.58 and validation accuracy is 708/1000 70.8
training loss is 0.47 and training accuracy is 2362/3000 78.73
validation loss is 0.6 and validation accuracy is 688/1000 68.8
training loss is 0.45 and training accuracy is 2396/3000 79.87
validation loss is 0.59 and validation accuracy is 682/1000 68.2
training loss is 0.4 and training accuracy is 2450/3000 81.67
validation loss is 0.6 and validation accuracy is 702/1000 70.2
training loss is 0.36 and training accuracy is 2524/3000 84.13
validation loss is 0.66 and validation accuracy is 697/1000 69.7
training loss is 0.34 and training accuracy is 2571/3000 85.7
validation loss is 0.63 and validation accuracy is 703/1000 70.3
# Visualize the losses
plt.plot(range(1, len(train_losses) + 1), train_losses, 'bo', label='training loss')
plt.plot(range(1, len(val_losses) + 1), val_losses, 'r', label='validation loss')
plt.legend()
plt.show()

# Visualize the accuracies
plt.plot(range(1, len(train_accuracy) + 1), train_accuracy, 'bo', label='train accuracy')
plt.plot(range(1, len(val_accuracy) + 1), val_accuracy, 'r', label='val accuracy')
plt.legend()
plt.show()
【ps】The plots show that with every epoch the training loss keeps decreasing while the validation loss gets worse. Accuracy also rises during training but saturates at around 70%. This is clearly a model that does not generalize. To obtain a better model, we could further tune the network's parameters and layer design, but we can also get an accurate model far more easily with another technique: transfer learning!
The result of this exercise is not particularly good, and I have not improved it further. The goal here was only a broad walkthrough of building a convolutional neural network with the PyTorch framework, along with the common API parameters and how to call them; PyTorch fundamentals will be covered step by step in later posts!