Using a Pretrained GoogleNet in Caffe for Image Recognition


 

This lesson shows how to use GoogleNet in Caffe to recognize images.

 

1. Download the pretrained GoogleNet model from Caffe's GitHub repository.

    URL: https://github.com/BVLC/caffe

    Go to models -> bvlc_googlenet and click the download link on that page.


Tip: avoid downloading in the middle of the night. The server seems to be unavailable then, and the download fails.

 

The downloaded file is bvlc_googlenet.caffemodel, about 51 MB.

Put it in the caffe-master\models\bvlc_googlenet folder.
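
Since an interrupted download leaves a truncated file, it can be worth checking the file before going further. The following is only a sketch: the helper name `looks_complete` and the 50 MB threshold are assumptions for illustration (the text only says the file is about 51 MB), and the path should be adjusted to your own Caffe checkout.

```python
import os

def looks_complete(path, min_bytes=50 * 1024 * 1024):
    """Return True if `path` is an existing file of at least `min_bytes` bytes.

    The 50 MB default is an assumed lower bound derived from the ~51 MB
    size mentioned above; it is not an official checksum.
    """
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes

# Example path taken from this tutorial; change it to match your setup.
model_path = r'caffe-master\models\bvlc_googlenet\bvlc_googlenet.caffemodel'
print(looks_complete(model_path))
```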

 

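You can visualize GoogleNet's network structure with a drawing tool. The resulting diagram is large and cluttered, so it is not pasted here. Instead, the structure is explained through the network definition file.

# deploy.prototxt (excerpt; only part of the file is shown)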

name: "GoogleNet"
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 10 dim: 3 dim: 224 dim: 224 } }         # batch of 10 images, 3 channels (color), 224x224 pixels
}
layer {
  name: "conv1/7x7_s2"
  type: "Convolution"
  bottom: "data"
  top: "conv1/7x7_s2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
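    pad: 3                                      # pad the input with 3 rings of zeros on every side
    kernel_size: 7
    stride: 2
    weight_filler {
      type: "xavier"
      std: 0.1                                # standard deviation; ignored by the "xavier" filler, used by other fillers such as "gaussian"
    }
    bias_filler {
      type: "constant"                    # constant bias, here 0.2; defaults to 0 if value is not set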
      value: 0.2
    }
  }
}
layer {
  name: "conv1/relu_7x7"
  type: "ReLU"
  bottom: "conv1/7x7_s2"
  top: "conv1/7x7_s2"
}
layer {
  name: "pool1/3x3_s2"
  type: "Pooling"
  bottom: "conv1/7x7_s2"
  top: "pool1/3x3_s2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "pool1/norm1"
  type: "LRN"                                # local response normalization; intended to improve recognition accuracy
  bottom: "pool1/3x3_s2"
  top: "pool1/norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}

layer {
  name: "inception_4a/pool_proj"
  type: "Convolution"
  bottom: "inception_4a/pool"
  top: "inception_4a/pool_proj"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    kernel_size: 1
    weight_filler {
      type: "xavier"
      std: 0.1
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
layer {
  name: "inception_4a/relu_pool_proj"
  type: "ReLU"
  bottom: "inception_4a/pool_proj"
  top: "inception_4a/pool_proj"
}
layer {
  name: "inception_4a/output"
  type: "Concat"       # merges the outputs of the preceding branches along the channel axis; batch size, height and width must match
  bottom: "inception_4a/1x1"
  bottom: "inception_4a/3x3"
  bottom: "inception_4a/5x5"
  bottom: "inception_4a/pool_proj"
  top: "inception_4a/output"
}
layer {
  name: "inception_4b/1x1"
  type: "Convolution"
  bottom: "inception_4a/output"
  top: "inception_4b/1x1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 160
    kernel_size: 1
    weight_filler {
      type: "xavier"
      std: 0.03
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}

layer {
  name: "inception_5a/relu_5x5"
  type: "ReLU"
  bottom: "inception_5a/5x5"
  top: "inception_5a/5x5"
}
layer {
  name: "inception_5a/pool"
  type: "Pooling"
  bottom: "pool4/3x3_s2"
  top: "inception_5a/pool"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 1
    pad: 1
  }
}

layer {
  name: "loss3/classifier"
  type: "InnerProduct"
  bottom: "pool5/7x7_s1"
  top: "loss3/classifier"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1000
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "loss3/classifier"
  top: "prob"
}

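GoogleNet contains a building block called Inception. The output of the preceding layer, for example a pooling layer, is fed to four parallel branches: three convolution branches and one pooling branch. The three convolution branches use different kernel sizes and therefore have different receptive fields, and concatenating their outputs fuses features at different scales.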
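
The merge at the end of an Inception module can be sketched with NumPy. The branch channel counts below (192, 208, 48, 64) follow the inception_4a module of the GoogLeNet design; only the 64 of pool_proj appears in the excerpt above, so treat the other three and the 14x14 spatial size as assumptions for illustration.

```python
import numpy as np

batch, height, width = 10, 14, 14  # assumed blob size, for illustration only
branch_channels = {'1x1': 192, '3x3': 208, '5x5': 48, 'pool_proj': 64}

# One dummy blob per branch, in Caffe's (N, C, H, W) layout.
branches = [np.zeros((batch, c, height, width), dtype=np.float32)
            for c in branch_channels.values()]

# Concat joins the blobs along the channel axis (axis=1).
# N, H and W must agree across branches; the channel counts may differ.
merged = np.concatenate(branches, axis=1)
print(merged.shape)  # (10, 512, 14, 14)
```

The channel counts simply add up (192 + 208 + 48 + 64 = 512), while the other three dimensions pass through unchanged.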

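Kernel sizes of 1, 3 and 5 are used because, with a stride of 1 and a pad of 0, 1 and 2 respectively, all three produce feature maps of the same size. For an input of size x*x:

kernel 1*1, pad = 0: output size is x - 1 + 1 = x, so the output is x*x

kernel 3*3, pad = 1: output size is (x+2) - 3 + 1 = x, so the output is still x*x

kernel 5*5, pad = 2: output size is (x+4) - 5 + 1 = x, so the output is still x*x

Only because the sizes match can the branch outputs be merged.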
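
The arithmetic above is a special case of the usual convolution output-size formula, floor((size + 2*pad - kernel) / stride) + 1. A small sketch to check it; the function name `conv_output_size` and the input size 28 are hypothetical, chosen only for illustration.

```python
def conv_output_size(size, kernel, pad, stride=1):
    """Output width (or height) of a convolution on a size*size input."""
    return (size + 2 * pad - kernel) // stride + 1

x = 28  # hypothetical input size
for kernel, pad in [(1, 0), (3, 1), (5, 2)]:
    # Each (kernel, pad) pair keeps the spatial size at x when stride is 1.
    print(kernel, pad, conv_output_size(x, kernel, pad))

# The first layer in the excerpt: 224 input, 7x7 kernel, pad 3, stride 2.
print(conv_output_size(224, 7, 3, stride=2))  # 112
```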

 


 

 

 

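2. Prepare the images to recognize

Create a new folder named image under caffe-windows\models\bvlc_googlenet.

I downloaded a few arbitrary images from the internet and put them there.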


 

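3. Prepare the synset_words.txt file

It should be easy to find online. Each line starts with a synset ID, followed by the names of one of the 1000 object classes.

Put it under caffe-windows\models\bvlc_googlenet.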
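
To make the file format concrete, here is a minimal sketch that splits one line into its synset ID and class names. The helper `parse_synset_line` is hypothetical and is not part of Caffe; the sample line matches the format seen in the results later in this post.

```python
def parse_synset_line(line):
    """Split 'n02110063 malamute, malemute' into ('n02110063', ['malamute', 'malemute'])."""
    # The ID is everything before the first space; the rest is a comma-separated name list.
    synset_id, _, names = line.strip().partition(' ')
    return synset_id, [name.strip() for name in names.split(',')]

sample = 'n02110063 malamute, malemute, Alaskan malamute'
print(parse_synset_line(sample))
```

The recognition script below does not need this split: it keeps each line as a single label string and indexes it by class number.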


 

 

 

4. Run the program to recognize the images.


# coding: utf-8

import os

import caffe
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

# Caffe root directory
caffe_root = 'F:/deep_learning/Caffe/caffe-windows/'
# Network definition file
deploy_file = caffe_root+'models/bvlc_googlenet/deploy.prototxt'
# Pretrained model weights
model_file = caffe_root+'models/bvlc_googlenet/bvlc_googlenet.caffemodel'

# CPU mode. Only the CPU build is installed here, so this line is optional.
caffe.set_mode_cpu()

# Build the classifier
net = caffe.Classifier(deploy_file, # network definition (deploy file)
                       model_file,  # pretrained weights
                       mean=np.load(caffe_root +'python/caffe/imagenet/ilsvrc_2012_mean.npy').mean(1).mean(1), # per-channel mean from the mean file
                       channel_swap=(2,1,0),  # the model expects BGR, loaded images are RGB, so swap the channels
                       raw_scale=255,         # load_image returns values in [0, 1], the model expects [0, 255]
                       image_dims=(224, 224)) # resize inputs to 224*224, the network's input size

# Class label file
imagenet_labels_filename = caffe_root +'models/bvlc_googlenet/synset_words.txt'
# Load the labels, one string per line
labels = np.loadtxt(imagenet_labels_filename, str, delimiter='\t')

# Walk the image folder and classify every file in it
for root,dirs,files in os.walk(caffe_root+'models/bvlc_googlenet/image/'):
    for file in files:
        # Load the image to classify
        image_path = os.path.join(root,file)
        input_image = caffe.io.load_image(image_path)

        # Print the image path and file name
        print(image_path)

        # Show the image
        img=Image.open(image_path)
        plt.imshow(img)
        plt.axis('off')
        plt.show()

        # Predict the class; the result holds one probability per class (1000 in total)
        prediction = net.predict([input_image])
        # Index of the highest probability (print() form works in both Python 2 and 3)
        print('predicted class: %d' % prediction[0].argmax())

        # Print the 5 most probable classes:
        # argsort sorts ascending, [-5:] keeps the last five indices, [::-1] reverses them
        top_k = prediction[0].argsort()[-5:][::-1]
        for node_id in top_k:
            # Class name
            human_string = labels[node_id]
            # Confidence for this class
            score = prediction[0][node_id]
            print('%s (score = %.5f)' % (human_string, score))
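
To see what the `channel_swap`, `raw_scale` and `mean` arguments do to an image, the steps can be imitated in plain NumPy. This is only a sketch of the order of operations under stated assumptions, not Caffe's own code: resizing and cropping are left out, the function name `preprocess` is hypothetical, and the mean values (104, 117, 123) are rounded approximations of the ImageNet BGR channel means rather than values read from the mean file.

```python
import numpy as np

def preprocess(image_rgb01, mean_bgr, raw_scale=255.0, channel_swap=(2, 1, 0)):
    """Imitate the per-image transformation steps (resizing omitted)."""
    blob = image_rgb01.astype(np.float32).transpose(2, 0, 1)  # H x W x C -> C x H x W
    blob = blob[list(channel_swap), :, :]                      # RGB -> BGR
    blob = blob * raw_scale                                    # [0, 1] -> [0, 255]
    mean = np.asarray(mean_bgr, dtype=np.float32).reshape(3, 1, 1)
    return blob - mean                                         # subtract the per-channel mean

image = np.random.rand(224, 224, 3)   # stand-in for caffe.io.load_image output
mean_bgr = (104.0, 117.0, 123.0)      # assumed approximate channel means
blob = preprocess(image, mean_bgr)
print(blob.shape)  # (3, 224, 224)
```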

Output:

D:\Anaconda3\envs\py2\lib\site-packages\skimage\io\_io.py:49: UserWarning: `as_grey` has been deprecated in favor of `as_gray`
  warn('`as_grey` has been deprecated in favor of `as_gray`')
F:/deep_learning/Caffe/caffe-windows/models/bvlc_googlenet/image/1.jpg


predicted class: 249
n02110063 malamute, malemute, Alaskan malamute (score = 0.56430)
n02109961 Eskimo dog, husky (score = 0.21304)
n02110185 Siberian husky (score = 0.20320)
n02091467 Norwegian elkhound, elkhound (score = 0.01089)
n02106662 German shepherd, German shepherd dog, German police dog, alsatian (score = 0.00340)
F:/deep_learning/Caffe/caffe-windows/models/bvlc_googlenet/image/2.jpg


predicted class: 436
n02814533 beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon (score = 0.60606)
n02974003 car wheel (score = 0.24771)
n04285008 sports car, sport car (score = 0.07949)
n03770679 minivan (score = 0.01537)
n03100240 convertible (score = 0.01536)
F:/deep_learning/Caffe/caffe-windows/models/bvlc_googlenet/image/3.jpg


predicted class: 660
n03776460 mobile home, manufactured home (score = 0.43408)
n02859443 boathouse (score = 0.12835)
n02793495 barn (score = 0.05500)
n04589890 window screen (score = 0.03707)
n04435653 tile roof (score = 0.03689)
F:/deep_learning/Caffe/caffe-windows/models/bvlc_googlenet/image/4.jpg


predicted class: 296
n02134084 ice bear, polar bear, Ursus Maritimus, Thalarctos maritimus (score = 0.98611)
n02120079 Arctic fox, white fox, Alopex lagopus (score = 0.01358)
n02114548 white wolf, Arctic wolf, Canis lupus tundrarum (score = 0.00020)
n02111889 Samoyed, Samoyede (score = 0.00006)
n02441942 weasel (score = 0.00002)
F:/deep_learning/Caffe/caffe-windows/models/bvlc_googlenet/image/5.jpg


predicted class: 850
n04399382 teddy, teddy bear (score = 0.22030)
n02883205 bow tie, bow-tie, bowtie (score = 0.07489)
n04579432 whistle (score = 0.05284)
n02910353 buckle (score = 0.03879)
n04133789 sandal (score = 0.03587)
F:/deep_learning/Caffe/caffe-windows/models/bvlc_googlenet/image/6.jpg


predicted class: 283
n02123394 Persian cat (score = 0.48360)
n02123045 tabby, tabby cat (score = 0.38249)
n02124075 Egyptian cat (score = 0.05283)
n02123159 tiger cat (score = 0.03804)
n02127052 lynx, catamount (score = 0.01692)
F:/deep_learning/Caffe/caffe-windows/models/bvlc_googlenet/image/7.jpg


predicted class: 584
n03476684 hair slide (score = 0.16116)
n03954731 plane, carpenter's plane, woodworking plane (score = 0.15686)
n04133789 sandal (score = 0.04462)
n04517823 vacuum, vacuum cleaner (score = 0.03880)
n04372370 switch, electric switch, electrical switch (score = 0.03754)
F:/deep_learning/Caffe/caffe-windows/models/bvlc_googlenet/image/8.jpg


predicted class: 283
n02123394 Persian cat (score = 0.82506)
n02112018 Pomeranian (score = 0.03154)
n03325584 feather boa, boa (score = 0.01828)
n02328150 Angora, Angora rabbit (score = 0.01628)
n02127052 lynx, catamount (score = 0.01535)
F:/deep_learning/Caffe/caffe-windows/models/bvlc_googlenet/image/9.jpg

 


predicted class: 283
n02123394 Persian cat (score = 0.49727)
n02127052 lynx, catamount (score = 0.21929)
n02123045 tabby, tabby cat (score = 0.05281)
n02124075 Egyptian cat (score = 0.04727)
n03958227 plastic bag (score = 0.03218)
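
Each five-line ranking above comes from the `argsort()[-5:][::-1]` expression in the script. A toy sketch with made-up scores (five classes and a top 3 instead of 1000 classes and a top 5) shows how it picks the indices:

```python
import numpy as np

probs = np.array([0.05, 0.40, 0.10, 0.30, 0.15])  # made-up scores for 5 classes

ascending = probs.argsort()      # indices from lowest to highest score
top_k = ascending[-3:][::-1]     # keep the last three, then reverse to highest-first
print(top_k)                     # [1 3 4]
print(probs[top_k])              # [0.4  0.3  0.15]
```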