Caffe Learning Notes 3: Layer

3. Layer
Layer is the largest and most intricate module in Caffe. Because Caffe emphasizes modular design, each layer is only allowed to perform one specific kind of computation, such as convolution, pooling, non-linear transforms, inner products, data loading, normalization, or loss computation. The Layer class is the fundamental building block: a deep network is simply a stack of layers, connected to one another by Blobs that carry the data between them.
[Figure: the Layer module of Caffe, divided into the five groups of layers listed below]

There are seven header files in Caffe related to Layer:

  • layer.hpp: the parent class Layer, which defines the basic interface of every layer.
  • data_layers.hpp: subclasses of Layer that handle input data, e.g. DataLayer, HDF5DataLayer and ImageDataLayer.
  • vision_layers.hpp: subclasses of Layer related to feature extraction, e.g. ConvolutionLayer, PoolingLayer and LRNLayer.
  • neuron_layers.hpp: subclasses of Layer that apply element-wise non-linear transforms, e.g. ReLULayer, TanHLayer and SigmoidLayer.
  • loss_layers.hpp: subclasses of Layer that compute the output error, e.g. EuclideanLossLayer, SoftmaxWithLossLayer and HingeLossLayer.
  • common_layers.hpp: subclasses of Layer that reshape intermediate results or perform element-wise operations, e.g. ConcatLayer, InnerProductLayer and SoftmaxLayer.
  • layer_factory.hpp: the Layer factory class, which maintains a registry mapping the available layer types to their constructors.

layer.hpp declares the abstract base class; apart from layer_factory.hpp, all the other headers derive from it, namely the remaining five header files, which correspond to the five groups of layers in the figure above.
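
As an aside on layer_factory.hpp: it is not a Layer subclass but a registry mapping each layer's type string to a function that constructs it. The sketch below is only an illustration of how that registry is typically used; REGISTER_LAYER_CLASS and LayerRegistry are the names used in Caffe's layer_factory.hpp, while MyNewLayer is a made-up class.

// Registering a (hypothetical) layer class so that it can be created by type name.
// REGISTER_LAYER_CLASS(MyNew) assumes the class is called MyNewLayer and maps the
// type string "MyNew" to a creator returning shared_ptr<Layer<Dtype> >.
REGISTER_LAYER_CLASS(MyNew);

// Later, e.g. while the Net is being built, a layer is instantiated purely from its
// LayerParameter description:
shared_ptr<Layer<Dtype> > layer = LayerRegistry<Dtype>::CreateLayer(layer_param);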

The layer.hpp header itself includes these headers:

#include "caffe/blob.hpp"
#include "caffe/common.hpp"
#include "caffe/proto/caffe.pb.h"
#include "caffe/util/device_alternate.hpp"

In device_alternate.hpp, when CPU_ONLY is defined, a few macros (guarded by #ifdef CPU_ONLY) stub out the GPU calls:

#define STUB_GPU(classname)
#define STUB_GPU_FORWARD(classname, funcname)
#define STUB_GPU_BACKWARD(classname, funcname)
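
Roughly speaking (a paraphrase of device_alternate.hpp, not a verbatim quote), these macros define a layer's GPU entry points as stubs that abort at run time, so a CPU-only build still compiles and links:

// Approximate expansion of STUB_GPU(classname) in a CPU_ONLY build:
#define STUB_GPU(classname) \
template <typename Dtype> \
void classname<Dtype>::Forward_gpu(const vector<Blob<Dtype>*>& bottom, \
    const vector<Blob<Dtype>*>& top) { NO_GPU; } \
template <typename Dtype> \
void classname<Dtype>::Backward_gpu(const vector<Blob<Dtype>*>& top, \
    const vector<bool>& propagate_down, \
    const vector<Blob<Dtype>*>& bottom) { NO_GPU; }

// where NO_GPU is essentially:
//   LOG(FATAL) << "Cannot use GPU in CPU-only Caffe: check mode."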

A Layer holds three main data members:

LayerParameter layer_param_;              // the layer parameters read from the protobuf description
vector<shared_ptr<Blob<Dtype>>> blobs_;   // the layer's learnable parameters (weights and biases), used at run time
vector<bool> param_propagate_down_;       // whether to compute the diff of each parameter blob, i.e. whether to propagate the error to it
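
As a small, hedged illustration of how blobs_ is used inside a concrete layer (by convention, e.g. in InnerProductLayer or ConvolutionLayer, blobs_[0] holds the weights and blobs_[1] the bias):

// Inside a layer's Forward_cpu()/Backward_cpu() implementation:
const Dtype* weight      = this->blobs_[0]->cpu_data();          // weight values
const Dtype* bias        = this->blobs_[1]->cpu_data();          // bias values (if the layer has one)
Dtype*       weight_diff = this->blobs_[0]->mutable_cpu_diff();  // gradient buffer filled during Backward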

The Layer constructor, explicit Layer(const LayerParameter& param) : layer_param_(param), tries to read the layer's parameters from the protobuf description. Its three main interfaces are:

void SetUp(const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top);
inline Dtype Forward(const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top);
inline void Backward(const vector<Blob<Dtype>*>& top, const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);

SetUp performs the initialization every layer needs; each concrete layer customizes it through the virtual LayerSetUp and Reshape functions so that all of its parameters are set up. Forward and Backward correspond to the forward computation and the backward update: the inputs are always the bottom blobs and the outputs the top blobs, and Backward additionally takes a propagate_down vector indicating whether this layer should back-propagate gradients to each bottom blob.
In the concrete implementations of Forward and Backward, the work is dispatched according to Caffe::mode(), i.e. whether the CPU or the GPU is used. Both have corresponding virtual interfaces, Forward_cpu/Forward_gpu and Backward_cpu/Backward_gpu, and the actual computation depends on the layer type (note: some layers have no GPU implementation, so the wrappers fall back to the CPU version as a backup). In addition, a ToProto interface is provided that writes the layer's parameters to a protocol buffer file.
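
The mode that this dispatch inspects is a global setting, usually chosen once at start-up, for example:

// Select the computation device before running the net.
Caffe::set_mode(Caffe::GPU);   // or Caffe::set_mode(Caffe::CPU);
Caffe::SetDevice(0);           // GPU device id; only meaningful in GPU mode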

Each layer takes some 'bottom' blobs as input and produces some 'top' blobs as output. For the input (data) layer, the top blobs are the "data" and "label" blobs.
A layer receives data through its bottom connections and sends its output through its top connections.
Every layer defines three important operations: setup, forward and backward.

  • Setup: initialize the layer and its connections once, when the model is initialized;
  • Forward: take the input from the bottom blobs, compute the output and write it to the top blobs;
  • Backward: given the gradient with respect to the top output, compute the gradient with respect to the input and pass it back to the bottom blobs. A layer with parameters also computes the gradients with respect to its parameters and stores them internally.

In particular, Forward and Backward each have a CPU and a GPU implementation. If no GPU version is implemented, the layer falls back to the CPU functions as a backup. This incurs extra data-transfer cost (the inputs are copied from the GPU to the CPU, and the outputs are copied from the CPU back to the GPU), but it is very convenient for quick experiments.

Overall, a Layer carries out the two core operations of the network: the forward pass, which takes the inputs and computes the outputs, and the backward pass, which takes the gradient with respect to the outputs, computes the gradients with respect to the parameters and the inputs, and propagates them back to the preceding layers. Together these make up each layer's forward and backward passes.

Thanks to the compositional nature of Caffe networks and the modularity of the code, writing a custom layer is easy: define the layer's setup, forward and backward, and it can be plugged into a network, as the sketch after this paragraph shows.
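
As a minimal sketch of what that means in code (the layer below is made up purely for illustration: a "ScaleByTwoLayer" whose top blob is simply twice its bottom blob; only the CPU versions are implemented, so in GPU mode it falls back to the CPU path):

#include "caffe/layer.hpp"

namespace caffe {

// Hypothetical layer: top[0] = 2 * bottom[0].  For illustration only.
template <typename Dtype>
class ScaleByTwoLayer : public Layer<Dtype> {
 public:
  explicit ScaleByTwoLayer(const LayerParameter& param) : Layer<Dtype>(param) {}
  virtual inline const char* type() const { return "ScaleByTwo"; }
  virtual inline int ExactNumBottomBlobs() const { return 1; }
  virtual inline int ExactNumTopBlobs() const { return 1; }

 protected:
  // nothing layer-specific to read from layer_param_, so LayerSetUp is not overridden;
  // the top blob simply takes the same shape as the bottom blob
  virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
    top[0]->ReshapeLike(*bottom[0]);
  }
  // forward: y = 2 * x
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
    const Dtype* bottom_data = bottom[0]->cpu_data();
    Dtype* top_data = top[0]->mutable_cpu_data();
    for (int i = 0; i < bottom[0]->count(); ++i) {
      top_data[i] = Dtype(2) * bottom_data[i];
    }
  }
  // backward: dL/dx = 2 * dL/dy
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down,
      const vector<Blob<Dtype>*>& bottom) {
    if (!propagate_down[0]) { return; }
    const Dtype* top_diff = top[0]->cpu_diff();
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    for (int i = 0; i < bottom[0]->count(); ++i) {
      bottom_diff[i] = Dtype(2) * top_diff[i];
    }
  }
};

}  // namespace caffe

To actually use such a layer by name in a network definition it would also have to be registered with the factory (REGISTER_LAYER_CLASS(ScaleByTwo), as sketched earlier) and, in the older Caffe layout described here, declared in the appropriate *_layers.hpp header.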

In short, a layer must implement a forward function (whose behaviour you define yourself): in forward it takes the blobs from its input, i.e. the layer's bottom (in Caffe the preceding layer is called the bottom), and computes the output blobs. Layers usually also implement a backward pass, which computes this layer's error gradients from its input blobs and the error gradients of its output blobs, as the formulas below show:
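
The formula image that originally followed this paragraph is not reproduced here; what it expressed is just the chain rule as used in the Caffe tutorial. Writing the layer as a function $y = f_W(x)$ with parameters $W$ and the network loss as $L$:

\[
\text{forward: } y = f_W(x), \qquad
\text{backward: } \frac{\partial L}{\partial x} = \frac{\partial f_W(x)}{\partial x}\,\frac{\partial L}{\partial y},
\qquad
\frac{\partial L}{\partial W} = \frac{\partial f_W(x)}{\partial W}\,\frac{\partial L}{\partial y}
\]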

layer.hpp:

#ifndef CAFFE_LAYER_H_    
#define CAFFE_LAYER_H_    

#include <algorithm>    
#include <string>    
#include <vector>    

#include "caffe/blob.hpp"    
#include "caffe/common.hpp"    
#include "caffe/layer_factory.hpp"    
#include "caffe/proto/caffe.pb.h"    
#include "caffe/util/device_alternate.hpp"    

namespace caffe {    

/**  
 * @brief An interface for the units of computation which can be composed into a  
 *        Net.  
 *  
 * Layer%s must implement a Forward function, in which they take their input  
 * (bottom) Blob%s (if any) and compute their output Blob%s (if any).  
 * They may also implement a Backward function, in which they compute the error  
 * gradients with respect to their input Blob%s, given the error gradients with  
 * their output Blob%s.  
 */    
template <typename Dtype>    
class Layer {    
 public:    
/*
First obtain the current network Phase (TRAIN or TEST) and initialize layer_param_ in the
initializer list. blobs_ is a vector of shared_ptr pointing to Blob objects; space is reserved for
it here and the blobs carried in the incoming layer_param are copied over.
*/
// The explicit constructor does not need to be overridden; any initialization work is done in SetUp().
// The constructor only copies the values from the layer parameter description, including any
// weight and bias blobs provided there.
  explicit Layer(const LayerParameter& param)
    : layer_param_(param) {
      // Set phase and copy blobs (if there are any).
      // training or testing? (the phase)
      phase_ = param.phase();    
      if (layer_param_.blobs_size() > 0) {
        // resize blobs_ to the number of blobs given in the parameter
        blobs_.resize(layer_param_.blobs_size());
        for (int i = 0; i < layer_param_.blobs_size(); ++i) {
          // allocate a new Blob
          blobs_[i].reset(new Blob<Dtype>());
          // load its data from the serialized blob
          blobs_[i]->FromProto(layer_param_.blobs(i));
        }
      }  // initialize blobs_ from the parameters passed in via protobuf; blobs_ is a vector of smart pointers to Blob objects.

      #ifdef USE_MPI    
      //If this is a gather layer, its subsequent layers don't need gradient sync.
      //We will only change this layer's own property here;
      //subsequent layers will be inferred in the Net
    if (is_gathering()){    
        set_need_sync(false);    
      }else{    
        set_need_sync(true);    
      }    
      #endif    
    }    
  virtual ~Layer() {}    
//////////////// The SetUp initialization function; every Layer object must follow this fixed calling pattern.
  /**
   * @brief Implements common layer setup functionality.
   *        (the setup routine shared by every layer object)
   * @param bottom the preshaped input blobs
   *        (the layer's input data; the storage inside the blobs is already allocated)
   * @param top
   *     the allocated but unshaped output blobs, to be shaped by Reshape
   *        (the layer's output data; the blob objects exist but their storage is not yet allocated;
   *        the size is determined from the bottom blobs and layer_param_ inside Reshape)
   *
   * Checks that the number of bottom and top blobs is correct.
   * Calls LayerSetUp to do special layer setup for individual layer types,
   * followed by Reshape to set up sizes of top blobs and internal buffers.
   * Sets up the loss weight multiplier blobs for any non-zero loss weights.
   * This method may not be overridden.
   * 1. Check that the numbers of bottom and top blobs meet this layer's requirements.
   * 2. Call LayerSetUp for layer-specific initialization; each Layer subclass overrides it.
   * 3. Call Reshape to allocate appropriately sized storage for the top blobs.
   * 4. Set the loss weight multiplier of each top blob; it is zero for non-loss layers.
   *
   * This method is not virtual and is not to be overridden; the pattern is fixed.
   */
  void SetUp(const vector<Blob<Dtype>*>& bottom,    
      const vector<Blob<Dtype>*>& top) {    
    CheckBlobCounts(bottom, top);    
    LayerSetUp(bottom, top);    
    Reshape(bottom, top);    
    SetLossWeights(top);    
  }    
///////////////// LayerSetUp: the initialization function that every Layer subclass must override.
  /**
   * @brief Does layer-specific setup: your layer should implement this function
   *        as well as Reshape.
   *        (layer-specific initialization; every Layer subclass must implement this virtual function)
   *
   * @param bottom
   *     the preshaped input blobs, whose data fields store the input data for
   *     this layer
   *     (input blobs whose data_ and diff_ members already hold the relevant data)
   * @param top
   *     the allocated but unshaped output blobs
   *     (output blobs whose objects exist but whose storage has not yet been allocated)
   *
   * This method should do one-time layer specific setup. This includes reading
   * and processing relevant parameters from the <code>layer_param_</code>.
   * Setting up the shapes of top blobs and internal buffers should be done in
   * <code>Reshape</code>, which will be called before the forward pass to
   * adjust the top blob sizes.
   * (performs one-time layer-specific setup, including reading the layer's weight and bias
   * parameters from layer_param_; Reshape is what allocates the top blobs' storage)
   */
  virtual void LayerSetUp(const vector<Blob<Dtype>*>& bottom,    
      const vector<Blob<Dtype>*>& top) {}    
///////////////////// Reshape: every Layer subclass must override it; it sets the shapes of the
///////////////////// top blobs and allocates their storage.
   /**
   * @brief Adjust the shapes of top blobs and internal buffers to accommodate
   *        the shapes of the bottom blobs.
   *        (compute the top blob shapes from the bottom blob shapes and layer_param_,
   *        and allocate storage for them)
   *
   * @param bottom the input blobs, with the requested input shapes
   * @param top the top blobs, which should be reshaped as needed
   *
   * This method should reshape top blobs as needed according to the shapes
   * of the bottom (input) blobs, as well as reshaping any internal buffers
   * and making any other necessary adjustments so that the layer can
   * accommodate the bottom blobs.
   */
  virtual void Reshape(const vector<Blob<Dtype>*>& bottom,    
      const vector<Blob<Dtype>*>& top) = 0;    

  /**  
   * @brief Given the bottom blobs, compute the top blobs and the loss.  
   *  
   * @param bottom  
   *     the input blobs, whose data fields store the input data for this layer  
   * @param top  
   *     the preshaped output blobs, whose data fields will store this layers'  
   *     outputs  
   * \return The total loss from the layer.  
   *  
   * The Forward wrapper calls the relevant device wrapper function  
   * (Forward_cpu or Forward_gpu) to compute the top blob values given the  
   * bottom blobs.  If the layer has any non-zero loss_weights, the wrapper  
   * then computes and returns the loss.  
   *  
   * Your layer should implement Forward_cpu and (optionally) Forward_gpu.  
   */    
////////////// The forward function Forward and the backward function Backward
/*
Forward is really a wrapper: it dispatches to the derived layer's Forward_cpu or Forward_gpu,
computes the output data blobs from the input data blobs, and also returns this layer's
contribution to the total loss.
*/
  inline Dtype Forward(const vector<Blob<Dtype>*>& bottom,    
      const vector<Blob<Dtype>*>& top);    

  /**  
   * @brief Given the top blob error gradients, compute the bottom blob error  
   *        gradients.  
   *  
   * @param top  
   *     the output blobs, whose diff fields store the gradient of the error  
   *     with respect to themselves  
   * @param propagate_down  
   *     a vector with equal length to bottom, with each index indicating  
   *     whether to propagate the error gradients down to the bottom blob at  
   *     the corresponding index  
   * @param bottom  
   *     the input blobs, whose diff fields will store the gradient of the error  
   *     with respect to themselves after Backward is run  
   *  
   * The Backward wrapper calls the relevant device wrapper function  
   * (Backward_cpu or Backward_gpu) to compute the bottom blob diffs given the  
   * top blob diffs.  
   *  
   * Your layer should implement Backward_cpu and (optionally) Backward_gpu.
   */    
/*
Backward implements back-propagation: given the error gradients of the top blobs it computes the
error gradients of the bottom blobs. Its input is the top (output) blobs, whose diff fields hold
the corresponding error gradients. propagate_down has the same length as bottom; each index
indicates whether the error gradients should be propagated down to the corresponding bottom blob.
The diff fields of the bottom blobs receive the gradients computed by Backward.
*/
  inline void Backward(const vector<Blob<Dtype>*>& top,    
      const vector<bool>& propagate_down,    
      const vector<Blob<Dtype>*>& bottom);    

  /**  
   * @brief Returns the vector of learnable parameter blobs.  
   */    
  vector<shared_ptr<Blob<Dtype> > >& blobs() {    
    return blobs_;  // return the vector blobs_
  }    

  /**  
   * @brief Returns the layer parameter.  
   */    
// return the layer parameter
  const LayerParameter& layer_param() const { return layer_param_; }    

  /**  
   * @brief Writes the layer parameter to a protocol buffer  
   */    
// write the layer parameter to a protocol buffer message
  virtual void ToProto(LayerParameter* param, bool write_diff = false);    

// get / set the loss associated with a top blob at a given index
  /**  
   * @brief Returns the scalar loss associated with a top blob at a given index.  
   */    
  inline Dtype loss(const int top_index) const {    
    return (loss_.size() > top_index) ? loss_[top_index] : Dtype(0);    
  }    

  /**  
   * @brief Sets the loss associated with a top blob at a given index.  
   */    
  inline void set_loss(const int top_index, const Dtype value) {    
    if (loss_.size() <= top_index) {    
      loss_.resize(top_index + 1, Dtype(0));    
    }    
    loss_[top_index] = value;    
  }    
// Some accessors that report the layer's expectations:
  /**
   * Query the expected numbers of bottom or top blobs; the names are self-explanatory.
   */
    // virtual (and inline): returns the layer type as a string
  virtual inline const char* type() const { return ""; }

   // virtual: exact number of bottom blobs required
  virtual inline int ExactNumBottomBlobs() const { return -1; }

   // virtual: minimum number of bottom blobs required
  virtual inline int MinBottomBlobs() const { return -1; }

   // virtual: maximum number of bottom blobs allowed
  virtual inline int MaxBottomBlobs() const { return -1; }

   // virtual: exact number of top blobs required
  virtual inline int ExactNumTopBlobs() const { return -1; }

   // virtual: minimum number of top blobs required
  virtual inline int MinTopBlobs() const { return -1; }

   // virtual: maximum number of top blobs allowed
  virtual inline int MaxTopBlobs() const { return -1; }

   // virtual: whether the numbers of bottom and top blobs must be equal
  virtual inline bool EqualNumBottomTopBlobs() const { return false; }

   // whether this layer lets the network create anonymous top blobs automatically;
   // if it returns true, Net::Init creates enough anonymous top blobs to satisfy
   // whatever ExactNumTopBlobs or MinTopBlobs requires
  virtual inline bool AutoTopBlobs() const { return false; }
/*
AllowForceBackward controls whether back-propagation can be forced for a given bottom blob, since
some layers do not actually need gradient information. The two functions after it query and set
whether the gradient of a given parameter blob should be computed.
*/

   // for a given bottom blob, return whether forcing backward is allowed
  virtual inline bool AllowForceBackward(const int bottom_index) const {
    return true;
  }

// param_propagate_down / set_param_propagate_down: query and set which parameter blobs need back-propagation.
  /**  
   * @brief Specifies whether the layer should compute gradients w.r.t. a  
   *        parameter at a particular index given by param_id.  
   *  
   * You can safely ignore false values and always compute gradients  
   * for all parameters, but possibly with wasteful computation.  
   */    
  inline bool param_propagate_down(const int param_id) {    
    return (param_propagate_down_.size() > param_id) ?    
        param_propagate_down_[param_id] : false;    
  }    
  /**  
   * @brief Sets whether the layer should compute gradients w.r.t. a  
   *        parameter at a particular index given by param_id.  
   */    
  inline void set_param_propagate_down(const int param_id, const bool value) {    
    if (param_propagate_down_.size() <= param_id) {    
      param_propagate_down_.resize(param_id + 1, true);    
    }    
    param_propagate_down_[param_id] = value;    
  }    

  #ifdef USE_MPI    
  /**
   * @brief Checks whether the layer accepts the specified parallel type.
   *
   * If not supported, the program will halt with hints.
   */
  inline virtual bool is_gathering() {return false;}    
  inline virtual bool is_scattering() {return false;}    
  inline bool need_sync(){return need_sync_;}    
  inline void set_need_sync(bool val){need_sync_ = val;}    
  #endif    


protected:    
  /** The protobuf that stores the layer parameters */    
  // the layer description parameters, read from the network definition file in protocol buffer format
  LayerParameter layer_param_;
  /** The phase: TRAIN or TEST */
  // the layer's phase: whether it participates in training or testing
  Phase phase_;
  /** The vector that stores the learnable parameters as a set of blobs. */
  // the layer's weight and bias parameters; a vector is used because weights and biases are stored in two separate blobs
  vector<shared_ptr<Blob<Dtype> > > blobs_;
  /** Vector indicating whether to compute the diff of each param blob. */
  // flags whether the diff (gradient) of each parameter blob should be computed during back-propagation
  vector<bool> param_propagate_down_;

  /** The vector that indicates whether each top blob has a non-zero weight in
   *  the objective function. */
  // zero for non-loss layers; in loss layers it holds the loss weight assigned to each top blob
  vector<Dtype> loss_;

  #ifdef USE_MPI    
  /**  
   * For parallel use  
   */    
  bool need_sync_;    
  #endif    
///////////////////////////// Forward and Backward above are non-virtual; internally they call the
///////////////////////////// following virtual functions to carry out the forward data pass and the
///////////////////////////// backward error propagation. Each Layer subclass must override the CPU
///////////////////////////// version and, depending on the execution environment, the GPU version.
  /** @brief Using the CPU device, compute the layer output. */    
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,    
      const vector<Blob<Dtype>*>& top) = 0;    
  /**  
   * @brief Using the GPU device, compute the layer output.  
   *        Fall back to Forward_cpu() if unavailable.  
   */    
  virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,    
      const vector<Blob<Dtype>*>& top) {    
    // LOG(WARNING) << "Using CPU code as backup.";    
    return Forward_cpu(bottom, top);    
  }    

  /**  
   * @brief Using the CPU device, compute the gradients for any parameters and  
   *        for the bottom blobs if propagate_down is true.  
   */    
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,    
      const vector<bool>& propagate_down,    
      const vector<Blob<Dtype>*>& bottom) = 0;    
  /**  
   * @brief Using the GPU device, compute the gradients for any parameters and  
   *        for the bottom blobs if propagate_down is true.  
   *        Fall back to Backward_cpu() if unavailable.  
   */    
  virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,    
      const vector<bool>& propagate_down,    
      const vector<Blob<Dtype>*>& bottom) {    
    // LOG(WARNING) << "Using CPU code as backup.";    
    Backward_cpu(top, propagate_down, bottom);    
  }    

  /**  
   * Called by the parent Layer's SetUp to check that the number of bottom  
   * and top Blobs provided as input match the expected numbers specified by  
   * the {ExactNum,Min,Max}{Bottom,Top}Blobs() functions.  
   */    
  virtual void CheckBlobCounts(const vector<Blob<Dtype>*>& bottom,    
                               const vector<Blob<Dtype>*>& top) {    
    if (ExactNumBottomBlobs() >= 0) {    
      CHECK_EQ(ExactNumBottomBlobs(), bottom.size())    
          << type() << " Layer takes " << ExactNumBottomBlobs()    
          << " bottom blob(s) as input.";    
    }  // ensure the number of bottom blobs is exactly as required
    if (MinBottomBlobs() >= 0) {    
      CHECK_LE(MinBottomBlobs(), bottom.size())    
          << type() << " Layer takes at least " << MinBottomBlobs()    
          << " bottom blob(s) as input.";    
    }  // ensure the number of bottom blobs is at least the required minimum
    if (MaxBottomBlobs() >= 0) {    
      CHECK_GE(MaxBottomBlobs(), bottom.size())    
          << type() << " Layer takes at most " << MaxBottomBlobs()    
          << " bottom blob(s) as input.";    
    }  // ensure the number of bottom blobs is at most the allowed maximum
    if (ExactNumTopBlobs() >= 0) {    
      CHECK_EQ(ExactNumTopBlobs(), top.size())    
          << type() << " Layer produces " << ExactNumTopBlobs()    
          << " top blob(s) as output.";    
    }  // ensure the number of top blobs is exactly as required
    if (MinTopBlobs() >= 0) {    
      CHECK_LE(MinTopBlobs(), top.size())    
          << type() << " Layer produces at least " << MinTopBlobs()    
          << " top blob(s) as output.";    
    }  // ensure the number of top blobs is at least the required minimum
    if (MaxTopBlobs() >= 0) {    
      CHECK_GE(MaxTopBlobs(), top.size())    
          << type() << " Layer produces at most " << MaxTopBlobs()    
          << " top blob(s) as output.";    
    }  // ensure the number of top blobs is at most the allowed maximum
    if (EqualNumBottomTopBlobs()) {    
      CHECK_EQ(bottom.size(), top.size())    
          << type() << " Layer produces one top blob as output for each "    
          << "bottom blob input.";    
    }  // ensure the number of bottom blobs equals the number of top blobs
  }    

  /**  
   * Called by SetUp to initialize the weights associated with any top blobs in  
   * the loss function. Store non-zero loss weights in the diff blob.  
   */    
/*
SetLossWeights is an important step: it is called by SetUp to initialize the loss weights associated
with the top blobs, and it stores the non-zero loss weights in the top blobs' diff fields.
*/
  inline void SetLossWeights(const vector<Blob<Dtype>*>& top) {    
    const int num_loss_weights = layer_param_.loss_weight_size();    
    if (num_loss_weights) {    
      CHECK_EQ(top.size(), num_loss_weights) << "loss_weight must be "    
          "unspecified or specified once per top blob.";    
      for (int top_id = 0; top_id < top.size(); ++top_id) {    
        const Dtype loss_weight = layer_param_.loss_weight(top_id);    
        if (loss_weight == Dtype(0)) { continue; }  // a weight of 0 means no loss is computed for this top blob
        this->set_loss(top_id, loss_weight);    
        const int count = top[top_id]->count();    
        Dtype* loss_multiplier = top[top_id]->mutable_cpu_diff();    
        caffe_set(count, loss_weight, loss_multiplier);  // fill loss_multiplier with loss_weight
      }     
    }    
  }    

  DISABLE_COPY_AND_ASSIGN(Layer);    
};  // class Layer    

/*
The forward pass calls the corresponding Forward_cpu or Forward_gpu. Forward_cpu is a pure virtual
function and must be implemented, whereas Forward_gpu is virtual with a default that simply calls
Forward_cpu. In other words, you must implement your own Forward_cpu; implementing Forward_gpu is
optional.
*/
// Forward and backward wrappers. You should implement the cpu and    
// gpu specific implementations instead, and should not change these    
// functions.    
template <typename Dtype>    
inline Dtype Layer<Dtype>::Forward(const vector<Blob<Dtype>*>& bottom,    
    const vector<Blob<Dtype>*>& top) {    
  Dtype loss = 0;
  // reshape the top blobs according to the bottom blobs
  Reshape(bottom, top);
  // dispatch on the execution mode, CPU or GPU
  switch (Caffe::mode()) {
  case Caffe::CPU:
    // run the CPU forward pass
    Forward_cpu(bottom, top);
    // after the forward pass, accumulate the loss (only loss layers have non-zero loss weights)
    for (int top_id = 0; top_id < top.size(); ++top_id) {
      if (!this->loss(top_id)) { continue; }
      const int count = top[top_id]->count();
      // the data produced by the forward pass
      const Dtype* data = top[top_id]->cpu_data();
      // the diff field holds the loss-weight multipliers set up in SetLossWeights
      const Dtype* loss_weights = top[top_id]->cpu_diff();
      // the dot product of data and loss_weights is this blob's weighted contribution to the total loss
      loss += caffe_cpu_dot(count, data, loss_weights);
    }
    break;
  case Caffe::GPU:
    // GPU forward pass
    Forward_gpu(bottom, top);
#ifndef CPU_ONLY
    // same as above, except that the dot product is computed on the GPU
    for (int top_id = 0; top_id < top.size(); ++top_id) {
      if (!this->loss(top_id)) { continue; }
      const int count = top[top_id]->count();
      // data resident on the GPU
      const Dtype* data = top[top_id]->gpu_data();
      const Dtype* loss_weights = top[top_id]->gpu_diff();
      Dtype blob_loss = 0;
      caffe_gpu_dot(count, data, loss_weights, &blob_loss);
      loss += blob_loss;
    }
#endif
    break;
  default:    
    LOG(FATAL) << "Unknown caffe mode.";    
  }    
  return loss;    
}    

template <typename Dtype>    
inline void Layer<Dtype>::Backward(const vector<Blob<Dtype>*>& top,    
    const vector<bool>& propagate_down,    
    const vector<Blob<Dtype>*>& bottom) {    
  switch (Caffe::mode()) {    
  case Caffe::CPU:    
    Backward_cpu(top, propagate_down, bottom);    
    // compute the bottom error gradients from the top error gradients (diff). propagate_down is a
    // vector with the same length as bottom, controlling whether the gradient is propagated to the
    // corresponding bottom blob; each concrete layer defines the details.
    break;    
  case Caffe::GPU:    
    Backward_gpu(top, propagate_down, bottom);    
    break;    
  default:    
    LOG(FATAL) << "Unknown caffe mode.";    
  }    
}    
//////////////// Layer's serialization function: it copies the layer description parameters
//////////////// layer_param_ and the weight/bias blobs blobs_ into a LayerParameter object so
//////////////// that they can be written to disk.
// Serialize LayerParameter to protocol buffer    
template <typename Dtype>    
void Layer<Dtype>::ToProto(LayerParameter* param, bool write_diff) {    
  param->Clear();    
  param->CopyFrom(layer_param_);  // copy the layer description parameters layer_param_
  param->clear_blobs();
  // copy the layer's weight and bias blobs_
  for (int i = 0; i < blobs_.size(); ++i) {    
    blobs_[i]->ToProto(param->add_blobs(), write_diff);    
  }    
}    

}  // namespace caffe    

#endif  // CAFFE_LAYER_H_    

In the caffe.proto file, the main message related to layers is LayerParameter, shown below:

enum Phase { // the phase of a layer: TRAIN or TEST
   TRAIN = 0;  
   TEST = 1;  
}  

// NOTE  
// Update the next available ID when you add a new LayerParameter field.  
//  
// LayerParameter next available layer-specific ID: 137 (last added: reduction_param)  
message LayerParameter { // layer parameters
  optional string name = 1; // the layer name; can be chosen freely
  optional string type = 2; // the layer type; fixed inside each concrete layer and returned by its type() function
  repeated string bottom = 3; // the name of each bottom blob; there can be several
  repeated string top = 4; // the name of each top blob; there can be several

  // The train / test phase for computation.  
  optional Phase phase = 10; // the layer phase: enum Phase {TRAIN = 0; TEST = 1;}

  // The amount of weight to assign each top blob in the objective.  
  // Each layer assigns a default value, usually of either 0 or 1,  
  // to each top blob.  
  repeated float loss_weight = 5; // if given, the number of entries must match the number of top blobs

  // Specifies training parameters (multipliers on global learning constants,  
  // and the name and other settings used for weight sharing).  
  repeated ParamSpec param = 6; // parameter specs used during training (learning-rate multipliers, weight sharing, ...)

  // The blobs containing the numeric parameters of the layer.  
  repeated BlobProto blobs = 7; // the blobs holding the layer's numeric parameters (e.g. weights and biases)

  // Specifies on which bottoms the backpropagation should be skipped.  
  // The size must be either 0 or equal to the number of bottoms.  
  repeated bool propagate_down = 11; // length must be either 0 or equal to the number of bottoms

  // Rules controlling whether and when a layer is included in the network,  
  // based on the current NetState.  You may specify a non-zero number of rules  
  // to include OR exclude, but not both.  If no include or exclude rules are  
  // specified, the layer is always included.  If the current NetState meets  
  // ANY (i.e., one or more) of the specified rules, the layer is  
  // included/excluded.  
  repeated NetStateRule include = 8; // net state rule  
  repeated NetStateRule exclude = 9; // net state rule  

  // Parameters for data pre-processing.  
  optional TransformationParameter transform_param = 100; // data pre-processing such as scaling, cropping, etc.

  // Parameters shared by loss layers.  
  optional LossParameter loss_param = 101; // loss parameters  

  // Layer type-specific parameters.  
  //  
  // Note: certain layers may have more than one computational engine  
  // for their implementation. These layers include an Engine type and  
  // engine parameter for selecting the implementation.  
  // The default for the engine is set by the ENGINE switch at compile-time.  
  // layer-specific parameters
  optional AccuracyParameter accuracy_param = 102;  
  optional ArgMaxParameter argmax_param = 103;  
  optional ConcatParameter concat_param = 104;  
  optional ContrastiveLossParameter contrastive_loss_param = 105;  
  optional ConvolutionParameter convolution_param = 106;  
  optional DataParameter data_param = 107;  
  optional DropoutParameter dropout_param = 108;  
  optional DummyDataParameter dummy_data_param = 109;  
  optional EltwiseParameter eltwise_param = 110;  
  optional ExpParameter exp_param = 111;  
  optional FlattenParameter flatten_param = 135;  
  optional HDF5DataParameter hdf5_data_param = 112;  
  optional HDF5OutputParameter hdf5_output_param = 113;  
  optional HingeLossParameter hinge_loss_param = 114;  
  optional ImageDataParameter image_data_param = 115;  
  optional InfogainLossParameter infogain_loss_param = 116;  
  optional InnerProductParameter inner_product_param = 117;  
  optional LogParameter log_param = 134;  
  optional LRNParameter lrn_param = 118;  
  optional MemoryDataParameter memory_data_param = 119;  
  optional MVNParameter mvn_param = 120;  
  optional PoolingParameter pooling_param = 121;  
  optional PowerParameter power_param = 122;  
  optional PReLUParameter prelu_param = 131;  
  optional PythonParameter python_param = 130;  
  optional ReductionParameter reduction_param = 136;  
  optional ReLUParameter relu_param = 123;  
  optional ReshapeParameter reshape_param = 133;  
  optional SigmoidParameter sigmoid_param = 124;  
  optional SoftmaxParameter softmax_param = 125;  
  optional SPPParameter spp_param = 132;  
  optional SliceParameter slice_param = 126;  
  optional TanHParameter tanh_param = 127;  
  optional ThresholdParameter threshold_param = 128;  
  optional WindowDataParameter window_data_param = 129;  
}  
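
To make the correspondence with an actual network definition concrete, here is a small, hypothetical example. The field names come from the message above; the layer name and the value 20 are made up. In a .prototxt the fields are written in protobuf text format (shown in the comment), while programmatically the generated C++ API looks like this:

#include "caffe/proto/caffe.pb.h"

int main() {
  // Equivalent prototxt entry (protobuf text format):
  //   layer {
  //     name: "conv1"
  //     type: "Convolution"
  //     bottom: "data"
  //     top: "conv1"
  //     convolution_param { num_output: 20 }
  //   }
  caffe::LayerParameter conv_param;
  conv_param.set_name("conv1");        // optional string name = 1
  conv_param.set_type("Convolution");  // optional string type = 2
  conv_param.add_bottom("data");       // repeated string bottom = 3
  conv_param.add_top("conv1");         // repeated string top = 4
  conv_param.mutable_convolution_param()->set_num_output(20);  // layer-specific parameters
  return 0;
}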

References:
http://blog.csdn.net/langb2014/article/details/50988275
http://blog.csdn.net/fengbingchun/article/details/60871052
Chinese translation of the official Caffe tutorial by the CaffeCN community (caffecn.cn)