Introduction
The previous article showed how to implement softmax in Objective-C and train it on the MNIST data set, but the accuracy topped out around 90%. It also mentioned that adding a CNN can push the accuracy higher. So what is a CNN?

A convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons respond to surrounding units within a local coverage area; it performs remarkably well on large-scale image processing.

A CNN consists of one or more convolutional layers topped by fully connected layers (corresponding to a classical neural network), together with the associated weights and pooling layers. This structure lets the network exploit the two-dimensional structure of its input. Compared with other deep learning architectures, CNNs give better results on image and speech recognition, and the model can be trained with the backpropagation algorithm. Because a CNN has fewer parameters to estimate than other deep feed-forward networks, it is an attractive deep learning architecture.

Next, let me walk through my Objective-C implementation of a convolutional neural network.
Principles
The core of a CNN lies in three ideas: local receptive fields, weight sharing, and pooling.
- Local receptive fields: rather than wiring every neuron to the whole image, a small perceptron captures local information, which reduces the number of trainable parameters. Scanning a 1000×1000 image with a 10×10 perceptron takes only 991×991 neurons.
- Weight sharing: the same perceptron computes the same feature wherever it is applied, so its copies are interchangeable, which cuts the trainable parameters drastically. In the example above, only 10×10 = 100 parameters need training.
- Pooling: that is, downsampling. Convolving the 1000×1000 image above with a 10×10 kernel gives a 991×991 feature map; pooling at a 2×2 scale, i.e. taking the maximum of each 2×2 block as the output, yields a feature map of about 496×496 (rounding the odd edge up). The sketch just below checks this size arithmetic.
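To make the size arithmetic above concrete, here is a minimal sketch (the helper names are my own, not part of the implementation below) that checks the numbers for a valid convolution followed by 2×2 pooling:

#include <stdio.h>

// Side length after a "valid" convolution: N - K + 1.
static int convOutSize(int n, int k) { return n - k + 1; }

// Side length after non-overlapping 2x2 pooling, rounding the odd edge up.
static int poolOutSize(int n) { return (n + 1) / 2; }

int main(void) {
    int conv = convOutSize(1000, 10); // 991
    int pool = poolOutSize(conv);     // 496
    printf("conv: %d x %d, pool: %d x %d\n", conv, conv, pool, pool);
    return 0;
}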
The feed-forward pass of a CNN has four main stages: convolution, sampling (pooling), rasterization (flattening into the fully connected layer), and the perceptron (activation).
- Convolution: implements local receptive fields and weight sharing. The figure below illustrates a 3×3 kernel convolving a 5×5 image. Each kernel is one way of extracting features, like a sieve that filters out the parts of the image that match its pattern.

(figure: a 3×3 convolution kernel sliding across a 5×5 image)

With the kernel shown in the figure, [1,0,1; 0,1,0; 1,0,1], the first output value is 4 = 1*1 + 1*0 + 1*1 + 0*0 + 1*1 + 1*0 + 0*1 + 0*0 + 1*1.
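As a quick check of that arithmetic, here is a self-contained sketch (my own illustration, assuming the 5×5 image from the well-known animation the figure is based on) that computes the full valid convolution with plain loops:

#include <stdio.h>

int main(void) {
    // The 5x5 image and 3x3 kernel from the example above.
    double image[5][5] = {
        {1,1,1,0,0},
        {0,1,1,1,0},
        {0,0,1,1,1},
        {0,0,1,1,0},
        {0,1,1,0,0}
    };
    double kernel[3][3] = {
        {1,0,1},
        {0,1,0},
        {1,0,1}
    };
    double out[3][3];
    for (int i = 0; i < 3; i++) {       // output rows: 5 - 3 + 1 = 3
        for (int j = 0; j < 3; j++) {   // output cols
            double sum = 0;
            for (int p = 0; p < 3; p++)
                for (int q = 0; q < 3; q++)
                    sum += image[i + p][j + q] * kernel[p][q];
            out[i][j] = sum;
        }
    }
    printf("out[0][0] = %.0f\n", out[0][0]); // prints 4
    return 0;
}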
- Pooling: max pooling was introduced above; other options are mean pooling (taking the average of each small block), Gaussian pooling, and trainable pooling.
- Rasterization: simply arranging the sampled feature maps into a single vector.
- Perceptron (activation): common choices are ReLU, tanh and sigmoid (summarized below); their trade-offs and formulas are analyzed in plenty of papers, so I won't repeat them here.
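For reference, the three activations mentioned are:

$$\mathrm{ReLU}(x) = \max(0, x), \qquad \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}, \qquad \mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}$$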
The backpropagation updates of a CNN deserve a fuller explanation some other time; for now, here are the key formulas:
- Pooling: when propagating the loss backward, max pooling routes each point's residual to the position that held the maximum in the forward pass and fills the other 3 positions with 0; mean pooling spreads each point's residual evenly over the 4 positions.
- Convolution: the parameter gradients are given below, where rot180 rotates a matrix by 180 degrees and $O_q$ is the output of the pooling layer feeding this "neural hub"; the gradient for the bias is simply the sum of all elements of $\Delta_p$.

Parameter update formula:

$$\frac{\partial E}{\partial \Theta_{pq}} = \mathrm{rot180}\bigl(O_q \ast \mathrm{rot180}(\Delta_p)\bigr)$$

Loss propagation formula:

$$\Delta_q = \Bigl(\sum_{p \in C} \Delta_p \,\tilde{\ast}\, \mathrm{rot180}(\Theta_{pq})\Bigr) \circ \phi'(O_q)$$

where $\ast$ is a valid convolution, $\tilde{\ast}$ a full convolution, and $\circ$ element-wise multiplication.
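As a toy example of the pooling rule: suppose the forward pass pooled the 2×2 block $\begin{pmatrix}1 & 3\\ 2 & 6\end{pmatrix}$ and a residual $\delta$ arrives at the pooled output. Max pooling (which output 6) propagates $\begin{pmatrix}0 & 0\\ 0 & \delta\end{pmatrix}$ back, while mean pooling (which output 3) propagates $\begin{pmatrix}\delta/4 & \delta/4\\ \delta/4 & \delta/4\end{pmatrix}$.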
Implementing the CNN in Objective-C

That covers the basics of CNNs; now let's look at the concrete implementation.

First, the earlier Softmax implementation needs code to feed the loss back into the CNN. The CNN + Softmax update looks like this:
- (void)updateModel:(double *)index currentPos:(int)pos
{
    for (int i = 0; i < _kType; i++) {
        // Gradient of softmax + cross-entropy: (target - predicted).
        double delta;
        if (i != _randomY[pos]) {
            delta = 0.0 - index[i];
        }
        else
        {
            delta = 1.0 - index[i];
        }
        _bias[i] += _descentRate * delta;
        double loss = _descentRate * delta / _randSize;
        double *decay = malloc(sizeof(double) * _dim);
        vDSP_vsmulD(_randomX[pos], 1, &loss, decay, 1, _dim);
        // Loss fed back into the CNN: theta_i scaled by the scalar loss.
        // backPropagation: takes ownership of backLoss and frees it.
        double *backLoss = malloc(sizeof(double) * _dim);
        vDSP_vsmulD((_theta + i * _dim), 1, &loss, backLoss, 1, _dim);
        [_cnn backPropagation:backLoss];
        vDSP_vaddD((_theta + i * _dim), 1, decay, 1, (_theta + i * _dim), 1, _dim);
        if (decay != NULL) {
            free(decay);
            decay = NULL;
        }
    }
}
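For context, here is a rough sketch of how a training step might wire the two classes together. The softmax class's internals come from the previous article, so `rawImages` and `forwardPropagation:` below are illustrative placeholders rather than actual names from the repo:

// Hypothetical training step: CNN forward, softmax forward, then the
// combined update above (which also backpropagates into the CNN).
for (int pos = 0; pos < _randSize; pos++) {
    // Copy the CNN features into the softmax input (_dim equals the CNN's
    // fully connected size); copy, because the CNN reuses its output buffer.
    double *features = [_cnn filterImage:rawImages[pos] state:YES];
    memcpy(_randomX[pos], features, sizeof(double) * _dim);
    double *predicted = [self forwardPropagation:_randomX[pos]]; // placeholder name
    [self updateModel:predicted currentPos:pos];
}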
The main CNN implementation is as follows:
//
//  MLCnn.m
//  MNIST
//
//  Created by Jiao Liu on 9/28/16.
//  Copyright © 2016 ChangHong. All rights reserved.
//

#import "MLCnn.h"

@implementation MLCnn
// Draws from N(mean, stddev^2) using the Marsaglia polar method, resampling
// until the result lies within 2 standard deviations -- i.e. the central
// ~95% of the distribution (the bound assumes mean is 0, as used below).
+ (double)truncated_normal:(double)mean dev:(double)stddev
{
    double outP = 0.0;
    do {
        static int hasSpare = 0;
        static double spare;
        if (hasSpare) { // use the second value of the previously generated pair
            hasSpare = 0;
            outP = mean + stddev * spare;
            continue;
        }
        hasSpare = 1;
        static double u,v,s;
        do { // pick (u,v) uniformly in the unit disk, excluding the origin
            u = (rand() / ((double) RAND_MAX)) * 2.0 - 1.0;
            v = (rand() / ((double) RAND_MAX)) * 2.0 - 1.0;
            s = u * u + v * v;
        } while ((s >= 1.0) || (s == 0.0));
        s = sqrt(-2.0 * log(s) / s);
        spare = v * s;
        outP = mean + stddev * u * s;
    } while (fabs(outP) > 2 * stddev);
    return outP;
}
// In-place ReLU: x = max(x, 0) element-wise.
+ (double *)relu:(double *)x size:(int)size
{
    double *zero = [MLCnn fillVector:0.0f size:size];
    vDSP_vmaxD(x, 1, zero, 1, x, 1, size);
    if (zero != NULL) {
        free(zero);
        zero = NULL;
    }
    return x;
}

// Allocates a vector of the given size filled with num; the caller frees it.
+ (double *)fillVector:(double)num size:(int)size
{
    double *outP = malloc(sizeof(double) * size);
    vDSP_vfillD(&num, outP, 1, size);
    return outP;
}
// 2x2 pooling helpers. `stride` packs the input layout: stride[0] is the
// channel stride (rows * cols) and stride[1] the row stride (cols);
// (dim, row, col) pick one 2x2 block of channel `dim`, addressed in the
// pooled output grid. Typical call:
// [MLCnn mean_pool:map dim:k row:m col:n stride:@[@(rows * cols), @(cols)]]
+ (double)max_pool:(double *)input dim:(int)dim row:(int)row col:(int)col stride:(NSArray *)stride
{
    double maxV = input[dim * [stride[0] intValue] + row * 2 * [stride[1] intValue] + col * 2];
    maxV = MAX(maxV, input[dim * [stride[0] intValue] + (row * 2 + 1) * [stride[1] intValue] + col * 2]);
    maxV = MAX(maxV, input[dim * [stride[0] intValue] + row * 2 * [stride[1] intValue] + col * 2 + 1]);
    maxV = MAX(maxV, input[dim * [stride[0] intValue] + (row * 2 + 1) * [stride[1] intValue] + col * 2 + 1]);
    return maxV;
}

+ (double)mean_pool:(double *)input dim:(int)dim row:(int)row col:(int)col stride:(NSArray *)stride
{
    double sum = input[dim * [stride[0] intValue] + row * 2 * [stride[1] intValue] + col * 2];
    sum += input[dim * [stride[0] intValue] + (row * 2 + 1) * [stride[1] intValue] + col * 2];
    sum += input[dim * [stride[0] intValue] + row * 2 * [stride[1] intValue] + col * 2 + 1];
    sum += input[dim * [stride[0] intValue] + (row * 2 + 1) * [stride[1] intValue] + col * 2 + 1];
    return sum / 4;
}
// Plain "valid" 2D convolution (unused in the forward pass below, which
// relies on vDSP_imgfirD instead, but kept as a reference implementation).
+ (void)conv_2d:(double *)input inputRow:(int)NR inputCol:(int)NC filter:(double *)filter output:(double *)output filterRow:(int)P filterCol:(int)Q
{
    int outRow = NR - P + 1;
    int outCol = NC - Q + 1; // note: NC, not NR
    for (int i = 0; i < outRow; i++) {
        for (int j = 0; j < outCol; j++) {
            double sum = 0;
            for (int k = 0; k < P; k++) {
                // Multiply one filter row against the image window row...
                double *inner = malloc(sizeof(double) * Q);
                vDSP_vmulD((input + (i + k) * NC + j), 1, (filter + k * Q), 1, inner, 1, Q);
                // ...and accumulate its sum into the window total.
                double partial = 0;
                vDSP_vswsumD(inner, 1, &partial, 1, 1, Q);
                sum += partial;
                if (inner != NULL) {
                    free(inner);
                    inner = NULL;
                }
            }
            output[i * outCol + j] = sum;
        }
    }
}
// Weights start as small truncated-normal noise; biases start at 0.1.
+ (double *)weight_init:(int)size
{
    double *outP = malloc(sizeof(double) * size);
    for (int i = 0; i < size; i++) {
        outP[i] = [MLCnn truncated_normal:0 dev:0.1];
    }
    return outP;
}

+ (double *)bias_init:(int)size
{
    return [MLCnn fillVector:0.1f size:size];
}
# pragma mark - CNN Main

// `filters` describes the conv layers: each entry is @[kernelRows, kernelCols,
// outChannels]. `size` is the width of the final fully connected layer,
// (dimRow, dimCol) the input image size, and `rate` the dropout keep probability.
- (id)initWithFilters:(NSArray *)filters fullConnectSize:(int)size row:(int)dimRow col:(int)dimCol keepRate:(double)rate
{
    self = [super init];
    if (self) {
        _filters = filters;
        _connectSize = size;
        _numOfFilter = (int)[filters count];
        _dimRow = dimRow;
        _dimCol = dimCol;
        _keepProb = rate;
        // Arrays of per-layer buffers (sizeof(double *), not sizeof(double):
        // these hold pointers).
        _weight = malloc(sizeof(double *) * (_numOfFilter + 1));
        _bias = malloc(sizeof(double *) * (_numOfFilter + 1));
        _filteredImage = malloc(sizeof(double *) * (_numOfFilter + 1));
        _reluFlag = malloc(sizeof(double *) * (_numOfFilter + 1));
        _dropoutMask = malloc(sizeof(double) * (_connectSize));
        int preDim = 1;
        int row = dimRow;
        int col = dimCol;
        for (int i = 0; i < _numOfFilter; i++) {
            _weight[i] = [MLCnn weight_init:[_filters[i][0] intValue] * [_filters[i][1] intValue] * [_filters[i][2] intValue] * preDim];
            _bias[i] = [MLCnn bias_init:[_filters[i][2] intValue]];
            // Each layer shrinks the map by a valid convolution, then halves it
            // by 2x2 pooling.
            row = (row - ([_filters[i][0] intValue] / 2) * 2) / 2;
            col = (col - ([_filters[i][1] intValue] / 2) * 2) / 2;
            preDim = [_filters[i][2] intValue];
            _filteredImage[i] = NULL;
            _reluFlag[i] = NULL;
        }
        _weight[_numOfFilter] = [MLCnn weight_init:row * col * preDim * _connectSize];
        _bias[_numOfFilter] = [MLCnn bias_init:_connectSize];
        _filteredImage[_numOfFilter] = NULL;
        _reluFlag[_numOfFilter] = NULL;
        _outRow = row;
        _outCol = col;
    }
    return self;
}
- (void)dealloc
{
    if (_weight != NULL) {
        for (int i = 0; i < _numOfFilter + 1; i++) {
            free(_weight[i]);
            _weight[i] = NULL;
        }
        free(_weight);
        _weight = NULL;
    }
    if (_bias != NULL) {
        for (int i = 0; i < _numOfFilter + 1; i++) {
            free(_bias[i]);
            _bias[i] = NULL;
        }
        free(_bias);
        _bias = NULL;
    }
    if (_filteredImage != NULL) {
        // Start at 1: _filteredImage[0] is the caller's input image, not ours to free.
        for (int i = 1; i < _numOfFilter + 1; i++) {
            free(_filteredImage[i]);
            _filteredImage[i] = NULL;
        }
        free(_filteredImage);
        _filteredImage = NULL;
    }
    if (_reluFlag != NULL) {
        for (int i = 0; i < _numOfFilter + 1; i++) {
            free(_reluFlag[i]);
            _reluFlag[i] = NULL;
        }
        free(_reluFlag);
        _reluFlag = NULL;
    }
    if (_dropoutMask != NULL) {
        free(_dropoutMask);
        _dropoutMask = NULL;
    }
}
- (double *)filterImage:(double *)image state:(BOOL)isTraining
{
    if (_numOfFilter == 0) {
        return image;
    }
    int preDim = 1;
    int row = _dimRow;
    int col = _dimCol;
    _filteredImage[0] = image;
    for (int i = 0; i < _numOfFilter; i++) {
        double *conv = [MLCnn fillVector:0.0f size:row * col * [_filters[i][2] intValue]];
        // convolve: vDSP_imgfirD expects odd kernel sides and writes a
        // same-size output whose border (where the kernel does not fully fit)
        // is zero; the valid central region is cropped out below.
        for (int k = 0; k < [_filters[i][2] intValue]; k++) {
            double *inner = malloc(sizeof(double) * row * col);
            for (int m = 0; m < preDim; m++) { // sum contributions over input channels
                vDSP_imgfirD((_filteredImage[i] + m * row * col), row, col, (_weight[i] + k * [_filters[i][0] intValue] * [_filters[i][1] intValue] * preDim + m * [_filters[i][0] intValue] * [_filters[i][1] intValue]), inner, [_filters[i][0] intValue], [_filters[i][1] intValue]);
                vDSP_vaddD((conv + k * row * col), 1, inner, 1, (conv + k * row * col), 1, row * col);
            }
            vDSP_vsaddD((conv + k * row * col), 1, &_bias[i][k], (conv + k * row * col), 1, row * col);
            if (inner != NULL) {
                free(inner);
                inner = NULL;
            }
        }
        // Crop the valid central region of the same-size imgfir output.
        int strideRow = [_filters[i][0] intValue] / 2;
        int strideCol = [_filters[i][1] intValue] / 2;
        row -= strideRow * 2;
        col -= strideCol * 2;
        if (_reluFlag[i] != NULL) {
            free(_reluFlag[i]);
            _reluFlag[i] = NULL;
        }
        _reluFlag[i] = malloc(sizeof(double) * row * col * [_filters[i][2] intValue]);
        for (int k = 0; k < [_filters[i][2] intValue]; k++) {
            for (int r = 0; r < row; ++r)
            {
                for (int c = 0; c < col; ++c)
                {
                    _reluFlag[i][k * row * col + r * col + c] = conv[k * (row + strideRow * 2) * (col + strideCol * 2) + (r + strideRow) * (col + strideCol * 2) + c + strideCol];
                }
            }
        }
        // relu
        _reluFlag[i] = [MLCnn relu:_reluFlag[i] size:row * col * [_filters[i][2] intValue]];
        // pooling 2*2 (mean pooling here; swap in max_pool: for max pooling)
        if (_filteredImage[i+1] != NULL) {
            free(_filteredImage[i+1]);
            _filteredImage[i+1] = NULL;
        }
        _filteredImage[i+1] = malloc(sizeof(double) * row * col * [_filters[i][2] intValue] / 4);
        for (int k = 0; k < [_filters[i][2] intValue]; k++) {
            for (int m = 0; m < row / 2; m++) {
                for (int n = 0; n < col / 2; n++) {
                    _filteredImage[i+1][k * row * col / 4 + m * col / 2 + n] = [MLCnn mean_pool:_reluFlag[i] dim:k row:m col:n stride:@[[NSNumber numberWithInt:row * col],[NSNumber numberWithInt:col]]];
                }
            }
        }
        row /= 2;
        col /= 2;
        preDim = [_filters[i][2] intValue];
        if (conv != NULL) {
            free(conv);
            conv = NULL;
        }
    }
    // full connect: flatten the last pooled maps and multiply by the FC weights
    if (_reluFlag[_numOfFilter] != NULL) {
        free(_reluFlag[_numOfFilter]);
        _reluFlag[_numOfFilter] = NULL;
    }
    _reluFlag[_numOfFilter] = malloc(sizeof(double) * _connectSize);
    vDSP_mmulD(_weight[_numOfFilter], 1, _filteredImage[_numOfFilter], 1, _reluFlag[_numOfFilter], 1, _connectSize, 1, row * col * preDim);
    vDSP_vaddD(_reluFlag[_numOfFilter], 1, _bias[_numOfFilter], 1, _reluFlag[_numOfFilter], 1, _connectSize);
    _reluFlag[_numOfFilter] = [MLCnn relu:_reluFlag[_numOfFilter] size:_connectSize];
    // dropOut: zero units with probability 1 - keepProb while training;
    // at inference, scale activations by keepProb instead.
    if (isTraining) {
        for (int i = 0; i < _connectSize; i++) {
            if ((double)rand()/RAND_MAX > _keepProb) {
                _dropoutMask[i] = 0;
                _reluFlag[_numOfFilter][i] = 0;
            }
            else
            {
                _dropoutMask[i] = 1;
            }
        }
    }
    else
    {
        vDSP_vsmulD(_reluFlag[_numOfFilter], 1, &_keepProb, _reluFlag[_numOfFilter], 1, _connectSize);
    }
    return _reluFlag[_numOfFilter];
}
- (void)backPropagation:(double *)loss
{
    // `loss` is the gradient w.r.t. the fully connected output (already scaled
    // by the learning rate upstream); this method consumes and frees it.
    int row = _outRow;
    int col = _outCol;
    // dropOut: only units kept in the forward pass receive gradient
    vDSP_vmulD(loss, 1, _dropoutMask, 1, loss, 1, _connectSize);
    // deRelu: gradient is zero wherever the forward activation was zero
    for (int i = 0; i < _connectSize; i++) {
        if (_reluFlag[_numOfFilter][i] == 0) {
            loss[i] = 0;
        }
    }
    // update full-connect layer
    vDSP_vaddD(loss, 1, _bias[_numOfFilter], 1, _bias[_numOfFilter], 1, _connectSize);
    double *flayerLoss = malloc(sizeof(double) * row * col * [_filters[_numOfFilter - 1][2] intValue]);
    double *transWeight = malloc(sizeof(double) * row * col * [_filters[_numOfFilter - 1][2] intValue] * _connectSize);
    // Propagate the loss through the FC weights: flayerLoss = W^T * loss
    vDSP_mtransD(_weight[_numOfFilter], 1, transWeight, 1, row * col * [_filters[_numOfFilter - 1][2] intValue], _connectSize);
    vDSP_mmulD(transWeight, 1, loss, 1, flayerLoss, 1, row * col * [_filters[_numOfFilter - 1][2] intValue], 1, _connectSize);
    // FC weight gradient is the outer product loss * input^T
    double *flayerWeight = malloc(sizeof(double) * row * col * [_filters[_numOfFilter - 1][2] intValue] * _connectSize);
    vDSP_mmulD(loss, 1, _filteredImage[_numOfFilter], 1, flayerWeight, 1, _connectSize, row * col * [_filters[_numOfFilter - 1][2] intValue], 1);
    vDSP_vaddD(_weight[_numOfFilter], 1, flayerWeight, 1, _weight[_numOfFilter], 1, row * col * [_filters[_numOfFilter - 1][2] intValue] * _connectSize);
    if (loss != NULL) {
        free(loss);
        loss = NULL;
    }
    if (flayerWeight != NULL) {
        free(flayerWeight);
        flayerWeight = NULL;
    }
    if (transWeight != NULL) {
        free(transWeight);
        transWeight = NULL;
    }
    // update Conv & pooling layers, walking backward through the network
    double *convBackLoss = flayerLoss;
    for (int i = _numOfFilter - 1; i >= 0; i--) {
        // unsampling: mean pooling spreads each pooled residual evenly over its 2x2 block
        row *= 2;
        col *= 2;
        int preDim = i > 0 ? [_filters[i-1][2] intValue] : 1;
        double *unsample = malloc(sizeof(double) * row * col * [_filters[i][2] intValue]);
        for (int k = 0; k < [_filters[i][2] intValue]; k++) {
            for (int m = 0; m < row / 2; m++) {
                for (int n = 0; n < col / 2; n++) {
                    unsample[k*row*col + m*2*col + n*2] = unsample[k*row*col + m*2*col + n*2 + 1] = unsample[k*row*col + (m*2+1)*col + n*2] = unsample[k*row*col + (m*2+1)*col + n*2 + 1] = convBackLoss[k*row*col/4 + m*col/2 + n] / 4;
                }
            }
        }
        // deRelu
        for (int k = 0; k < row * col * [_filters[i][2] intValue]; k++) {
            if (_reluFlag[i][k] == 0) {
                unsample[k] = 0;
            }
        }
        // update conv bias: the bias gradient is the sum of all residual elements
        for (int k = 0; k < [_filters[i][2] intValue]; k++) {
            double biasLoss = 0;
            for (int m = 0; m < row / 2; m++) {
                for (int n = 0; n < col / 2; n++) {
                    biasLoss += convBackLoss[k*row*col/4 + m*col/2 + n];
                }
            }
            _bias[i][k] += biasLoss;
        }
        int strideRow = [_filters[i][0] intValue] / 2;
        int strideCol = [_filters[i][1] intValue] / 2;
        if (i > 0) { // if not the first layer, propagate the loss further back
            if (convBackLoss != NULL) {
                free(convBackLoss);
                convBackLoss = NULL;
            }
            convBackLoss = [MLCnn fillVector:0.0f size:(row + strideRow * 2) * (col + strideCol * 2) * preDim];
            // Zero-pad the residual so imgfir computes a full convolution
            double *curLoss = [MLCnn fillVector:0.0f size:(row + strideRow * 2) * (col + strideCol * 2) * [_filters[i][2] intValue]];
            for (int k = 0; k < [_filters[i][2] intValue]; k++) {
                for (int p = 0; p < row; p++) {
                    for (int q = 0; q < col; q++) {
                        curLoss[k * (row + strideRow * 2) * (col + strideCol * 2) + (p + strideRow) * (col + strideCol * 2) + q + strideCol] = unsample[k * row * col + p * col + q];
                    }
                }
            }
            // Δq = (Σ_{p∈C} Δp ⊛ rot180(Θp)) ∘ φ'(Oq)
            for (int k = 0; k < preDim; k++) {
                double *inner = malloc(sizeof(double) * (row + strideRow * 2) * (col + strideCol * 2));
                for (int m = 0; m < [_filters[i][2] intValue]; m++) {
                    // rot180 of the kernel == reversing it as a flat vector
                    double *reverseWeight = [MLCnn fillVector:0.0f size:[_filters[i][0] intValue] * [_filters[i][1] intValue]];
                    vDSP_vaddD(reverseWeight, 1, (_weight[i] + m * [_filters[i][0] intValue] * [_filters[i][1] intValue] * preDim + k * [_filters[i][0] intValue] * [_filters[i][1] intValue]), 1, reverseWeight, 1, [_filters[i][0] intValue] * [_filters[i][1] intValue]);
                    vDSP_vrvrsD(reverseWeight, 1, [_filters[i][0] intValue] * [_filters[i][1] intValue]);
                    vDSP_imgfirD((curLoss + m * (row + strideRow * 2) * (col + strideCol * 2)), row + strideRow * 2, col + strideCol * 2, reverseWeight, inner, [_filters[i][0] intValue], [_filters[i][1] intValue]);
                    vDSP_vaddD((convBackLoss + k * (row + strideRow * 2) * (col + strideCol * 2)), 1, inner, 1, (convBackLoss + k * (row + strideRow * 2) * (col + strideCol * 2)), 1, (row + strideRow * 2) * (col + strideCol * 2));
                    if (reverseWeight != NULL) {
                        free(reverseWeight);
                        reverseWeight = NULL;
                    }
                }
                if (inner != NULL) {
                    free(inner);
                    inner = NULL;
                }
            }
            if (curLoss != NULL) {
                free(curLoss);
                curLoss = NULL;
            }
        }
        // update conv weight: ∂E/∂Θ = rot180(Oq ∗ rot180(Δp))
        for (int k = 0; k < [_filters[i][2] intValue]; k++) {
            vDSP_vrvrsD((unsample + k * row * col), 1, row * col); // rot180(Δp)
            for (int m = 0; m < preDim; m++) {
                double *inner = malloc(sizeof(double) * (row + strideRow * 2) * (col + strideCol * 2));
                // Convolve this layer's input Oq with rot180(Δp) used as the kernel
                // (an equivalent extraction could be done with the conv_2d: helper above)
                vDSP_imgfirD((_filteredImage[i] + m * (row + strideRow * 2) * (col + strideCol * 2)), (row + strideRow * 2), (col + strideCol * 2), (unsample + k * row * col), inner, row, col);
                double *weightLoss = malloc(sizeof(double) * [_filters[i][0] intValue] * [_filters[i][1] intValue]);
                int P = row / 2;
                int Q = col / 2;
                // Extract the central valid region (the kernel-sized gradient);
                // the row stride of `inner` is the padded width, col + strideCol * 2
                for (int r = P; r <= (row + strideRow * 2) - P; ++r)
                {
                    for (int c = Q; c <= (col + strideCol * 2) - Q; ++c)
                    {
                        weightLoss[(r-P)*[_filters[i][1] intValue] + (c-Q)] = inner[r * (col + strideCol * 2) + c];
                    }
                }
                vDSP_vrvrsD(weightLoss, 1, [_filters[i][0] intValue] * [_filters[i][1] intValue]); // final rot180
                vDSP_vaddD((_weight[i] + k * [_filters[i][0] intValue] * [_filters[i][1] intValue] * preDim + m * [_filters[i][0] intValue] * [_filters[i][1] intValue]), 1, weightLoss, 1, (_weight[i] + k * [_filters[i][0] intValue] * [_filters[i][1] intValue] * preDim + m * [_filters[i][0] intValue] * [_filters[i][1] intValue]), 1, [_filters[i][0] intValue] * [_filters[i][1] intValue]);
                if (weightLoss != NULL) {
                    free(weightLoss);
                    weightLoss = NULL;
                }
                if (inner != NULL) {
                    free(inner);
                    inner = NULL;
                }
            }
        }
        // Restore this layer's input size before moving to the previous layer
        row += strideRow * 2;
        col += strideCol * 2;
        if (unsample != NULL) {
            free(unsample);
            unsample = NULL;
        }
    }
    if (convBackLoss != NULL) {
        free(convBackLoss);
        convBackLoss = NULL;
    }
}

@end
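Based on the initializer above, the two-layer configuration used for the results below (5×5 kernels with 10 and 20 output feature maps on 28×28 MNIST input) would be created along these lines; the fully connected width of 1024 and keep rate of 0.5 are illustrative choices on my part, not values quoted from the post:

MLCnn *cnn = [[MLCnn alloc] initWithFilters:@[@[@5, @5, @10],
                                              @[@5, @5, @20]]
                            fullConnectSize:1024
                                        row:28
                                        col:28
                                   keepRate:0.5];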
Here I chose ReLU as the activation; the kernel weights are initialized with normally distributed random numbers truncated to the 95% interval (within two standard deviations); pooling uses mean pooling, though a max-pooling method is implemented as well.

Finally, with two convolutional layers of 5×5 kernels producing 10 and 20 feature maps (5*5*10 and 5*5*20) and only 1000 iterations, the run produced the following output:

(screenshot of the training output omitted)

The accuracy is clearly higher than with Softmax alone.
Conclusion

That wraps up a simple convolutional neural network implemented in Objective-C. If you are interested, download the code and experiment with the kernels, iteration parameters and so on; you may well squeeze out an even higher accuracy.