Faster R-CNN in Keras: Loss Functions

code from GitHub: https://github.com/yhenon/keras-frcnn

The losses used in training

When the models are compiled:

```python
optimizer = Adam(lr=1e-5)
optimizer_classifier = Adam(lr=1e-5)
model_rpn.compile(optimizer=optimizer,
                  loss=[losses.rpn_loss_cls(num_anchors),
                        losses.rpn_loss_regr(num_anchors)])
model_classifier.compile(optimizer=optimizer_classifier,
                         loss=[losses.class_loss_cls,
                               losses.class_loss_regr(len(classes_count) - 1)],
                         metrics={'dense_class_{}'.format(len(classes_count)): 'accuracy'})
model_all.compile(optimizer='sgd', loss='mae')
```

model_all is built and compiled only so that the weights of the whole network can conveniently be saved as one file at the end.
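A minimal sketch of how model_all is typically used for this (the file path and the use of by_name here are illustrative, not from the original post):

```python
# Because model_rpn and model_classifier share layers with model_all,
# saving model_all's weights captures the whole network at once.
model_all.save_weights('./model_frcnn.hdf5')  # illustrative path

# Later, both sub-models can be restored from the same single file:
model_rpn.load_weights('./model_frcnn.hdf5', by_name=True)
model_classifier.load_weights('./model_frcnn.hdf5', by_name=True)
```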

The custom loss functions

First, the defining formula of the loss, from the Faster R-CNN paper:
$$
L({p_i}, {t_i})= \frac{1}{N_{cls}}\sum\limits_{i}L_{cls}(p_i,p_i^*)+\lambda \frac{1}{N_{reg}}\sum\limits_{i}p_i^*L_{reg}(t_i,t_i^*)
$$
$i$ : the index of an anchor in a mini-batch.

$p_i$ : the predicted probability of anchor $i$ being an object.

$p_i^*$ : the ground-truth label, is 1 if the anchor is positive, and is 0 if the anchor is negative.

$t_i$ : a vector representing the 4 parameterized coordinates of the predicted bounding box.

  • $t_i = (t_x,t_y,t_w,t_h)$
    $$
    t_i:
    \begin{cases}t_x = (x-x_a)/w_a \\ t_y=(y-y_a)/h_a \\ t_w=\log(w/w_a) \\ t_h=\log(h/h_a)\end{cases}
    $$
    $x, y, w, h$ denote the box's center coordinates and its width and height. Variables $x, x_a, x^*$ are for the predicted box, anchor box, and ground-truth box respectively (likewise for $y, w, h$). This can be thought of as bounding-box regression from an anchor box to a nearby ground-truth box (a small numeric sketch follows these definitions).

$t_i^*$ : the same parameterization, computed for the ground-truth box associated with a positive anchor.

$L_{cls}$ : the classification loss: log loss over two classes (object vs. not object).

$L_{reg}$ : $L_{reg}(t_i,t_i^*)=R(t_i-t_i^*)$

  • $R(\cdot)$ is the robust loss function ($smooth_{L_1}$):

    $$
    smooth_{L_1}(x) =
    \begin{cases} 0.5x^2 & \text{if } |x| < 1 \\
    |x| - 0.5 & \text{otherwise}
    \end{cases}
    $$

  • $p_i^*L_{reg}$ : means the regression loss is activated only for positive anchors ($p_i^*=1$) and is disabled otherwise ($p_i^*=0$).

$N_{cls},N_{reg}$ : normalize the two terms. In the paper's implementation, $N_{cls}=256$ (the mini-batch size) and $N_{reg} \approx 2400$ (the number of anchor locations).

$\lambda$ : balancing parameter; the paper sets $\lambda = 10$ by default, so the cls and reg terms are roughly equally weighted. The paper's experiments show the results are insensitive to the value of $\lambda$ over a wide range, and it also notes that the normalization above is not required and could be simplified.
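To make the parameterization and the smooth-L1 term concrete, here is a small numeric sketch (the boxes are made-up toy values, not from the post or the repo):

```python
import numpy as np

def encode(box, anchor):
    # box / anchor given as (cx, cy, w, h); returns (tx, ty, tw, th)
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return np.array([(x - xa) / wa, (y - ya) / ha,
                     np.log(w / wa), np.log(h / ha)])

def smooth_l1(x):
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x * x, x - 0.5)

anchor   = (50.0, 50.0, 100.0, 100.0)   # toy anchor box
gt_box   = (60.0, 55.0, 120.0,  90.0)   # toy ground-truth box
pred_box = (58.0, 54.0, 110.0,  95.0)   # toy predicted box

t_star = encode(gt_box, anchor)    # regression targets  t_i^*
t      = encode(pred_box, anchor)  # predicted offsets   t_i

# L_reg(t_i, t_i^*) = sum of smooth_L1 over the 4 coordinates
print(smooth_l1(t - t_star).sum())
```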

From Zhihu:

sigmoid and softmax are activation functions used in a network's output layer, for binary and multi-class classification respectively.

binary cross-entropy (log loss) and categorical cross-entropy are the corresponding loss functions.
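As a quick illustration of that pairing, in plain NumPy (just to show the two formulas; this is not part of the original code):

```python
import numpy as np

# Binary cross-entropy: sigmoid output p for a single "object" score.
def binary_crossentropy(y_true, p):
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Categorical cross-entropy: softmax output over several classes.
def categorical_crossentropy(y_true, p):
    return -np.sum(y_true * np.log(p))

print(binary_crossentropy(1.0, 0.9))                       # ~0.105
print(categorical_crossentropy(np.array([0, 1, 0]),
                               np.array([0.1, 0.8, 0.1]))) # ~0.223
```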

Implementation

```python
from keras import backend as K
from keras.objectives import categorical_crossentropy

if K.image_dim_ordering() == 'tf':
    import tensorflow as tf

# weights for the four loss terms
lambda_rpn_regr = 1.0
lambda_rpn_class = 1.0

lambda_cls_regr = 1.0
lambda_cls_class = 1.0

# small constant to avoid division by zero
epsilon = 1e-4
```

Import the backend and the categorical cross-entropy function, and define a few constants.

```python
def rpn_loss_regr(num_anchors):
    def rpn_loss_regr_fixed_num(y_true, y_pred):
        if K.image_dim_ordering() == 'th':
            x = y_true[:, 4 * num_anchors:, :, :] - y_pred
            x_abs = K.abs(x)
            x_bool = K.less_equal(x_abs, 1.0)
            return lambda_rpn_regr * K.sum(
                y_true[:, :4 * num_anchors, :, :] * (x_bool * (0.5 * x * x) + (1 - x_bool) * (x_abs - 0.5))) / K.sum(epsilon + y_true[:, :4 * num_anchors, :, :])
        else:
            x = y_true[:, :, :, 4 * num_anchors:] - y_pred
            x_abs = K.abs(x)
            x_bool = K.cast(K.less_equal(x_abs, 1.0), tf.float32)

            return lambda_rpn_regr * K.sum(
                y_true[:, :, :, :4 * num_anchors] * (x_bool * (0.5 * x * x) + (1 - x_bool) * (x_abs - 0.5))) / K.sum(epsilon + y_true[:, :, :, :4 * num_anchors])

    return rpn_loss_regr_fixed_num
```

See Liao Xuefeng's Python tutorial on closures (functions that return functions).
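The reason for the closure: a Keras loss must have the signature (y_true, y_pred), so an extra parameter like num_anchors has to be baked in by a factory function. A minimal sketch of the pattern (names here are illustrative):

```python
# A loss "factory": the outer function captures num_anchors, and the
# inner function it returns has the (y_true, y_pred) signature Keras expects.
def make_loss(num_anchors):
    def loss(y_true, y_pred):
        # num_anchors is visible here thanks to the closure
        return num_anchors  # placeholder body, just to show the scoping
    return loss

loss_fn = make_loss(9)
print(loss_fn(None, None))  # -> 9
```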

num_anchors is 9: 3 scales × 3 aspect ratios.

K.cast (see the Keras backend docs) converts a tensor to a different dtype and returns it. keras.backend.less_equal(x, y) compares element-wise (x <= y) and returns a boolean tensor.

The division by K.sum(epsilon + y_true[:, :, :, :4 * num_anchors]) in the return statement effectively counts the positive boxes (the first 4 * num_anchors channels of y_true form a 0/1 mask), so the summed loss is averaged over them; epsilon guards against division by zero.
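A toy NumPy sketch of the y_true layout this slicing assumes (the channel counts and values are made up for illustration): the first 4 * num_anchors channels are the 0/1 mask that switches the loss on only for positive anchors, and the last 4 * num_anchors channels hold the regression targets.

```python
import numpy as np

num_anchors = 2                      # toy value (the real code uses 9)
h, w = 1, 2                          # toy feature-map size

# y_true: mask channels first, then target channels
mask    = np.zeros((1, h, w, 4 * num_anchors))
targets = np.zeros((1, h, w, 4 * num_anchors))
mask[0, 0, 0, 0:4] = 1.0             # one positive anchor -> 4 active coords
targets[0, 0, 0, 0:4] = [0.1, 0.05, 0.2, -0.1]
y_true = np.concatenate([mask, targets], axis=-1)

y_pred = np.zeros((1, h, w, 4 * num_anchors))

# same arithmetic as the TF branch above, in NumPy
x = y_true[:, :, :, 4 * num_anchors:] - y_pred
x_abs = np.abs(x)
x_bool = (x_abs <= 1.0).astype('float32')
loss = np.sum(y_true[:, :, :, :4 * num_anchors] *
              (x_bool * 0.5 * x * x + (1 - x_bool) * (x_abs - 0.5))) \
       / np.sum(1e-4 + y_true[:, :, :, :4 * num_anchors])
print(loss)  # averaged only over the 4 masked coordinates
```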

```python
def rpn_loss_cls(num_anchors):
    def rpn_loss_cls_fixed_num(y_true, y_pred):
        if K.image_dim_ordering() == 'tf':
            return lambda_rpn_class * K.sum(y_true[:, :, :, :num_anchors] * K.binary_crossentropy(y_pred[:, :, :, :], y_true[:, :, :, num_anchors:])) / K.sum(epsilon + y_true[:, :, :, :num_anchors])
        else:
            return lambda_rpn_class * K.sum(y_true[:, :num_anchors, :, :] * K.binary_crossentropy(y_pred[:, :, :, :], y_true[:, num_anchors:, :, :])) / K.sum(epsilon + y_true[:, :num_anchors, :, :])

    return rpn_loss_cls_fixed_num
```

The RPN's cls loss uses binary_crossentropy (each anchor is classified into two classes: contains an object or not). The classifier head's regression loss below uses the same smooth-L1 form, masked by the first 4 * num_classes channels of y_true:

```python
def class_loss_regr(num_classes):
    def class_loss_regr_fixed_num(y_true, y_pred):
        x = y_true[:, :, 4*num_classes:] - y_pred
        x_abs = K.abs(x)
        x_bool = K.cast(K.less_equal(x_abs, 1.0), 'float32')
        return lambda_cls_regr * K.sum(y_true[:, :, :4*num_classes] * (x_bool * (0.5 * x * x) + (1 - x_bool) * (x_abs - 0.5))) / K.sum(epsilon + y_true[:, :, :4*num_classes])
    return class_loss_regr_fixed_num
```
```python
def class_loss_cls(y_true, y_pred):
    return lambda_cls_class * K.mean(categorical_crossentropy(y_true[0, :, :], y_pred[0, :, :]))
```

The classifier's cls loss uses categorical_crossentropy (multi-class classification over the object categories).
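A toy NumPy check of what this computes (the class count and probabilities are made up): the loss is the categorical cross-entropy averaged over the ROIs in the batch.

```python
import numpy as np

# 2 ROIs, 3 classes (one-hot ground truth, softmax-like predictions)
y_true = np.array([[1, 0, 0],
                   [0, 1, 0]], dtype='float32')
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]], dtype='float32')

# categorical cross-entropy per ROI, then the mean over ROIs
ce = -np.sum(y_true * np.log(y_pred), axis=-1)
print(ce.mean())  # (-log 0.7 - log 0.8) / 2, roughly 0.29
```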

