Code from GitHub: https://github.com/yhenon/keras-frcnn

# Custom loss function

$$L(\{p_i\}, \{t_i\})= \frac{1}{N_{cls}}\sum\limits_{i}L_{cls}(p_i,p_i^*)+\lambda \frac{1}{N_{reg}}\sum\limits_{i}p_i^*L_{reg}(t_i,t_i^*)$$
$i$ : the index of an anchor in a mini-batch.

$p_i$ : the predicted probability that anchor $i$ is an object.

$p_i^*$ : the ground-truth label: 1 if the anchor is positive, 0 if the anchor is negative.

$t_i$ : a vector representing the 4 parameterized coordinates of the predicted bounding box.

• $t_i = (t_x,t_y,t_w,t_h)$
$$t_i: \begin{cases}t_x = (x-x_a)/w_a \\ t_y=(y-y_a)/h_a\\t_w=\log(w/w_a)\\t_h=\log(h/h_a)\end{cases}$$
$x, y, w, h$ denote the box's center coordinates and its width and height. Variables $x, x_a, x^*$ are for the predicted box, anchor box, and ground-truth box respectively (likewise for $y, w, h$). This can be thought of as bounding-box regression from an anchor box to a nearby ground-truth box.

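$t_i^*$ : the same parameterization computed for the ground-truth box associated with a positive anchor, i.e. $t_x^* = (x^*-x_a)/w_a$, $t_y^* = (y^*-y_a)/h_a$, $t_w^* = \log(w^*/w_a)$, $t_h^* = \log(h^*/h_a)$.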
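The parameterization can be sketched in a few lines of plain Python. This is only an illustration of the formulas for $t_i$ and $t_i^*$: the helper names `encode_box` and `decode_box` and the example boxes are hypothetical and are not taken from keras-frcnn.

```python
import math

def encode_box(box, anchor):
    """Return (t_x, t_y, t_w, t_h) for a box relative to an anchor.

    Both arguments are (center_x, center_y, width, height) tuples.
    """
    x, y, w, h = box
    x_a, y_a, w_a, h_a = anchor
    return (
        (x - x_a) / w_a,      # center shift, in units of the anchor width
        (y - y_a) / h_a,      # center shift, in units of the anchor height
        math.log(w / w_a),    # log-scale change in width
        math.log(h / h_a),    # log-scale change in height
    )

def decode_box(t, anchor):
    """Invert encode_box: recover (x, y, w, h) from the offsets and the anchor."""
    t_x, t_y, t_w, t_h = t
    x_a, y_a, w_a, h_a = anchor
    return (x_a + t_x * w_a, y_a + t_y * h_a, w_a * math.exp(t_w), h_a * math.exp(t_h))

# Hypothetical boxes, all given as (center_x, center_y, width, height).
anchor = (100.0, 100.0, 128.0, 128.0)
predicted = (116.0, 92.0, 256.0, 64.0)
ground_truth = (132.0, 100.0, 128.0, 128.0)

t = encode_box(predicted, anchor)          # plays the role of t_i
t_star = encode_box(ground_truth, anchor)  # plays the role of t_i^*
```

Because widths and heights are encoded as log ratios, a box with the same size as its anchor gets $t_w = t_h = 0$, and doubling or halving a side gives offsets of equal magnitude and opposite sign.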

$L_{cls}$ : the classification loss, i.e. log loss over two classes (object vs. not object).

$L_{reg}$ : $L_{reg}(t_i,t_i^*)=R(t_i-t_i^*)$

• $R(\cdot)$ : the robust loss function ($\text{smooth}_{L_1}$):

$$\text{smooth}_{L_1}(x) = \begin{cases} 0.5x^2 & \text{if } |x| < 1 \\ |x|-0.5 & \text{otherwise} \end{cases}$$

• $p_i^*L_{reg}$ : the factor $p_i^*$ means the regression loss is activated only for positive anchors ($p_i^*=1$) and is disabled otherwise ($p_i^*=0$).
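A minimal sketch of the regression term for a single anchor, assuming the per-coordinate smooth L1 values are summed over the four offsets (as in Fast R-CNN). The function names are illustrative only.

```python
def smooth_l1(x):
    """Smooth L1: quadratic for |x| < 1, linear otherwise."""
    x_abs = abs(x)
    if x_abs < 1.0:
        return 0.5 * x * x
    return x_abs - 0.5

def reg_loss(t, t_star):
    """L_reg(t, t*) = sum of smooth_l1 over the four box offsets."""
    return sum(smooth_l1(a - b) for a, b in zip(t, t_star))

def masked_reg_loss(p_star, t, t_star):
    """p* L_reg: contributes only when the anchor is positive (p* = 1)."""
    return p_star * reg_loss(t, t_star)

t = (0.5, 0.0, 0.0, 2.0)
t_star = (0.0, 0.0, 0.0, 0.0)
positive = masked_reg_loss(1, t, t_star)  # 0.125 + 0 + 0 + 1.5 = 1.625
negative = masked_reg_loss(0, t, t_star)  # 0: regression is ignored for negatives
```

The two branches meet at $|x| = 1$ (both give 0.5), so the loss is continuous, while large errors grow only linearly and are less dominated by outliers than with a squared loss.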

$N_{cls},N_{reg}$ : normalize the two terms. In the paper's implementation, $N_{cls}=256$ (the mini-batch size) and $N_{reg} \approx 2400$ (the number of anchor locations).

$\lambda$ : balancing parameter. The paper sets $\lambda = 10$ by default, so the cls and reg terms are roughly equally weighted. Its experiments show that the results are insensitive to the value of $\lambda$ over a wide range. The paper also notes that the normalization above is not required and could be simplified.
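As a quick arithmetic check of the "roughly equally weighted" claim, the sketch below compares the per-anchor weights $1/N_{cls}$ and $\lambda/N_{reg}$ and assembles the total loss from per-anchor loss values. `rpn_total_loss` is a hypothetical helper written for this note, not a function from keras-frcnn.

```python
N_CLS = 256    # mini-batch size
N_REG = 2400   # approximate number of anchor locations
LAMBDA = 10.0  # balancing parameter

cls_weight = 1.0 / N_CLS       # 0.00390625
reg_weight = LAMBDA / N_REG    # about 0.00417
ratio = reg_weight / cls_weight  # about 1.07, i.e. nearly equal weights

def rpn_total_loss(cls_losses, reg_losses, p_star,
                   n_cls=N_CLS, n_reg=N_REG, lam=LAMBDA):
    """Combine per-anchor losses as in the formula for L({p_i}, {t_i}).

    cls_losses[i] is L_cls for anchor i, reg_losses[i] is L_reg for anchor i,
    and p_star[i] is 1 for a positive anchor and 0 for a negative one.
    """
    cls_term = sum(cls_losses) / n_cls
    reg_term = lam * sum(p * l for p, l in zip(p_star, reg_losses)) / n_reg
    return cls_term + reg_term

# Toy example with two anchors and small normalizers so the numbers are easy to follow.
toy = rpn_total_loss([0.5, 0.5], [1.0, 2.0], [1, 0], n_cls=2, n_reg=4, lam=2.0)
```

In the toy call, the second anchor is negative, so its regression loss of 2.0 does not enter the total.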

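Sigmoid and softmax are the activation functions used in the output layer of a neural network, for two-class and multi-class classification respectively.

Binary cross-entropy (log loss) and categorical cross-entropy are the corresponding loss functions.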
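The pairing can be illustrated with a small pure-Python sketch (the function names are illustrative, not Keras APIs). For two classes, softmax over the logits `[z, 0]` gives the same probability as `sigmoid(z)`, so the two losses agree.

```python
import math

def sigmoid(z):
    """Map a single logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(p, y):
    """Log loss for one example: p is the predicted probability, y is 0 or 1."""
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def softmax(logits):
    """Map a list of logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(probs, one_hot):
    """Cross-entropy between predicted probabilities and a one-hot label."""
    return -sum(y * math.log(p) for p, y in zip(probs, one_hot) if y)

z = 2.0
bce = binary_cross_entropy(sigmoid(z), 1)
cce = categorical_cross_entropy(softmax([z, 0.0]), [1, 0])
```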

# Implementation

`num_anchors` is 9, i.e. nine anchor shapes per location (3 scales × 3 aspect ratios in the paper's default setup).

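`K.cast` (see the Keras backend documentation) casts a tensor to a different dtype and returns it. `keras.backend.less_equal(x, y)` evaluates (x <= y) element-wise and returns a boolean tensor.

In the `return` statement, dividing by `K.sum(epsilon + y_true[:, :, :, :4 * num_anchors])` effectively counts the positive boxes (the slice acts as the positive-anchor mask, with one entry per regression coordinate), so the loss is averaged over them. `epsilon` is added to avoid division by zero.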
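The logic described above can be mirrored in plain Python for a single anchor location. This is a sketch under assumptions, not the repository's code: it uses flat lists instead of 4-D Keras tensors, assumes the first `4 * num_anchors` entries of `y_true` are the positive-anchor mask and the rest are the regression targets, and uses illustrative values for `EPSILON` and `LAMBDA_RPN_REGR`.

```python
EPSILON = 1e-4         # assumed small constant guarding against division by zero
LAMBDA_RPN_REGR = 1.0  # assumed weight on the regression loss

def rpn_regr_loss(y_true, y_pred, num_anchors):
    """Masked smooth L1 loss, averaged over the positive mask entries.

    y_true: mask (4 * num_anchors values) followed by targets (4 * num_anchors values).
    y_pred: predicted offsets (4 * num_anchors values).
    """
    n = 4 * num_anchors
    mask = y_true[:n]      # analogous to y_true[:, :, :, :4 * num_anchors]
    targets = y_true[n:]   # analogous to y_true[:, :, :, 4 * num_anchors:]
    total = 0.0
    for m, target, pred in zip(mask, targets, y_pred):
        x = target - pred
        x_abs = abs(x)
        # Plays the role of K.cast(K.less_equal(x_abs, 1.0), float32):
        # 1.0 selects the quadratic branch, 0.0 the linear branch.
        x_bool = float(x_abs <= 1.0)
        total += m * (x_bool * (0.5 * x * x) + (1.0 - x_bool) * (x_abs - 0.5))
    # epsilon is added to every mask entry, so the denominator is never zero.
    return LAMBDA_RPN_REGR * total / sum(EPSILON + m for m in mask)

# Two anchors: the first is positive (mask 1 for its 4 offsets), the second is negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0] + [0.5, 0.0, 0.0, 2.0, 9.0, 9.0, 9.0, 9.0]
y_pred = [0.0] * 8
loss = rpn_regr_loss(y_true, y_pred, num_anchors=2)

# With no positive anchors the numerator is 0 and epsilon keeps the division defined.
empty = rpn_regr_loss([0] * 8 + [9.0] * 8, [0.0] * 8, num_anchors=2)
```

Using `<=` here instead of the strict `<` in the $\text{smooth}_{L_1}$ definition makes no difference to the value, since both branches equal 0.5 at $|x| = 1$.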

