Analyzing Re3's image_demo.py

Running image_demo.py

Analyzing the network's inference process

Re3: Real-Time Recurrent Regression Networks for Visual Tracking of Generic Objects

Re3 paper: https://arxiv.org/pdf/1705.06368.pdf

Re3 code: https://gitlab.com/danielgordon10/re3-tensorflow

Author's homepage: https://homes.cs.washington.edu/~xkcd/index.html

image_demo.py

import cv2
import glob
import numpy as np
import sys
import os.path

basedir = os.path.dirname(__file__)
sys.path.append(os.path.abspath(os.path.join(basedir, os.path.pardir)))
from tracker import re3_tracker

if not os.path.exists(os.path.join(basedir, 'data')):
    import tarfile
    tar = tarfile.open(os.path.join(basedir, 'data.tar.gz'))
    tar.extractall(path=basedir)

os.path.dirname(__file__) returns the directory of the current Python script; here basedir is '/home/ubuntu/DeepLearningModel/re3-tensorflow-master/demo'.

os.path.join() joins path components, and os.path.pardir is the parent directory (really just the string '..'). os.path.abspath() simply normalizes away components like '..': it turns '/home/ubuntu/DeepLearningModel/re3-tensorflow-master/demo/..' into '/home/ubuntu/DeepLearningModel/re3-tensorflow-master'.
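The behaviour of os.path.pardir and os.path.abspath can be checked in isolation (a small sketch; the path is the one from this machine):

```python
import os.path

# A layout mirroring the demo's directory structure.
basedir = '/home/ubuntu/DeepLearningModel/re3-tensorflow-master/demo'

joined = os.path.join(basedir, os.path.pardir)   # os.path.pardir is just '..'
parent = os.path.abspath(joined)                 # normalizes the trailing '..'
print(joined)  # /home/ubuntu/DeepLearningModel/re3-tensorflow-master/demo/..
print(parent)  # /home/ubuntu/DeepLearningModel/re3-tensorflow-master
```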

sys.path is a list; its value here is:

['/home/ubuntu/DeepLearningModel/re3-tensorflow-master/demo', 
'/home/ubuntu/pycharm-community-2017.3.4/helpers/pydev',
'/home/ubuntu/pycharm-community-2017.3.4/helpers/pydev',
'/home/ubuntu/.PyCharmCE2017.3/system/cythonExtensions',
'/home/ubuntu/anaconda2/envs/tf15/lib/python35.zip',
'/home/ubuntu/anaconda2/envs/tf15/lib/python3.5',
'/home/ubuntu/anaconda2/envs/tf15/lib/python3.5/plat-linux',
'/home/ubuntu/anaconda2/envs/tf15/lib/python3.5/lib-dynload',
'/home/ubuntu/anaconda2/envs/tf15/lib/python3.5/site-packages',
'/home/ubuntu/anaconda2/envs/tf15/lib/python3.5/site-packages/IPython/extensions',
'/home/ubuntu/DeepLearningModel/re3-tensorflow-master']

Note that the last element, '/home/ubuntu/DeepLearningModel/re3-tensorflow-master', is the one we appended with sys.path.append.

os.path.exists(os.path.join(basedir, 'data')) checks whether the data directory exists; if not, it is produced by extracting the archive. Note the tarfile usage here.
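The same tarfile round trip can be sketched with throwaway temporary files (all paths below are temporary and hypothetical, not the demo's actual data):

```python
import os
import tarfile
import tempfile

src = tempfile.mkdtemp()
# Build a tiny data.tar.gz containing data/sample.txt ...
os.makedirs(os.path.join(src, 'data'))
with open(os.path.join(src, 'data', 'sample.txt'), 'w') as f:
    f.write('hello')
with tarfile.open(os.path.join(src, 'data.tar.gz'), 'w:gz') as tar:
    tar.add(os.path.join(src, 'data'), arcname='data')

# ... then extract it, mirroring how the demo unpacks its images on first run.
dest = tempfile.mkdtemp()
with tarfile.open(os.path.join(dest, '..', os.path.basename(src), 'data.tar.gz')
                  if False else os.path.join(src, 'data.tar.gz')) as tar:
    tar.extractall(path=dest)
print(os.path.exists(os.path.join(dest, 'data', 'sample.txt')))  # True
```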

cv2.namedWindow('Image', cv2.WINDOW_NORMAL)
cv2.resizeWindow('Image', 640, 480)
tracker = re3_tracker.Re3Tracker()
image_paths = sorted(glob.glob(os.path.join(
    os.path.dirname(__file__), 'data', '*.jpg')))

Note glob.glob(): its input is a path pattern (supporting the wildcards *, ?, and [] — a glob pattern, not a full regular expression), and it returns a list of the matching file (or directory) names under that directory.
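A quick sketch of glob.glob with throwaway files (the file names here are made up); note the sorted() call, since glob returns matches in arbitrary order:

```python
import glob
import os
import tempfile

# Create a few disposable .jpg files plus one non-match.
d = tempfile.mkdtemp()
for name in ['0003.jpg', '0001.jpg', '0002.jpg', 'notes.txt']:
    open(os.path.join(d, name), 'w').close()

image_paths = sorted(glob.glob(os.path.join(d, '*.jpg')))
print([os.path.basename(p) for p in image_paths])
# ['0001.jpg', '0002.jpg', '0003.jpg']
```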

image_paths is then the list of image paths, sorted by name.

initial_bbox = [175, 154, 251, 229]
tracker.track('ball', image_paths[0], initial_bbox)

for image_path in image_paths:
    image = cv2.imread(image_path)
    # Tracker expects RGB, but opencv loads BGR.
    imageRGB = image[:, :, ::-1]
    bbox = tracker.track('ball', imageRGB)
    cv2.rectangle(image,
                  (int(bbox[0]), int(bbox[1])),
                  (int(bbox[2]), int(bbox[3])),
                  [0, 0, 255], 2)
    cv2.imshow('Image', image)
    cv2.waitKey(1)

initial_bbox is the ground-truth box of the object in the first frame, supplied by the demo; the ground-truth box for every frame can be found in demo/data/labels.txt.

See cv2.rectangle(): [0,0,255] is the line color (red, since OpenCV uses BGR order), and the last argument 2 is the line thickness; passing -1 there fills the rectangle with a solid color instead.

This code is simple: read the images frame by frame, get the bbox coordinates returned by tracker.track('ball', imageRGB), and draw a rectangle on each frame from those coordinates.

Note that the frame for which the initial coordinates are supplied is handled differently: tracker.track('ball', image_paths[0], initial_bbox)

During the automatic tracking afterwards: bbox = tracker.track('ball', imageRGB)

cv2.waitKey(1) controls the display refresh: the argument 1 means wait 1 ms; a value of 0 means wait for a key press before showing the next frame.
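The BGR-to-RGB trick image[:, :, ::-1] used in the loop above can be verified on a single pixel (a minimal sketch):

```python
import numpy as np

# One 'pixel' stored as BGR, the order in which OpenCV loads images.
image = np.array([[[255, 0, 0]]], dtype=np.uint8)  # pure blue in BGR
imageRGB = image[:, :, ::-1]                       # reverse the channel axis
print(imageRGB[0, 0].tolist())  # [0, 0, 255] -> blue in RGB order
```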


Let's make a small improvement to image_demo.py so that it displays the initial box; this makes it much easier to see where the initial box is when testing on other datasets.

Define a variable is_first_image = True before the for loop, and modify the loop as follows:

for image_path in image_paths:
    image = cv2.imread(image_path)

    # geyao add, to show initial box with different color
    if is_first_image:
        cv2.rectangle(image,
                      (int(initial_bbox[0]), int(initial_bbox[1])),
                      (int(initial_bbox[2]), int(initial_bbox[3])),
                      [255, 0, 0], 2)
        cv2.imshow('Image', image)
        cv2.waitKey(0)
        is_first_image = False
    # geyao add end

    # Tracker expects RGB, but opencv loads BGR.
    imageRGB = image[:, :, ::-1]
    bbox = tracker.track('ball', imageRGB)
    cv2.rectangle(image,
                  (int(bbox[0]), int(bbox[1])),
                  (int(bbox[2]), int(bbox[3])),
                  [0, 0, 255], 2)
    cv2.imshow('Image', image)
    # cv2.waitKey(1) # origin
    cv2.waitKey(10)

Now let's look at the track() function, defined in tracker/re3_tracker.py.

re3_tracker.py

In this file, all the methods are defined in the Re3Tracker class.

The Re3Tracker constructor

def __init__(self, gpu_id=GPU_ID):
    os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
    basedir = os.path.dirname(__file__)
    tf.Graph().as_default()
    self.imagePlaceholder = tf.placeholder(tf.uint8, shape=(None, CROP_SIZE, CROP_SIZE, 3))
    self.prevLstmState = tuple([tf.placeholder(tf.float32, shape=(None, LSTM_SIZE)) for _ in range(4)])
    self.batch_size = tf.placeholder(tf.int32, shape=())
    self.outputs, self.state1, self.state2 = network.inference(
        self.imagePlaceholder, num_unrolls=1, batch_size=self.batch_size, train=False,
        prevLstmState=self.prevLstmState)
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    self.sess = tf.Session(config=config)
    ckpt = tf.train.get_checkpoint_state(os.path.join(basedir, '..', LOG_DIR, 'checkpoints'))
    if ckpt is None:
        raise IOError(
            ('Checkpoint model could not be found. '
             'Did you download the pretrained weights? '
             'Download them here: https://goo.gl/NWGXGM and read the Model section of the Readme.'))
    tf_util.restore(self.sess, ckpt.model_checkpoint_path)

    self.tracked_data = {}

    self.time = 0
    self.total_forward_count = -1

About os.environ

On Windows:

os.environ['HOMEPATH']: the current user's home directory.
os.environ['TEMP']: the temporary directory path.
os.environ['PATHEXT']: executable file extensions.
os.environ['SYSTEMROOT']: the system root directory.
os.environ['LOGONSERVER']: the machine name.
os.environ['PROMPT']: the prompt setting.

On Linux:

os.environ['USER']: the current user.
os.environ['LC_COLLATE']: the alphabetical sort order used when expanding paths.
os.environ['SHELL']: the type of shell in use.
os.environ['LANG']: the language in use.
os.environ['SSH_AUTH_SOCK']: the path of the ssh agent socket.

print(os.environ['USER']) prints the current user name. (The two lists above come from online material.)

os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id): since this machine has only one GPU, gpu_id is '0'. This line specifies which GPU to use (useful with multiple GPUs; by default TensorFlow grabs all of them).

Note that even though we are running image_demo.py from '/home/ubuntu/DeepLearningModel/re3-tensorflow-master/demo', the basedir obtained here via os.path.dirname(__file__) is still the directory containing re3_tracker.py: '/home/ubuntu/DeepLearningModel/re3-tensorflow-master/tracker'.

tf.Graph().as_default() is meant to select the computation graph (strictly speaking, without a with block this call has no lasting effect, and the ops below land on TensorFlow's process-level default graph anyway).

network.py

network.py lives in the tracker directory; let's analyze its functions.

The inference() function

For tf.variable_scope('re3', reuse=reuse), see the TensorFlow guide on Sharing Variables.

The embedding fully-connected layer has 2048 units, and the LSTM layers have 1024 units each.

LSTM

The network uses two LSTM layers (stacked vertically). Note that it uses an LSTM with peephole connections (a variant of the LSTM), and that the image features are fed into both layers:

Using the prior work of Greff et al. (“Lstm: A search space odyssey“), we opt for a two-layer, factored LSTM (the visual features are fed to both layers) with peephole connections.
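As a rough illustration (not the repo's CaffeLSTMCell, whose internals may differ), a single step of an LSTM with peephole connections can be sketched in NumPy; the peephole terms are the pi * c, pf * c, and po * c_new contributions to the gates:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def peephole_lstm_step(x, h, c, params):
    """One LSTM step where the gates also 'peek' at the cell state."""
    Wi, Wf, Wo, Wg, pi, pf, po = params
    z = np.concatenate([x, h])          # input and previous hidden state
    i = sigmoid(Wi @ z + pi * c)        # input gate peeks at c(t-1)
    f = sigmoid(Wf @ z + pf * c)        # forget gate peeks at c(t-1)
    g = np.tanh(Wg @ z)                 # candidate cell state
    c_new = f * c + i * g
    o = sigmoid(Wo @ z + po * c_new)    # output gate peeks at c(t)
    return o * np.tanh(c_new), c_new

n_in, n_hid = 8, 4
rng = np.random.default_rng(0)
params = [rng.standard_normal((n_hid, n_in + n_hid)) for _ in range(4)] + \
         [rng.standard_normal(n_hid) for _ in range(3)]
h, c = peephole_lstm_step(rng.standard_normal(n_in),
                          np.zeros(n_hid), np.zeros(n_hid), params)
print(h.shape, c.shape)  # (4,) (4,)
```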

Building the first LSTM layer (LSTM1) boils down to the following steps:

fc6_reshape = tf.reshape(fc6_out, tf.stack([batch_size, num_unrolls,
                                            fc6_out.get_shape().as_list()[-1]]))
# fc6_reshape: Tensor("re3/fc6/Reshape:0", shape=(?, 1, 2048), dtype=float32)

lstm1 = CaffeLSTMCell(LSTM_SIZE, initializer=msra_initializer)
# lstm1: (num_units: 1024, output_size: 1024, state_size: {c=1024, h=1024})

state1 = tf.contrib.rnn.LSTMStateTuple(prevLstmState[0], prevLstmState[1])
# state1: LSTMStateTuple(c=<tf.Tensor 'Placeholder_1:0' shape=(?, 1024) dtype=float32>,
#                        h=<tf.Tensor 'Placeholder_2:0' shape=(?, 1024) dtype=float32>)

lstm1_outputs, state1 = tf.nn.dynamic_rnn(lstm1, fc6_reshape, initial_state=state1,
                                          swap_memory=swap_memory)
# lstm1_outputs: Tensor("re3/lstm1/rnn/transpose_1:0", shape=(?, 1, 1024), dtype=float32)
# state1: LSTMStateTuple(c=<tf.Tensor 're3/lstm1/rnn/while/Exit_3:0' shape=(?, 1024) dtype=float32>,
#                        h=<tf.Tensor 're3/lstm1/rnn/while/Exit_4:0' shape=(?, 1024) dtype=float32>)

See tf.nn.dynamic_rnn() for details. Its parameters, briefly:

tf.nn.dynamic_rnn(
    cell,
    inputs,
    sequence_length=None,
    initial_state=None,
    dtype=None,
    parallel_iterations=None,
    swap_memory=False,
    time_major=False,
    scope=None
)

Building the second LSTM layer (LSTM2) boils down to the following steps:

lstm2 = CaffeLSTMCell(LSTM_SIZE, initializer=msra_initializer)
# lstm2: (num_units: 1024, output_size: 1024, state_size: {c=1024, h=1024})

state2 = tf.contrib.rnn.LSTMStateTuple(prevLstmState[2], prevLstmState[3])
# state2: LSTMStateTuple(c=<tf.Tensor 'Placeholder_3:0' shape=(?, 1024) dtype=float32>,
#                        h=<tf.Tensor 'Placeholder_4:0' shape=(?, 1024) dtype=float32>)

lstm2_inputs = tf.concat([fc6_reshape, lstm1_outputs], 2)
# lstm2_inputs: Tensor("re3/lstm2/concat:0", shape=(?, 1, 3072), dtype=float32)
lstm2_outputs, state2 = tf.nn.dynamic_rnn(lstm2, lstm2_inputs, initial_state=state2,
                                          swap_memory=swap_memory)
# lstm2_outputs: Tensor("re3/lstm2/rnn/transpose_1:0", shape=(?, 1, 1024), dtype=float32)
# state2: LSTMStateTuple(c=<tf.Tensor 're3/lstm2/rnn/while/Exit_3:0' shape=(?, 1024) dtype=float32>,
#                        h=<tf.Tensor 're3/lstm2/rnn/while/Exit_4:0' shape=(?, 1024) dtype=float32>)

fc6_reshape is the output of the convolutional network.

lstm2_inputs = tf.concat([fc6_reshape, lstm1_outputs], 2) feeds the image features, together with the first layer's output, into the second layer.

So LSTM1's input is fc6_reshape with shape=(?, 1, 2048), while LSTM2's input is lstm2_inputs with shape=(?, 1, 3072).
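The shape arithmetic of the concatenation can be checked with NumPy stand-ins for the two tensors:

```python
import numpy as np

fc6_reshape = np.zeros((1, 1, 2048), dtype=np.float32)    # CNN features
lstm1_outputs = np.zeros((1, 1, 1024), dtype=np.float32)  # LSTM1 output
# Concatenate along the feature axis, like tf.concat([...], 2).
lstm2_inputs = np.concatenate([fc6_reshape, lstm1_outputs], axis=2)
print(lstm2_inputs.shape)  # (1, 1, 3072)
```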

Each of the four processing units inside the cell is a feed-forward layer (there are four units because there are four different weight matrices, all with the same dimensions), just like a classic neural network layer; num_units is the number of hidden units in each.

An example from reference [1]:

import tensorflow as tf
import numpy as np
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=128)
print("output_size:",lstm_cell.output_size)
print("state_size:",lstm_cell.state_size)
print(lstm_cell.state_size.h)
print(lstm_cell.state_size.c)

Out:
output_size: 128
state_size: LSTMStateTuple(c=128, h=128)
128
128

In LSTMStateTuple(c=128, h=128), c is the cell state $c^{\langle t \rangle}$ and h is the hidden state $a^{\langle t \rangle}$; hence $a^{\langle t \rangle}$ and $c^{\langle t \rangle}$ have the same dimension.

To inspect the hidden state values, see: How to extract the cell state and hidden state from an RNN model in tensorflow?

For the return value of sess.run(), see tf.Session.run(fetches, feed_dict=None); the key sentence:

The value returned by run() has the same shape as the fetches argument, where the leaves are replaced by the corresponding values returned by TensorFlow.

That is, the return value mirrors the structure of the fetches argument.

rawOutput, s1, s2 = self.sess.run([self.outputs, self.state1, self.state2], feed_dict=feed_dict)

From the code above, debugging shows that in both LSTM states (s1, s2) the $c^{\langle t \rangle}$ and $a^{\langle t \rangle}$ vectors have shape (1, 1024), yet the network output rawOutput has 4 values (converted further down into the coordinates of two corner points). An LSTM cell's output $y^{\langle t \rangle}$ has the same dimension as $a^{\langle t \rangle}$, so why does the output become (1, 4)? Looking at the paper again, we find this sentence:

The second LSTM’s outputs are fed into a fully-connected layer with four output values representing the top left and bottom right corners of the object box in the crop coordinate frame, as is done in [18].

So there is an FC layer after the LSTM output, and that FC layer's output has shape (1, 4).
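A NumPy sketch of such a regression head (the weights below are random stand-ins for illustration, not the trained parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.standard_normal((1, 1024))         # LSTM2 output a<t>
W = rng.standard_normal((1024, 4)) * 0.01  # FC weight matrix
b = np.zeros(4)                            # FC bias
bbox_raw = h @ W + b                       # (1, 4): x1, y1, x2, y2 in crop coords
print(bbox_raw.shape)  # (1, 4)
```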

tf.get_variable( ): Gets an existing variable with these parameters or create a new one.

Note the feed_dict at test time:

feed_dict = {
    self.imagePlaceholder : [croppedInput0, croppedInput1],
    self.prevLstmState : lstmState,
    self.batch_size : 1,
}

The crops from the previous and current frames are fed into the network together, which is why the comment at the start of inference() states:

Data should be in order BxTx2xHxWxC where T is the number of unrolls.

After every 32 iterations, we reset the LSTM state. This is necessary because we train on sequences with a maximum length of 32 frames, and without this reset, the LSTM parameters tend to diverge from values the network has seen before. Rather than resetting the LSTM state to all zeros, we use the output from the first forward pass.
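A sketch of that reset rule (the constant name MAX_TRACK_LENGTH and the helper function are assumptions for illustration, not the repo's exact code):

```python
MAX_TRACK_LENGTH = 32  # assumed name for the training sequence length

def maybe_reset(state, first_state, forward_count):
    """Every 32 forward passes, fall back to the state produced by the
    very first forward pass instead of resetting to zeros."""
    if forward_count > 0 and forward_count % MAX_TRACK_LENGTH == 0:
        return first_state
    return state

print(maybe_reset('current', 'first', 32))  # first
print(maybe_reset('current', 'first', 5))   # current
```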

Network pipeline

1. Extracting the ROI

In the Re3Tracker class, the ROI is extracted mainly by the following function:

im_util.get_cropped_input(prevImage, pastBBox, CROP_PAD, CROP_SIZE)

1.1 Handling the first frame and the input box

For the first frame, prevImage and pastBBox are simply the current image and box. Here CROP_PAD = 2 and CROP_SIZE = 227, matching the paper:

The crops are each padded to be twice the size of the object’s bounding box to provide the network with context.
The crops are warped to be 227 × 227 pixels before being input into the network.
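A sketch of the padding arithmetic (an illustration, not the actual im_util.get_cropped_input, which also handles clipping at image boundaries):

```python
CROP_PAD, CROP_SIZE = 2, 227

def padded_crop_box(bbox, pad=CROP_PAD):
    """Grow a [x1, y1, x2, y2] box to pad times its size about its center."""
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = (x2 - x1) * pad, (y2 - y1) * pad   # twice the width and height
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]

# The demo's initial box, doubled in each dimension; the resulting crop
# is then warped to CROP_SIZE x CROP_SIZE pixels.
print(padded_crop_box([175, 154, 251, 229]))  # [137.0, 116.5, 289.0, 266.5]
```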

Network training

  1. Vary the batch size and sequence length
  2. Add occlusions to the training data
  3. We initially only use the ground truth crops, and as we double the number of unrolls, we increase the probability of using predicted crops to first 0.25, then subsequently 0.5 and 0.75.

Network testing

The LSTM state is reset every 32 frames:

After every 32 iterations, we reset the LSTM state. This is necessary because we train on sequences with a maximum length of 32 frames, and without this reset, the LSTM parameters tend to diverge from values the network has seen before. Rather than resetting the LSTM state to all zeros, we use the output from the first forward pass. This maintains an encoding of the tracked object, while allowing us to test on sequences much longer than the number of training unrolls.

This example from the MOT dataset clearly shows how badly the network behaves without the 32-frame state reset: MOT/MOT16/train/MOT16-11/img1 (the demo = 3 case in the modified code).

Using TensorBoard to inspect the network and variables

1. Viewing the graph

The network is defined in the Re3Tracker constructor, so we only need to add one line at the end of the constructor (after the network has been built):

tf.summary.FileWriter(path, self.sess.graph)

The graph can then be seen in TensorBoard:

2. Viewing the LSTM state

Where state1 is defined in the inference() function in network.py (under with tf.variable_scope('lstm1'):), add:

tf.summary.histogram('c', state1[0])
tf.summary.histogram('h', state1[1])

Do the same for state2.

Add the following to the Re3Tracker constructor:

self.merged = tf.summary.merge_all()
self.file_writer = tf.summary.FileWriter('../tensorboard_log/state')

We also need to keep writing the updated variables, so modify the run call in Re3Tracker's track() method and add add_summary(...):

# rawOutput, s1, s2 = self.sess.run([self.outputs, self.state1, self.state2],
#                                   feed_dict=feed_dict)
merged_variables, rawOutput, s1, s2 = self.sess.run([self.merged, self.outputs,
                                                     self.state1, self.state2],
                                                    feed_dict=feed_dict)
self.file_writer.add_summary(merged_variables, self.total_forward_count + 2)

References

  1. TensorFlow学习(十三):构造LSTM超长简明教程 (TensorFlow Learning (13): a concise tutorial on building LSTMs)

----------over----------


Title: Analyzing Re3's image_demo.py

Author: Ge垚

Published: 2018-07-04 16:07

Last updated: 2018-09-26 23:09

Original link: http://geyao1995.com/Re3_image_demo/

License: Attribution-NonCommercial-NoDerivatives 4.0 International. Please keep the original link and author when reposting.