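os.path.dirname(__file__) returns the directory that contains the current Python script, not the path of the script itself, and the result is absolute only if __file__ is. Here basedir is '/home/ubuntu/DeepLearningModel/re3-tensorflow-master/demo'.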

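os.path.join() concatenates path components, and os.path.pardir denotes the parent directory (it is simply the str '..', meaning one level up). os.path.abspath returns a normalized absolute path, which among other things collapses '..' components: it turns '/home/ubuntu/DeepLearningModel/re3-tensorflow-master/demo/..' into '/home/ubuntu/DeepLearningModel/re3-tensorflow-master'.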
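The path handling above can be reproduced on a literal path. This is a minimal sketch: the file name image_demo.py stands in for __file__ and is an assumption, only the directory comes from the text.

```python
import os.path

# Stand-in for __file__; the script name is assumed for illustration.
script_path = '/home/ubuntu/DeepLearningModel/re3-tensorflow-master/demo/image_demo.py'

# Directory containing the script, not the script path itself.
basedir = os.path.dirname(script_path)

# os.path.pardir is just the string '..'; join() only concatenates.
unnormalized = os.path.join(basedir, os.path.pardir)

# abspath() normalizes the path, which collapses the trailing '..'.
parent = os.path.abspath(unnormalized)

print(basedir)
print(unnormalized)
print(parent)
```

Note that join() leaves the '..' in place; it only disappears once the path is normalized.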

sys.path is a list; its value here is as follows:

os.path.exists(os.path.join(basedir, 'data')) checks whether this directory exists. If it does not, the archive is extracted to create it. Note the use of tarfile here.
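The check-then-extract pattern might look like the sketch below. The archive name data.tar.gz and the helper ensure_data are assumptions made for illustration; the example builds a throwaway archive in a temporary directory so that it runs on its own.

```python
import os
import tarfile
import tempfile

def ensure_data(basedir, archive_name='data.tar.gz'):
    """Extract the archive only if basedir/data is missing (hypothetical helper)."""
    data_dir = os.path.join(basedir, 'data')
    if not os.path.exists(data_dir):
        with tarfile.open(os.path.join(basedir, archive_name)) as tar:
            tar.extractall(path=basedir)
    return data_dir

# Build a small archive to extract from; contents are made up.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'src', 'data')
os.makedirs(src)
with open(os.path.join(src, 'labels.txt'), 'w') as f:
    f.write('placeholder\n')
with tarfile.open(os.path.join(workdir, 'data.tar.gz'), 'w:gz') as tar:
    tar.add(src, arcname='data')

data_dir = ensure_data(workdir)          # extracts on the first call
extracted = sorted(os.listdir(data_dir))
data_dir_again = ensure_data(workdir)    # second call finds the directory and skips extraction
print(extracted)
```

Opening the archive in a with block closes the file handle even if extraction fails.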

image_paths is the list of image paths, sorted by file name.
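One common way to build such a list is sorted() over glob.glob(). The sketch below uses made-up, zero-padded frame names in a temporary directory; whether the demo uses exactly this call is an assumption.

```python
import glob
import os
import tempfile

# Create a few empty files with hypothetical frame names, out of order.
frame_dir = tempfile.mkdtemp()
for name in ['00002.jpg', '00000.jpg', '00001.jpg']:
    open(os.path.join(frame_dir, name), 'w').close()

# glob() returns files in arbitrary order, so sort explicitly.
image_paths = sorted(glob.glob(os.path.join(frame_dir, '*.jpg')))
names = [os.path.basename(p) for p in image_paths]
print(names)
```

The sort is lexicographic, so it matches frame order only when the numbers in the file names are zero-padded to the same width.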

initial_bbox is the object's bounding box in the first frame, as provided by the demo (the ground truth box). The ground truth box for every frame can be found in demo->data->labels.txt.

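cv2.waitKey(1) controls how quickly frames are refreshed. Its argument is the time in milliseconds to wait for a key press, so 1 means waiting about 1 ms before continuing. With 0, it blocks until a key is pressed before the next frame is shown.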

Windows:

Linux:

print(os.environ['USER']) prints the current user name. (The two passages above are taken from online material.)

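os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id): this machine has only one graphics card, so gpu_id here is '0'. This line specifies which GPU to use (useful on multi-GPU machines; by default TensorFlow claims all visible GPUs).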
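A minimal sketch of the assignment on its own is shown below. The value 0 is the single-GPU case from the text; the comment about ordering reflects the general rule that the variable is read when CUDA is initialized.

```python
import os

gpu_id = 0  # index of the only GPU on a single-GPU machine

# Set this before the framework initializes CUDA, typically before
# TensorFlow is imported or the first session is created.
os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)

# Environment values are always strings, hence the str() above.
visible = os.environ['CUDA_VISIBLE_DEVICES']
print(repr(visible))
```

Assigning an int directly would raise a TypeError, because os.environ only accepts strings.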

tf.Graph().as_default() creates a new computation graph and, when used in a with block, makes it the default graph for that block.

The embedding fully-connected layer has 2048 units, and the LSTM layers have 1024 units each.

### LSTM

Using the prior work of Greff et al. ("LSTM: A Search Space Odyssey"), we opt for a two-layer, factored LSTM (the visual features are fed to both layers) with peephole connections.

fc6_reshape is the output of the convolutional network.

lstm2_inputs = tf.concat([fc6_reshape, lstm1_outputs], 2) concatenates the image features with the first layer's outputs, and the result is fed to the second LSTM layer.

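Each processing unit inside the cell is a feed-forward layer (there are four of them, because there are four separate weight matrices, all with the same dimensions). This is the classic neural network structure, and num_units is the number of hidden neurons in each of these layers.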
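To make the four weight matrices concrete, the sketch below counts the parameters of each layer of the factored two-layer LSTM. The sizes (2048 and 1024) are taken from the paper quote earlier in these notes and may differ from the values in the code; the helper lstm_param_count is hypothetical and assumes diagonal peephole weights, which is how TensorFlow's LSTMCell implements use_peepholes.

```python
def lstm_param_count(input_dim, num_units, use_peepholes=False):
    """Count weights and biases of one LSTM layer (hypothetical helper)."""
    # Four gate blocks (input, forget, output, candidate), each a dense
    # layer acting on the concatenation [x_t, h_{t-1}].
    per_gate = (input_dim + num_units) * num_units + num_units
    total = 4 * per_gate
    if use_peepholes:
        # Diagonal peepholes: one vector each for input, forget, output gates.
        total += 3 * num_units
    return total

FEATURE_DIM = 2048  # embedding FC size, from the paper quote
LSTM_UNITS = 1024   # units per LSTM layer, from the paper quote

# Layer 1 sees only the image features; layer 2 sees the features
# concatenated with layer 1's outputs (the "factored" design).
lstm1_in = FEATURE_DIM
lstm2_in = FEATURE_DIM + LSTM_UNITS

lstm1_params = lstm_param_count(lstm1_in, LSTM_UNITS, use_peepholes=True)
lstm2_params = lstm_param_count(lstm2_in, LSTM_UNITS, use_peepholes=True)
print(lstm2_in, lstm1_params, lstm2_params)
```

The second layer is larger only because its input is wider; num_units is the same for both.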

In LSTMStateTuple(c=128, h=128), c is the cell state, $c^{\langle t \rangle}$, and h is the hidden state, $a^{\langle t \rangle}$, so $a^{\langle t \rangle}$ and $c^{\langle t \rangle}$ have the same dimensions.

For the return value of sess.run(), see tf.Session.run(fetches, feed_dict=None). The key sentence is:

The value returned by run() has the same shape as the fetches argument, where the leaves are replaced by the corresponding values returned by TensorFlow.
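The structure-preserving behavior described in that sentence can be mimicked in plain Python. The sketch below is a mock, not TensorFlow's implementation: run_like, the tensor names, and the values are all made up, and LSTMStateTuple is a stand-in namedtuple with the same field names.

```python
from collections import namedtuple

# Stand-in with the same field names as TensorFlow's LSTMStateTuple.
LSTMStateTuple = namedtuple('LSTMStateTuple', ['c', 'h'])

def run_like(fetches, evaluate):
    """Return a structure shaped like `fetches` with evaluated leaves."""
    if isinstance(fetches, tuple) and hasattr(fetches, '_fields'):
        # namedtuple: rebuild the same type field by field.
        return type(fetches)(*(run_like(f, evaluate) for f in fetches))
    if isinstance(fetches, (list, tuple)):
        return type(fetches)(run_like(f, evaluate) for f in fetches)
    if isinstance(fetches, dict):
        return {k: run_like(v, evaluate) for k, v in fetches.items()}
    # Anything else is a leaf and is replaced by its value.
    return evaluate(fetches)

# Made-up "tensor names" mapped to made-up values.
fake_values = {
    'fc_output': [[0.1, 0.2, 0.8, 0.9]],
    'lstm1_c': 'c1-value',
    'lstm1_h': 'h1-value',
}
fetches = ['fc_output', LSTMStateTuple(c='lstm1_c', h='lstm1_h')]
result = run_like(fetches, fake_values.__getitem__)
print(result)
```

Fetching a list that contains a state tuple therefore yields a list whose second element is again a state tuple, with arrays in place of tensors.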

The second LSTM’s outputs are fed into a fully-connected layer with four output values representing the top left and bottom right corners of the object box in the crop coordinate frame, as is done in [18].

The LSTM output is followed by one more FC layer, whose output has shape (1, 4).

tf.get_variable( ): Gets an existing variable with these parameters or create a new one.

Data should be in order BxTx2xHxWxC where T is the number of unrolls.

After every 32 iterations, we reset the LSTM state. This is necessary because we train on sequences with a maximum length of 32 frames, and without this reset, the LSTM parameters tend to diverge from values the network has seen before. Rather than resetting the LSTM state to all zeros, we use the output from the first forward pass.

# Network Pipeline

In the Re3Tracker class, the ROI information is extracted mainly by the following function:

The crops are each padded to be twice the size of the object’s bounding box to provide the network with context.
The crops are warped to be 227 × 227 pixels before being input into the network.
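The geometry of the two sentences above can be sketched as follows. This is not the repository's cropping code: the helper names are hypothetical, boxes are assumed to be (x1, y1, x2, y2), and image-boundary handling is left out.

```python
CROP_SIZE = 227  # network input size, from the quote above

def padded_crop_box(bbox, pad_scale=2.0):
    """Box centered on bbox, pad_scale times its width and height."""
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = (x2 - x1) * pad_scale, (y2 - y1) * pad_scale
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)

def to_crop_coords(bbox, crop_box, size=CROP_SIZE):
    """Map an image-space box into the warped size x size crop frame."""
    cx1, cy1, cx2, cy2 = crop_box
    crop_w, crop_h = cx2 - cx1, cy2 - cy1
    x1, y1, x2, y2 = bbox
    return ((x1 - cx1) * size / crop_w, (y1 - cy1) * size / crop_h,
            (x2 - cx1) * size / crop_w, (y2 - cy1) * size / crop_h)

bbox = (100.0, 100.0, 200.0, 300.0)      # made-up object box
crop_box = padded_crop_box(bbox)
bbox_in_crop = to_crop_coords(bbox, crop_box)
print(crop_box)
print(bbox_in_crop)
```

Because the warp scales x and y independently, a box centered in a crop twice its size always lands in the middle half of the 227 × 227 frame, whatever its aspect ratio.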

## Network Training

1. Change the batch size and the sequence length
2. Add occlusions to the dataset
3. We initially only use the ground truth crops, and as we double the number of unrolls, we increase the probability of using predicted crops to first 0.25, then subsequently 0.5 and 0.75.
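The curriculum in the list above can be written as a small schedule. The starting values (2 unrolls, batch size 64), the halving of the batch size, and the cap of the probability at 0.75 for later stages are assumptions for illustration; only the doubling of unrolls, the 32-frame maximum, and the 0.25, 0.5, 0.75 steps come from the quotes in these notes.

```python
def curriculum(start_unrolls=2, start_batch=64, max_unrolls=32,
               prob_step=0.25, max_prob=0.75):
    """Return (unrolls, batch_size, predicted_crop_prob) per stage.

    Hypothetical helper; the default values are assumptions.
    """
    stages = []
    unrolls, batch, prob = start_unrolls, start_batch, 0.0
    while unrolls <= max_unrolls:
        stages.append((unrolls, batch, prob))
        unrolls *= 2                       # longer sequences each stage
        batch //= 2                        # keep frames per batch constant
        prob = min(prob + prob_step, max_prob)
    return stages

stages = curriculum()
for unrolls, batch, prob in stages:
    print(unrolls, batch, prob)
```

The first stage uses only ground truth crops (probability 0.0), so the network starts from clean inputs and is exposed to its own predictions gradually.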

## Network Testing

Resetting the state after 32 frames:

After every 32 iterations, we reset the LSTM state. This is necessary because we train on sequences with a maximum length of 32 frames, and without this reset, the LSTM parameters tend to diverge from values the network has seen before. Rather than resetting the LSTM state to all zeros, we use the output from the first forward pass. This maintains an encoding of the tracked object, while allowing us to test on sequences much longer than the number of training unrolls.
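The reset rule quoted above can be sketched without TensorFlow. The class StateResetSchedule and the toy state are hypothetical and this is not the code from the repository: the state is a plain counter so that the reset is easy to see.

```python
class StateResetSchedule:
    """Decide which LSTM state to feed into the next forward pass."""

    def __init__(self, reset_every=32):
        self.reset_every = reset_every
        self.forward_count = 0
        self.initial_state = None

    def update(self, new_state):
        # Remember the state produced by the very first forward pass.
        if self.forward_count == 0:
            self.initial_state = new_state
        self.forward_count += 1
        # Every reset_every passes, fall back to that first state
        # instead of all zeros.
        if self.forward_count % self.reset_every == 0:
            return self.initial_state
        return new_state

def toy_forward(state):
    """Stand-in for the network: the 'state' just counts steps."""
    return state + 1

schedule = StateResetSchedule(reset_every=32)
state = 0
fed_states = []
for _ in range(64):
    state = schedule.update(toy_forward(state))
    fed_states.append(state)
print(fed_states[30], fed_states[31], fed_states[32])
```

After the 32nd pass the state drops back to the one from the first pass, so the LSTM never runs further from a stored state than the number of unrolls it was trained with.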

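This example from the MOT dataset clearly shows how badly the network performs without the state reset every 32 frames: MOT/MOT16/train/MOT16-11/img1 (this is the demo = 3 case in the modified code).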

In the network() function, where the state1 variable is defined (below with tf.variable_scope('lstm1'):), add:

Do the same for state2.

In the constructor of the Re3Tracker class, add:

