Using TensorBoard

TensorFlow Dev Summit demo video: https://youtu.be/eBbEDRsCmv4

Official tutorial: https://tensorflow.google.cn/guide/summaries_and_tensorboard

The tutorial above has three parts (see the left sidebar): Visualizing Learning, Graph Visualization, and Histogram Dashboard.

Graph

To save the graph so it can be viewed in TensorBoard, use tf.summary.FileWriter('./logs/1', sess.graph):

model = build_rnn(len(vocab),
                  batch_size=batch_size,
                  num_steps=num_steps,
                  learning_rate=learning_rate,
                  lstm_size=lstm_size,
                  num_layers=num_layers)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    file_writer = tf.summary.FileWriter('./logs/1', sess.graph)
    # The line above can also be written as the following two lines:
    # file_writer = tf.summary.FileWriter('./logs/1')
    # file_writer.add_graph(sess.graph)

The log path is one argument of the function. Here we want to view the graph, so the other argument is sess.graph (or tf.get_default_graph()).

This will produce an event file in the log directory with a name in the following format:

events.out.tfevents.{timestamp}.{hostname}
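If you want to write the graph without an open session, you can pass the default graph directly instead of sess.graph. A minimal sketch (the constant ops here are purely illustrative):

import tensorflow as tf

# Toy graph, just for illustration
a = tf.constant(1.0, name='a')
b = tf.constant(2.0, name='b')
c = tf.add(a, b, name='c')

# No session needed: write the default graph directly
file_writer = tf.summary.FileWriter('./logs/1', tf.get_default_graph())
file_writer.close()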

If you installed with conda, TensorBoard should have been installed along with TensorFlow. To view the graph we just saved, simply tell TensorBoard on the command line where the data was saved:

tensorboard --logdir logs/1

A message like this appears:

Starting TensorBoard b'41' on port 6006
(You can navigate to http://127.0.1.1:6006)

Copy http://127.0.1.1:6006 into a browser and the TensorBoard interface appears.

Since we just saved the graph, click the GRAPHS tab at the top.

The node names in the graph above are hard to tell apart, even a bit chaotic; name scopes can make the graph easier to understand.

Name Scopes

Usage:

# Build the RNN layers
with tf.name_scope("RNN_layers"):
    lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
    drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
    cell = tf.contrib.rnn.MultiRNNCell([drop] * num_layers)

with tf.name_scope("RNN_init_state"):
    initial_state = cell.zero_state(batch_size, tf.float32)

# Run the data through the RNN layers
with tf.name_scope("RNN_forward"):
    outputs, state = tf.nn.dynamic_rnn(cell, x_one_hot, initial_state=initial_state)
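What a name scope actually does is prefix the names of the ops created inside it; TensorBoard then groups ops that share a prefix into one collapsible node. A minimal sketch (the variable here is illustrative, not from the original model):

import tensorflow as tf

with tf.name_scope("RNN_layers"):
    w = tf.Variable(tf.zeros([3, 3]), name="w")

# Ops created inside the scope carry the prefix,
# which is what TensorBoard uses for grouping
print(w.op.name)  # RNN_layers/w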

Now look at the graph in TensorBoard again; it is much clearer:

Training uses data from all the other nodes, so the train node is connected to every node. To make the graph more intuitive, you can right-click the train node and remove it from the main graph (it moves off to the side; the connection information is kept, just no longer drawn as edges, much like reference labels in a chip schematic):

Double-click a node (or click the plus sign in its top-right corner) to expand it and see more detail:

You can see the structure of a two-layer RNN (the code uses a two-layer LSTM).

Variables (Weights/Bias/Costs/…)

histogram

Used to view how a matrix changes.

# Now connect the RNN outputs to a softmax layer and calculate the cost
with tf.name_scope('logits'):
    softmax_w = tf.Variable(tf.truncated_normal((lstm_size, num_classes), stddev=0.1),
                            name='softmax_w')
    softmax_b = tf.Variable(tf.zeros(num_classes), name='softmax_b')
    logits = tf.matmul(output, softmax_w) + softmax_b
    tf.summary.histogram('softmax_w', softmax_w)
    tf.summary.histogram('softmax_b', softmax_b)

scalar

Used to view how a scalar changes.

with tf.name_scope('cost'):
    loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_reshaped, name='loss')
    cost = tf.reduce_mean(loss, name='cost')
    tf.summary.scalar('cost', cost)

merge

Puts the histogram and scalar summaries into a single op.

merged = tf.summary.merge_all()

save summaries to different directories

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_writer = tf.summary.FileWriter('./logs/2/train', sess.graph)
    test_writer = tf.summary.FileWriter('./logs/2/test')

    # Use the line below to load a checkpoint and resume training
    #saver.restore(sess, 'checkpoints/anna20.ckpt')

    n_batches = int(train_x.shape[1]/num_steps)
    iterations = n_batches * epochs
    for e in range(epochs):

        # Train network
        new_state = sess.run(model.initial_state)
        loss = 0
        for b, (x, y) in enumerate(get_batch([train_x, train_y], num_steps), 1):
            iteration = e*n_batches + b
            start = time.time()
            feed = {model.inputs: x,
                    model.targets: y,
                    model.keep_prob: 0.5,
                    model.initial_state: new_state}
            summary, batch_loss, new_state, _ = sess.run([model.merged, model.cost,
                                                          model.final_state,
                                                          model.optimizer],
                                                         feed_dict=feed)
            loss += batch_loss
            end = time.time()
            print('Epoch {}/{} '.format(e+1, epochs),
                  'Iteration {}/{}'.format(iteration, iterations),
                  'Training loss: {:.4f}'.format(loss/b),
                  '{:.4f} sec/batch'.format((end-start)))

            train_writer.add_summary(summary, iteration)

            if (iteration % save_every_n == 0) or (iteration == iterations):
                # Check performance, notice dropout has been set to 1
                val_loss = []
                new_state = sess.run(model.initial_state)
                for x, y in get_batch([val_x, val_y], num_steps):
                    feed = {model.inputs: x,
                            model.targets: y,
                            model.keep_prob: 1.,
                            model.initial_state: new_state}
                    summary, batch_loss, new_state = sess.run([model.merged, model.cost,
                                                               model.final_state],
                                                              feed_dict=feed)
                    val_loss.append(batch_loss)

                test_writer.add_summary(summary, iteration)

                print('Validation loss:', np.mean(val_loss),
                      'Saving checkpoint!')
                #saver.save(sess, "checkpoints/anna/i{}_l{}_{:.3f}.ckpt".format(iteration, lstm_size, np.mean(val_loss)))

Note the three key TensorBoard operations in it:

  1. Create two FileWriter objects, one for the training data and one for the test data (see the note after this list):

    ...
    train_writer = tf.summary.FileWriter('./logs/2/train', sess.graph)
    test_writer = tf.summary.FileWriter('./logs/2/test')
    ...
  2. Save the summary for every training iteration:

    ...
    summary, batch_loss, new_state, _ = sess.run([model.merged,
                                                  model.cost, model.final_state,
                                                  model.optimizer], feed_dict=feed)
    ...
    train_writer.add_summary(summary, iteration)
    ...
  3. Save the summary for every validation iteration:

    ...
    summary, batch_loss, new_state = sess.run([model.merged, model.cost,
                                               model.final_state],
                                              feed_dict=feed)
    ...
    test_writer.add_summary(summary, iteration)
    ...
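Because both writers log under logs/2, you can point TensorBoard at the parent directory; each subdirectory shows up as a separate run, so the training and test curves are overlaid in the same charts:

tensorboard --logdir logs/2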

summary

Steps to view variables with TensorBoard:

  1. Before running, record matrices with tf.summary.histogram(...) and scalars with tf.summary.scalar(...).

  2. merged = tf.summary.merge_all() collects everything you want to view into merged; it takes no arguments.

  3. file_writer = tf.summary.FileWriter(path) creates the log object and specifies where the log files are saved.

  4. After running (once the variables have been updated), write to the log (usually inside a loop):

    for i in range(2001):
        batch = mnist.train.next_batch(100)
        if i % 5 == 0:
            s = sess.run(merged, feed_dict={x: batch[0],
                                            y: batch[1]})
            file_writer.add_summary(s, i)
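Putting the four steps together, here is a minimal self-contained sketch (a toy regression; all the variable names and the log path are illustrative, not from the original code):

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 1], name='x')
y = tf.placeholder(tf.float32, [None, 1], name='y')
w = tf.Variable(tf.truncated_normal([1, 1], stddev=0.1), name='w')
cost = tf.reduce_mean(tf.square(tf.matmul(x, w) - y), name='cost')
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(cost)

tf.summary.histogram('w', w)     # step 1: record a matrix
tf.summary.scalar('cost', cost)  # step 1: record a scalar
merged = tf.summary.merge_all()  # step 2

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    file_writer = tf.summary.FileWriter('./logs/demo', sess.graph)  # step 3
    for i in range(100):
        xs = np.random.rand(32, 1)
        s, _ = sess.run([merged, train_op], feed_dict={x: xs, y: 2 * xs})
        file_writer.add_summary(s, i)  # step 4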

tips:

merge() can also be written like this, replacing merge_all():

loss_summary_op = tf.summary.merge([
    tf.summary.scalar('loss', tfLoss),
    tf.summary.scalar('l2_regularizer', tfLossFull - tfLoss),
])
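Selective merging like this is useful when you want to fetch different groups of summaries at different frequencies. A minimal sketch under that assumption (the variables and log path are illustrative):

import tensorflow as tf

w = tf.Variable(tf.truncated_normal([10, 10], stddev=0.1), name='w')
loss = tf.reduce_mean(tf.square(w))

# Two independent groups instead of one merge_all()
scalar_summaries = tf.summary.merge([tf.summary.scalar('loss', loss)])
hist_summaries = tf.summary.merge([tf.summary.histogram('w', w)])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter('./logs/selective')
    for step in range(100):
        # Cheap scalars every step, heavier histograms every 10 steps
        writer.add_summary(sess.run(scalar_summaries), step)
        if step % 10 == 0:
            writer.add_summary(sess.run(hist_summaries), step)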

specify a port

tensorboard --logdir path --port=6007

----------over----------


Title: Using TensorBoard

Author: Ge垚

Published: August 3, 2018, 13:08

Last updated: November 7, 2018, 23:11

Original link: http://geyao1995.com/使用TensorBoard/

License: Attribution-NonCommercial-NoDerivatives 4.0 International. Please keep the original link and author when reposting.