Differences in LSTM weight layout between TensorFlow and PyTorch
July 6, 2018, 8 p.m.
TensorFlow:
# i = input_gate, j = new_input, f = forget_gate, o = output_gate
i, j, f, o = array_ops.split(
    value=lstm_matrix, num_or_size_splits=4, axis=1)
Reference:
https://github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/python/ops/rnn_cell_impl.py#L836
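For reference, the same split can be reproduced outside the cell. This is a self-contained sketch: the dummy lstm_matrix below only stands in for the real pre-activations, which in rnn_cell_impl.py are computed as concat([x, h_prev]) @ kernel + bias.

import tensorflow as tf

hidden_size = 16
lstm_matrix = tf.zeros([2, 4 * hidden_size])   # dummy (batch, 4*hidden_size) pre-activations
# Same split as in rnn_cell_impl.py: the gate blocks come out in i, j(=g), f, o order
i, j, f, o = tf.split(value=lstm_matrix, num_or_size_splits=4, axis=1)
print(i.shape, j.shape, f.shape, o.shape)      # each (2, 16)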
PyTorch:
Attributes:
    weight_ih_l[k] : the learnable input-hidden weights of the :math:`\text{k}^{th}` layer
        `(W_ii|W_if|W_ig|W_io)`, of shape `(4*hidden_size x input_size)`
    weight_hh_l[k] : the learnable hidden-hidden weights of the :math:`\text{k}^{th}` layer
        `(W_hi|W_hf|W_hg|W_ho)`, of shape `(4*hidden_size x hidden_size)`
    bias_ih_l[k] : the learnable input-hidden bias of the :math:`\text{k}^{th}` layer
        `(b_ii|b_if|b_ig|b_io)`, of shape `(4*hidden_size)`
    bias_hh_l[k] : the learnable hidden-hidden bias of the :math:`\text{k}^{th}` layer
        `(b_hi|b_hf|b_hg|b_ho)`, of shape `(4*hidden_size)`
Reference:
https://pytorch.org/docs/stable/_modules/torch/nn/modules/rnn.html#LSTMCell
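As a quick check of that layout, here is a minimal PyTorch sketch (the sizes 8 and 16 are arbitrary illustration values):

import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16)
# weight_ih_l0 stacks the four gate matrices along dim 0 as (W_ii|W_if|W_ig|W_io)
print(lstm.weight_ih_l0.shape)                              # torch.Size([64, 8])
# Slice out the per-gate blocks, hidden_size rows each, in i, f, g, o order
w_ii, w_if, w_ig, w_io = lstm.weight_ih_l0.chunk(4, dim=0)
print(w_ii.shape)                                           # torch.Size([16, 8])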
Comparing the two excerpts, the forget gate and the new-memory gate (called j in TensorFlow, g in PyTorch) occupy swapped positions: TensorFlow splits lstm_matrix in the order i, j, f, o, while PyTorch stacks its parameters in the order i, f, g, o. Weights ported between the two frameworks therefore need their gate blocks reordered.
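Below is a minimal numpy sketch of such a conversion. The helper name tf_lstm_to_pytorch and the choice to ignore TensorFlow's runtime forget_bias are my assumptions, not part of either API; it only illustrates the split, transpose, and gate reordering implied by the two layouts above.

import numpy as np

def tf_lstm_to_pytorch(kernel, bias, input_size, hidden_size):
    # Hypothetical helper, not from either library.
    # Assumes: kernel has shape (input_size + hidden_size, 4*hidden_size) with gate
    # order i, j(=g), f, o as in rnn_cell_impl.py; bias has shape (4*hidden_size,)
    # in the same order; TF's runtime forget_bias is ignored here.
    assert kernel.shape == (input_size + hidden_size, 4 * hidden_size)

    # Split the fused kernel into input-to-hidden and hidden-to-hidden parts,
    # and transpose to PyTorch's (4*hidden_size, input_size) convention.
    w_ih = kernel[:input_size, :].T
    w_hh = kernel[input_size:, :].T

    def reorder(mat):
        # TF blocks i, j, f, o  ->  PyTorch blocks i, f, g(=j), o
        i, j, f, o = np.split(mat, 4, axis=0)
        return np.concatenate([i, f, j, o], axis=0)

    weight_ih = reorder(w_ih)
    weight_hh = reorder(w_hh)
    bias_ih = reorder(bias)            # PyTorch keeps two bias vectors; folding the
    bias_hh = np.zeros_like(bias_ih)   # single TF bias into bias_ih and zeroing bias_hh is equivalent
    return weight_ih, weight_hh, bias_ih, bias_hh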