LSTM equations with minibatches
I'm looking at the code behind a Keras LSTM, and I noticed something I find odd.
Suppose we're feeding in input of size (batch_size, time_steps, input_dim), where:

- batch_size is the number of examples in the minibatch;
- time_steps is the number of time steps to look back (i.e. the window size);
- input_dim is the number of input variables.

Suppose we want to input minibatches with batch_size = 5, time_steps = 2, and input_dim = 1. That is, we have a univariate time series with five examples per minibatch, and we use the previous two values of the time series to predict the next value.

Say we want to accomplish this using an LSTM of size 3 (that is, 3 is both the number of hidden units and the number of output units). Call this size units.
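For concreteness, such a minibatch can be built from a univariate series by sliding a length-2 window over it. This is a minimal NumPy sketch; the series values and variable names are made up for illustration:

```python
import numpy as np

series = np.arange(10.0)  # toy univariate series: 0.0, 1.0, ..., 9.0

batch_size, time_steps, input_dim = 5, 2, 1

# Each example uses the previous two values of the series as its window
windows = np.stack([series[t:t + time_steps] for t in range(batch_size)])
minibatch = windows.reshape(batch_size, time_steps, input_dim)
print(minibatch.shape)  # (5, 2, 1)
```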
In lines 1871-1995 of the code here, the LSTM is being built. The equation I'm confused about is on lines 1989-1990, which correspond to the calculation of the input gate:
i = self.recurrent_activation(x_i + K.dot(h_tm1_i, self.recurrent_kernel_i))
The variable x_i is calculated as:

x_i = K.dot(inputs_i, self.kernel_i)

where:

- inputs_i is the input at time i, with shape (batch_size, time_steps, input_dim); in our case, (5, 2, 1);
- self.kernel_i is the weight matrix multiplied by the current input at time t, with shape (input_dim, units); in our case, (1, 3).

I understand this dot product is accomplished via broadcasting, and the final shape is (batch_size, time_steps, units), which in our case is (5, 2, 3).
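That shape claim can be checked outside of Keras with plain NumPy, whose np.dot behaves like K.dot for an ND x 2D product (a sketch; variable names mirror the question, not the Keras source):

```python
import numpy as np

inputs_i = np.ones((5, 2, 1))   # (batch_size, time_steps, input_dim)
kernel_i = np.ones((1, 3))      # (input_dim, units)

# np.dot contracts the last axis of the 3D tensor with the first axis
# of the 2D matrix, yielding (batch_size, time_steps, units)
x_i = np.dot(inputs_i, kernel_i)
print(x_i.shape)  # (5, 2, 3)
```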
Now let's examine the dot product:

K.dot(h_tm1_i, self.recurrent_kernel_i)

where:

- h_tm1_i is the recurrent hidden state at time i, with shape (batch_size, units) according to line 295; in our case, (5, 3);
- self.recurrent_kernel_i is the weight matrix multiplied by the previous hidden state, with shape (units, units); in our case, (3, 3).
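This product involves no broadcasting at all; it is an ordinary 2D matrix multiplication, which a quick NumPy sketch confirms:

```python
import numpy as np

h_tm1_i = np.ones((5, 3))             # (batch_size, units)
recurrent_kernel_i = np.ones((3, 3))  # (units, units)

# Plain matrix multiplication: (5, 3) x (3, 3) -> (5, 3)
out = h_tm1_i @ recurrent_kernel_i
print(out.shape)  # (5, 3)
```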
This is a plain matrix multiplication, so the product has shape (5, 3).
We're left with needing to add x_i and K.dot(h_tm1_i, self.recurrent_kernel_i), which have shapes (5, 2, 3) and (5, 3) respectively. When I try to do that myself in TensorFlow, I get an error:
ValueError: Dimensions must be equal, but are 2 and 5 for 'add_1' (op: 'Add') with input shapes: [5,2,3], [5,3].
Clearly I've done something wrong somewhere, but I can't see my logic error. Can anyone help?
EDIT: To reproduce the error:
>>> import tensorflow as tf
>>> import keras
>>> from keras import backend as K
>>> inputs_i = tf.ones([5, 2, 1])
>>> kernel_i = tf.ones([1,3])
>>> h_tm1_i = tf.ones([5,3])
>>> rec_i = tf.ones([3,3])
>>> x_i = K.dot(inputs_i, kernel_i)
>>> x_i
<tf.Tensor 'Reshape_9:0' shape=(5, 2, 3) dtype=float32>
>>> K.dot(h_tm1_i, rec_i)
<tf.Tensor 'MatMul_4:0' shape=(5, 3) dtype=float32>
>>> x_i + K.dot(h_tm1_i, rec_i) # Raises ValueError
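The same mismatch can be demonstrated with NumPy alone, which follows the same broadcasting rule as TensorFlow's Add. None of this is from the Keras source; it just restates the rule that shapes are aligned from the rightmost axis:

```python
import numpy as np

a = np.ones((5, 2, 3))  # shape of x_i
b = np.ones((5, 3))     # shape of the recurrent term

# Shapes are aligned from the right: (5, 2, 3) vs (5, 3).
# The trailing 3s match, but 2 != 5, so the add is rejected.
try:
    a + b
except ValueError as e:
    print("cannot broadcast:", e)

# A (5, 1, 3) tensor, by contrast, broadcasts against (5, 2, 3) cleanly
c = np.ones((5, 1, 3))
print((a + c).shape)  # (5, 2, 3)
```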
python neural-network deep-learning tensorflow lstm
asked 1 hour ago by StatsSorceress (edited 1 hour ago)