LSTM equations with minibatches


I'm looking at the code behind a Keras LSTM, and I noticed something I find odd.



Suppose we're feeding in input of size (batch_size, time_steps, input_dim)
where:




  • batch_size is the number of examples in the minibatch,


  • time_steps is the number of time steps to look back (i.e., the window size),


  • input_dim is the number of input variables.

Suppose we want to input minibatches of size batch_size=5, time_steps=2, and input_dim=1.



That is, we have a univariate time series with five examples per minibatch and we use the previous two values of the time series to predict the next value.
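As a concrete sketch (NumPy only; the series values here are arbitrary, just for illustration), such a minibatch can be built from a univariate series with a sliding window:

```python
import numpy as np

# A toy univariate series; the actual values don't matter for the shapes.
series = np.arange(10, dtype=np.float32)  # shape (10,)

batch_size, time_steps, input_dim = 5, 2, 1

# Each example is the previous `time_steps` values; each target is the next value.
X = np.stack([series[t:t + time_steps] for t in range(batch_size)])
y = series[time_steps:time_steps + batch_size]

# Add the trailing feature axis to get (batch_size, time_steps, input_dim).
X = X.reshape(batch_size, time_steps, input_dim)
print(X.shape)  # (5, 2, 1)
print(y.shape)  # (5,)
```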



Say we want to accomplish this using an LSTM of size 3 (that is, 3 is both the number of hidden units and the dimensionality of the output). Call this size units.



The LSTM is built in lines 1871-1995 of the Keras recurrent-layer source. The equation I'm confused about is on lines 1989-1990, which corresponds to the calculation of the input gate:



i = self.recurrent_activation(x_i + K.dot(h_tm1_i, self.recurrent_kernel_i))


The variable x_i is calculated as:



x_i = K.dot(inputs_i, self.kernel_i)


where




  • inputs_i = the input passed to the input-gate computation, which has shape (batch_size, time_steps, input_dim); in our case, (5, 2, 1),


  • self.kernel_i = the input-gate weight matrix multiplied by the current input, which has shape (input_dim, units); in our case, (1, 3).

I understand this dot product is accomplished via broadcasting, and the final shape is (batch_size, time_steps, units), which in our case is (5, 2, 3).
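This shape can be checked with plain NumPy; the einsum below is a stand-in for what K.dot computes when given a 3-D left operand and a 2-D right operand (contracting the last axis of the first against the first axis of the second):

```python
import numpy as np

batch_size, time_steps, input_dim, units = 5, 2, 1, 3

inputs_i = np.ones((batch_size, time_steps, input_dim))
kernel_i = np.ones((input_dim, units))

# Contract input_dim (d) of `inputs_i` against the first axis of `kernel_i`,
# keeping the batch (b) and time (t) axes: result is (batch, time, units).
x_i = np.einsum('btd,du->btu', inputs_i, kernel_i)
print(x_i.shape)  # (5, 2, 3)
```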



Now let's examine the dot product:



K.dot(h_tm1_i, self.recurrent_kernel_i)


where



  • h_tm1_i = the previous hidden state (h at time t-1) used for the input gate, which has shape (batch_size, units) according to line 295; in our case, (5, 3),


  • self.recurrent_kernel_i = the recurrent weight matrix multiplied by the previous hidden state, which has shape (units, units); in our case, (3, 3).


This is an ordinary matrix product of shapes (5, 3) and (3, 3), so the result has shape (5, 3).
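A quick NumPy check confirms the (5, 3) result shape:

```python
import numpy as np

h_tm1_i = np.ones((5, 3))             # (batch_size, units)
recurrent_kernel_i = np.ones((3, 3))  # (units, units)

# An ordinary 2-D matrix multiplication: (5, 3) @ (3, 3) -> (5, 3).
out = h_tm1_i @ recurrent_kernel_i
print(out.shape)  # (5, 3)
```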



We're left with needing to add x_i and K.dot(h_tm1_i, self.recurrent_kernel_i), which have shapes (5, 2, 3) and (5, 3) respectively. When I try to do that myself in TensorFlow, I get an error:



ValueError: Dimensions must be equal, but are 2 and 5 for 'add_1' (op: 'Add') with input shapes: [5,2,3], [5,3].


Clearly I've done something wrong somewhere, but I can't see my logic error. Can anyone help?



EDIT: To reproduce the error:



>>> import tensorflow as tf
>>> import keras
>>> from keras import backend as K
>>> inputs_i = tf.ones([5, 2, 1])
>>> kernel_i = tf.ones([1,3])
>>> h_tm1_i = tf.ones([5,3])
>>> rec_i = tf.ones([3,3])
>>> x_i = K.dot(inputs_i, kernel_i)
>>> x_i
<tf.Tensor 'Reshape_9:0' shape=(5, 2, 3) dtype=float32>
>>> K.dot(h_tm1_i, rec_i)
<tf.Tensor 'MatMul_4:0' shape=(5, 3) dtype=float32>
>>> x_i + K.dot(h_tm1_i, rec_i)  # Raises ValueError









Tags: python, neural-network, deep-learning, tensorflow, lstm





