Too much inputs = overfitting?




























First question: can I mix different types of inputs, for example height and age (my inputs are normalized, of course)? In general, can we mix different types of inputs in a neural network?

Second question: can too many different inputs cause overfitting?

I am using 120 input neurons and 20,000 training examples, and I am overfitting at 53% accuracy (bad)...



Thank you.







neural-network classification feature-engineering overfitting feature-construction














edited Jun 25 '18 at 3:53 by ebrahimi

asked Jun 24 '18 at 23:09 by Fang 1Gao











  • $begingroup$
    What do you mean by "overfitting at 53%"? Do you mean your test accuracy is just 53% and your training accuracy is good?
    $endgroup$
    – ab123
    Jun 25 '18 at 7:11










  • $begingroup$
    @ab123 I mean that when training accuracy is at 53%, my validation accuracy is also about 53%; but when training accuracy goes above 53%, my validation accuracy drops to 40%, and so on.
    $endgroup$
    – Fang 1Gao
    Jun 25 '18 at 15:37


























2 Answers











































Yes, you can mix different kinds of inputs as long as the features are on similar scales, which you achieve by normalising each feature.
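As a minimal sketch of what normalising means here, the snippet below rescales a height column and an age column to zero mean and unit variance (z-scores). The sample values and the helper name `standardize` are made up for illustration; they do not come from the question.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale one feature column to zero mean and unit variance (z-score)."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical raw features on very different scales.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
ages_years = [20.0, 30.0, 40.0, 50.0, 60.0]

heights_z = standardize(heights_cm)
ages_z = standardize(ages_years)

# Each row is one example with both features on a comparable scale.
inputs = [list(row) for row in zip(heights_z, ages_z)]
print(inputs)
```

After this step both columns have the same spread, so neither feature dominates simply because of its units.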



I assume you mean too many features when you say 'too much input'.

If you mean the size of the input data (the number of training examples), that is not directly related to overfitting. Overfitting depends on model complexity. It happens when the model fits the noise in the training data and becomes so specific that it can't generalize well to new, unseen data.



Any model that is "sufficiently" complex (e.g. one with many hidden layers, a large number of neurons in each layer, and weights that are not regularized) can easily converge to a very low loss on the training data (unless it gets stuck in a sub-optimal local minimum), but will give poor accuracy on test data. More data does not cause this. On the contrary, a lack of data often leads to overfitting, because the model tries to learn from very few examples that are not diverse enough. It's like showing a child a sample of balls containing only white and orange table tennis balls, and then asking him or her to identify a blue ball.
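The gap between training and test error can be sketched without a neural network. In the toy example below, a polynomial that passes through every training point stands in for an over-complex, unregularized model, and a straight-line fit stands in for a simple one. The data, the noise level and all function names are assumptions chosen for illustration.

```python
import random


def lagrange_predict(xs, ys, x):
    """Evaluate the polynomial that passes exactly through all (xs, ys) points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


rng = random.Random(0)

# Only 8 noisy training points drawn from the (assumed) true rule y = 2x + 1.
train_x = [i / 7 for i in range(8)]
train_y = [2 * x + 1 + rng.gauss(0.0, 0.2) for x in train_x]

# Noise-free test points from the same rule.
test_x = [k / 100 for k in range(101)]
test_y = [2 * x + 1 for x in test_x]

# "Complex" model: degree-7 polynomial that memorises every training point.
interp_train_mse = mse([lagrange_predict(train_x, train_y, x) for x in train_x], train_y)
interp_test_mse = mse([lagrange_predict(train_x, train_y, x) for x in test_x], test_y)

# "Simple" model: a straight line.
slope, intercept = fit_line(train_x, train_y)
line_train_mse = mse([slope * x + intercept for x in train_x], train_y)
line_test_mse = mse([slope * x + intercept for x in test_x], test_y)

print("interpolating polynomial: train", interp_train_mse, "test", interp_test_mse)
print("straight line:            train", line_train_mse, "test", line_test_mse)
```

The interpolating polynomial reproduces the training points exactly, noise included, so its training error is zero while its test error is not. That gap is the signature of overfitting described above.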



Too many features can lead to overfitting because they can increase model complexity. There is a greater chance of redundant features and of features that are not related to the prediction at all.



For example, if you're predicting the quality of a tennis ball, the colour of the ball is an irrelevant feature, but the network will still learn from it. There is a chance that, because people like to play with yellow balls, those balls get used more often and don't last as long, so the network picks up a spurious link between colour and quality.
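The effect of irrelevant features can be sketched with a small synthetic experiment. This is only an illustration under stated assumptions: it uses a 1-nearest-neighbour classifier instead of a neural network because it fits in a few lines, and the data generator `make_data`, the number of noise features and the sample sizes are all made up.

```python
import random


def nearest_neighbor_label(train_x, train_y, point):
    """Return the label of the training example closest to `point`."""
    best_label, best_dist = None, float("inf")
    for features, label in zip(train_x, train_y):
        dist = sum((a - b) ** 2 for a, b in zip(features, point))
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label


def accuracy(train_x, train_y, eval_x, eval_y):
    hits = sum(
        nearest_neighbor_label(train_x, train_y, p) == y
        for p, y in zip(eval_x, eval_y)
    )
    return hits / len(eval_y)


def make_data(rng, n_examples, n_noise):
    """One informative feature followed by `n_noise` purely random ones."""
    xs, ys = [], []
    for _ in range(n_examples):
        label = rng.randint(0, 1)
        informative = (2 * label - 1) + rng.gauss(0.0, 0.3)
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        xs.append([informative] + noise)
        ys.append(label)
    return xs, ys


def first_column(rows):
    """Keep only the informative feature."""
    return [row[:1] for row in rows]


rng = random.Random(0)
train_x, train_y = make_data(rng, 100, 30)
test_x, test_y = make_data(rng, 200, 30)

# Same examples, with and without the 30 irrelevant features.
acc_clean = accuracy(first_column(train_x), train_y, first_column(test_x), test_y)
acc_noisy = accuracy(train_x, train_y, test_x, test_y)
train_acc_noisy = accuracy(train_x, train_y, train_x, train_y)

print("test accuracy, informative feature only:", acc_clean)
print("test accuracy, with 30 noise features:  ", acc_noisy)
print("train accuracy, with 30 noise features: ", train_acc_noisy)
```

With the noise features included, the model still scores perfectly on the data it was trained on, but its test accuracy falls well below what the single informative feature achieves alone.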















edited Jun 25 '18 at 7:52

answered Jun 25 '18 at 7:45 by ab123
























Based on my experience so far, having too many features as inputs to your NN tends to degrade performance. Full disclaimer: I'm no expert, but smarter people than me have coined a term for this, the curse of dimensionality. Here is a quote I took from the Medium post "Curse of dimensionality and feature reduction":




                  The curse of dimensionality occurs because the sample density decreases exponentially with the increase of the dimensionality
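To put a rough number on the quote above, the sketch below divides each feature into 10 bins and computes how many samples fall into each grid cell on average. The 20,000 samples and 120 features are the figures from the question. The choice of 10 bins and the helper name `samples_per_cell` are assumptions for illustration.

```python
def samples_per_cell(n_samples, n_features, bins_per_feature):
    """Average number of samples per grid cell when every feature is binned."""
    n_cells = bins_per_feature ** n_features
    return n_samples / n_cells


# Density collapses as the number of features grows.
for n_features in (1, 2, 3, 10, 120):
    density = samples_per_cell(20_000, n_features, 10)
    print(n_features, "features:", density, "samples per cell")
```

With one feature there are 2,000 samples per cell, and with three features only 20. At 120 features almost every cell is empty, so the model sees only a vanishing fraction of the input space.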




Good, now we know that having too many features (or rather, a feature-to-sample ratio that grows too large) is bad for model performance. What can we do to solve it?

Right now I can think of 3 ways:



                  1. Feature Selection


                  2. Feature Extraction


3. Ensemble learning on different subsets of those features (yummmyyy :) )
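As one possible illustration of the first option, feature selection, the sketch below ranks features by the absolute Pearson correlation between each column and the label, and keeps the top `k`. This is just one simple filter method among many. The toy data and the names `pearson` and `select_features` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_features(rows, labels, k):
    """Return the indices of the k features most correlated with the label."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])


# Toy data: feature 0 tracks the label, feature 1 is unrelated to it,
# feature 2 is the negated label (perfectly, but negatively, correlated).
labels = [0, 1, 0, 1, 0, 1, 0, 1]
unrelated = [1, 1, 2, 2, 1, 1, 2, 2]
rows = [[y + 0.01 * i, unrelated[i], -y] for i, y in enumerate(labels)]

kept = select_features(rows, labels, 2)
reduced_rows = [[row[j] for j in kept] for row in rows]
print("kept feature indices:", kept)
```

A correlation filter only catches linear, one-feature-at-a-time relationships, so treat it as a first pass rather than a complete answer.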















edited 5 hours ago by Stephen Rauch

answered 6 hours ago by jetychill






























