

Merging sparse and dense data in machine learning to improve the performance



I have sparse features which are predictive, and I also have some dense features which are predictive as well. I need to combine these features to improve the overall performance of the classifier.

Now, the problem is that when I try to combine them, the dense features tend to dominate the sparse features, giving only a 1% improvement in AUC compared to the model with dense features alone.

Has anybody come across similar problems? I would really appreciate any input; I'm kind of stuck. I have already tried a lot of different classifiers, combinations of classifiers, feature transformations and processing with different algorithms.

Thanks in advance for the help.

Edit:

I have already tried the suggestions given in the comments. What I have observed is that for almost 45% of the data the sparse features perform really well (an AUC of around 0.9 with sparse features alone), but for the remaining data the dense features perform well, with an AUC of around 0.75. I tried separating out these subsets, but then I get an AUC of 0.6, so I can't simply train a model and decide which features to use.

Regarding the code snippet, I have tried out so many things that I am not sure what exactly to share :(
machine-learning classification predictive-modeling scikit-learn supervised-learning

asked Apr 6 '16 at 5:14 by Sagar Waghmode, edited Apr 18 '16 at 4:42
  • How sparse are your features? Are they 1% filled or even less?
    – João Almeida, Apr 6 '16 at 12:35

  • Also, note that if your features are sparse, they should only help classify a small part of your dataset, which means the overall accuracy shouldn't change significantly. This is somewhat of a guess, as I don't know the characteristics of your dataset.
    – João Almeida, Apr 6 '16 at 12:40

  • @JoãoAlmeida They are not that sparse; they are around 5% filled. The problem is that when I look at the differences between the predictions of the two models, where the predictions differ the model with sparse features tends to perform better. That's why I expected to see a boost in AUC when I combined them with the dense features. I am getting a boost, but it seems very low.
    – Sagar Waghmode, Apr 7 '16 at 10:46

  • Hum... I don't have any idea for you then.
    – João Almeida, Apr 7 '16 at 10:51

6 Answers


















This seems like a job for Principal Component Analysis. PCA is implemented well in scikit-learn and it has helped me many times.



PCA, in a certain way, combines your features. By limiting the number of components, you feed your model with less noisy data (in the best case), because your model is only as good as your data.



Consider the simple example below.



from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

# Reduce to 80 principal components, then classify with a random forest.
pipe_rf = Pipeline([('pca', PCA(n_components=80)),
                    ('clf', RandomForestClassifier(n_estimators=100))])
pipe_rf.fit(X_train_s, y_train_s)

pred = pipe_rf.predict(X_test)


Why did I pick 80? When I plot the cumulative explained variance, I get the plot below, which tells me that with ~80 components I reach almost all of the variance.
[plot: cumulative explained variance vs. number of components]
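For reference, a minimal way to reproduce such a cumulative-variance plot (illustrative only; X_train_s is assumed to be the same dense training matrix used above):

from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import numpy as np

# Fit PCA with all components and plot the running total of explained variance.
pca_full = PCA().fit(X_train_s)
plt.plot(np.cumsum(pca_full.explained_variance_ratio_))
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance')
plt.show()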



So I would say give it a try and use it in your models. It should help.






answered Apr 13 '16 at 12:54 by HonzaB, edited Dec 29 '17 at 12:56 (score 6)

The best way to combine features is through ensemble methods. Basically, there are three different families of methods: bagging, boosting and stacking. You can either use AdaBoost augmented with feature selection (in this case consider both sparse and dense features) or a stacking-based approach (random feature / random subspace). I prefer the second option: train a set of base learners (decision trees) on random subsets and random features (keep training base learners until you cover the whole set of features). The next step is to run the training set through these base learners to generate the meta data, and then use this meta data to train a meta-classifier. The meta-classifier will figure out which feature is more important and what kind of relationship should be utilized.
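A minimal sketch of this kind of stacking, assuming scikit-learn's StackingClassifier; the column indices for the two feature groups and all hyperparameters are purely illustrative, and only two base learners are shown instead of many random subspaces:

from sklearn.compose import ColumnTransformer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical column indices: the first block holds the sparse features,
# the second block holds the dense features.
sparse_cols = list(range(0, 2300))
dense_cols = list(range(2300, 2340))

def base_learner(cols):
    # A decision tree that only sees one group of columns.
    return make_pipeline(
        ColumnTransformer([('keep', 'passthrough', cols)]),
        DecisionTreeClassifier(max_depth=8))

stack = StackingClassifier(
    estimators=[('sparse_tree', base_learner(sparse_cols)),
                ('dense_tree', base_learner(dense_cols))],
    final_estimator=LogisticRegression(),  # the meta-classifier
    cv=5)                                  # out-of-fold meta features
# stack.fit(X_train, y_train); stack.predict_proba(X_test)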






answered Apr 12 '16 at 4:44 by Bashar Haddad (score 3, +25 bounty)

  • Can you please share the relevant documentation? I didn't exactly get what you meant.
    – Sagar Waghmode, Apr 13 '16 at 6:04

  • You can read an article about stacking, "Issues in stacking techniques, 1999", and read about StackingC. It is very important to know that I am talking about the whole vector (e.g. 1x36 in the case of HOG) as one feature, not the dimensions within it. You need to track which feature is used with which base learner. Be careful about the overfitting problem.
    – Bashar Haddad, Apr 13 '16 at 16:15

  • If you give more details about the database, number of classes, number of samples, code, what things you have tried, what things you noticed, whether you have a data imbalance problem, noisy samples, etc. All these details are important and can help in selecting the best method. Give me more details if this is ok and I may be able to help in a better way.
    – Bashar Haddad, Apr 13 '16 at 16:19



















The variable groups may be multicollinear, or the conversion between sparse and dense might go wrong. Have you thought about using a voting classifier / ensemble classification? http://scikit-learn.org/stable/modules/ensemble.html That way you could deal with both of the above problems.
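A rough sketch of such a soft-voting ensemble with scikit-learn; the model choices, parameters and variable names (X_train, y_train, X_test) are only placeholders:

from sklearn.ensemble import VotingClassifier, RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

# Soft voting averages the predicted class probabilities of the three models.
vote = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('rf', RandomForestClassifier(n_estimators=200)),
                ('gbm', GradientBoostingClassifier())],
    voting='soft')
# vote.fit(X_train, y_train); vote.predict_proba(X_test)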






answered Apr 12 '16 at 4:30 by Diego (score 1)

  • I have already tried out the ensemble techniques as well as voting classifiers. Still no luck.
    – Sagar Waghmode, Apr 12 '16 at 8:15

  • So do you see a lot of overlap between the predictions from the two datasets? Maybe there is indeed no new information, i.e. the data tells the same story.
    – Diego, Apr 12 '16 at 9:20

  • Yes, I have done exactly that. Though the predictions are not entirely different, the number of samples where the predictions differ is quite high (around 15-20% of the data). For these samples the model with sparse features performs better than the model with dense features. My point is: if the sparse features perform better, why don't they come up as important features in any of the models I have tried so far?
    – Sagar Waghmode, Apr 12 '16 at 9:31

  • What predictor algorithm do you use?
    – Diego, Apr 12 '16 at 12:21

  • I have tried out quite a few algorithms and settled on a gradient boosted model; I also use random forests quite a lot for my problem.
    – Sagar Waghmode, Apr 12 '16 at 17:27



















In addition to some of the suggestions above, I would recommend using a two-step modeling approach.

1. Use the sparse features first and develop the best model.

2. Calculate the predicted probability from that model.

3. Feed that probability estimate into the second model (as an input feature), which would incorporate the dense features. In other words, use all dense features and the probability estimate for building the second model.

4. The final classification will then be based on the second model (see the sketch below).
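A rough sketch of these four steps; X_sparse_part / X_dense_part are assumed to be the two feature blocks as plain arrays, y is the label vector, and the model choices are placeholders:

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Steps 1-2: model on the sparse features only; cross_val_predict gives
# out-of-fold probability estimates, which avoids leaking labels into step 3.
sparse_model = LogisticRegression(max_iter=1000)
p_sparse = cross_val_predict(sparse_model, X_sparse_part, y,
                             cv=5, method='predict_proba')[:, 1]

# Step 3: dense features plus the probability estimate as one extra column.
X_second = np.hstack([X_dense_part, p_sparse.reshape(-1, 1)])

# Step 4: the final classifier is trained on the augmented dense features.
second_model = GradientBoostingClassifier().fit(X_second, y)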





answered Apr 13 '16 at 17:24 by Vishal (score 1)

Try PCA only on the sparse features, and combine the PCA output with the dense features.

That way you'll get a dense set of (original) features plus a dense set of features which were originally sparse.

+1 for the question. Please update us with the results.
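A small sketch of the idea; X_sparse_part / X_dense_part are illustrative names for the two blocks (assumed here to be dense numpy arrays), and the number of components is only a guess to tune:

import numpy as np
from sklearn.decomposition import PCA

# Reduce only the sparse block; the dense block is kept as-is.
pca = PCA(n_components=100)
X_sparse_reduced = pca.fit_transform(X_sparse_part)

# The combined matrix is dense everywhere: original dense features plus
# the PCA projection of the originally sparse features.
X_combined = np.hstack([X_dense_part, X_sparse_reduced])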






answered Apr 18 '16 at 6:11 (score 0)

  • Wow, this has actually brought the AUC down :( Not sure what it means; I need to check the feature importances and so on. But my thinking is: out of around 2.3k sparse features, I used 1k features which explained a 0.97 variance ratio, and this loss of information may have brought the AUC down.
    – Sagar Waghmode, Apr 18 '16 at 10:17

  • Interesting. Thanks for sharing. We have a very similar dataset to yours (1k-2k sparse features). Just out of curiosity, how many principal components have you generated? If that number is too low, this may explain why the AUC went down.
    – Tagar, Apr 18 '16 at 15:22

  • As I said already, I generated 1k principal components, which explained 0.97 of the variance.
    – Sagar Waghmode, Apr 18 '16 at 17:55



















I met the same problem; maybe simply putting dense and sparse features into a single model is not a good choice. Maybe you can try a wide and deep model: wide for the sparse features and deep for the dense features. If you try this method, please tell me the result.
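A bare-bones sketch of the wide-and-deep idea with Keras; the feature counts, layer sizes and input names are made up, with the wide path taking the sparse features and the deep path taking the dense ones:

from tensorflow import keras
from tensorflow.keras import layers

n_sparse, n_dense = 2300, 40  # hypothetical feature counts

wide_in = keras.Input(shape=(n_sparse,), name='sparse_features')
deep_in = keras.Input(shape=(n_dense,), name='dense_features')

# Deep part: a small MLP over the dense features.
x = layers.Dense(64, activation='relu')(deep_in)
x = layers.Dense(32, activation='relu')(x)

# Wide part: the sparse features feed the output layer (almost) directly.
merged = layers.concatenate([wide_in, x])
out = layers.Dense(1, activation='sigmoid')(merged)

model = keras.Model(inputs=[wide_in, deep_in], outputs=out)
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=[keras.metrics.AUC()])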






answered by Jianye Ji (new contributor, score 0)

        Your Answer





        StackExchange.ifUsing("editor", function ()
        return StackExchange.using("mathjaxEditing", function ()
        StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
        StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
        );
        );
        , "mathjax-editing");

        StackExchange.ready(function()
        var channelOptions =
        tags: "".split(" "),
        id: "557"
        ;
        initTagRenderer("".split(" "), "".split(" "), channelOptions);

        StackExchange.using("externalEditor", function()
        // Have to fire editor after snippets, if snippets enabled
        if (StackExchange.settings.snippets.snippetsEnabled)
        StackExchange.using("snippets", function()
        createEditor();
        );

        else
        createEditor();

        );

        function createEditor()
        StackExchange.prepareEditor(
        heartbeatType: 'answer',
        autoActivateHeartbeat: false,
        convertImagesToLinks: false,
        noModals: true,
        showLowRepImageUploadWarning: true,
        reputationToPostImages: null,
        bindNavPrevention: true,
        postfix: "",
        imageUploader:
        brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
        contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
        allowUrls: true
        ,
        onDemand: true,
        discardSelector: ".discard-answer"
        ,immediatelyShowMarkdownHelp:true
        );



        );













        draft saved

        draft discarded


















        StackExchange.ready(
        function ()
        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f11060%2fmerging-sparse-and-dense-data-in-machine-learning-to-improve-the-performance%23new-answer', 'question_page');

        );

        Post as a guest















        Required, but never shown

























        6 Answers
        6






        active

        oldest

        votes








        6 Answers
        6






        active

        oldest

        votes









        active

        oldest

        votes






        active

        oldest

        votes









        6












        $begingroup$

        This seems like a job for Principal Component Analysis. In Scikit is PCA implemented well and it helped me many times.



        PCA, in a certain way, combines your features. By limiting the number of components, you fetch your model with noise-less data (in the best case). Because your model is as good as your data are.



        Consider below a simple example.



        from sklearn.pipeline import Pipeline
        pipe_rf = Pipeline([('pca', PCA(n_components=80)),
        ('clf',RandomForestClassifier(n_estimators=100))])
        pipe_rf.fit(X_train_s,y_train_s)

        pred = pipe_rf.predict(X_test)


        Why I picked 80? When I plot cumulative variance, I got this below, which tells me that with ~80 components, I reach almost all the variance.
        cumulative variance



        So I would say give it a try, use it in your models. It should help.






        share|improve this answer











        $endgroup$

















          6












          $begingroup$

          This seems like a job for Principal Component Analysis. In Scikit is PCA implemented well and it helped me many times.



          PCA, in a certain way, combines your features. By limiting the number of components, you fetch your model with noise-less data (in the best case). Because your model is as good as your data are.



          Consider below a simple example.



          from sklearn.pipeline import Pipeline
          pipe_rf = Pipeline([('pca', PCA(n_components=80)),
          ('clf',RandomForestClassifier(n_estimators=100))])
          pipe_rf.fit(X_train_s,y_train_s)

          pred = pipe_rf.predict(X_test)


          Why I picked 80? When I plot cumulative variance, I got this below, which tells me that with ~80 components, I reach almost all the variance.
          cumulative variance



          So I would say give it a try, use it in your models. It should help.






          share|improve this answer











          $endgroup$















            6












            6








            6





            $begingroup$

            This seems like a job for Principal Component Analysis. In Scikit is PCA implemented well and it helped me many times.



            PCA, in a certain way, combines your features. By limiting the number of components, you fetch your model with noise-less data (in the best case). Because your model is as good as your data are.



            Consider below a simple example.



            from sklearn.pipeline import Pipeline
            pipe_rf = Pipeline([('pca', PCA(n_components=80)),
            ('clf',RandomForestClassifier(n_estimators=100))])
            pipe_rf.fit(X_train_s,y_train_s)

            pred = pipe_rf.predict(X_test)


            Why I picked 80? When I plot cumulative variance, I got this below, which tells me that with ~80 components, I reach almost all the variance.
            cumulative variance



            So I would say give it a try, use it in your models. It should help.






            share|improve this answer











            $endgroup$



            This seems like a job for Principal Component Analysis. In Scikit is PCA implemented well and it helped me many times.



            PCA, in a certain way, combines your features. By limiting the number of components, you fetch your model with noise-less data (in the best case). Because your model is as good as your data are.



            Consider below a simple example.



            from sklearn.pipeline import Pipeline
            pipe_rf = Pipeline([('pca', PCA(n_components=80)),
            ('clf',RandomForestClassifier(n_estimators=100))])
            pipe_rf.fit(X_train_s,y_train_s)

            pred = pipe_rf.predict(X_test)


            Why I picked 80? When I plot cumulative variance, I got this below, which tells me that with ~80 components, I reach almost all the variance.
            cumulative variance



            So I would say give it a try, use it in your models. It should help.







            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Dec 29 '17 at 12:56

























            answered Apr 13 '16 at 12:54









            HonzaBHonzaB

            1,206514




            1,206514





















                3





                +25







                $begingroup$

                The best way to combine features is through ensemble methods.
                Basically there are three different methods: bagging, boosting and stacking.
                You can either use Adabbost augmented with feature selection (in this consider both sparse and dense features) or stacking based (random feature - random subspace)
                I prefer the second option you can train a set of base learners ( decisions. Trees) by using random subsets and random feature ( keep training base learners until you cover the whole set of features)
                The next step is to test the Training set to generate the meta data. Use this meta data to train a meta classifier.
                The meta classifier will figure out which feature is more important and what kind of relationship should be utilized






                share|improve this answer









                $endgroup$












                • $begingroup$
                  Can you please share the relevant documentation? Didn't exactly get you what you meant?
                  $endgroup$
                  – Sagar Waghmode
                  Apr 13 '16 at 6:04










                • $begingroup$
                  You can read an article about staking " issues in stacking techniques, 1999" read about stackingC . It is very important to know that I am talking about the whole vector (e.g. 1x36 in case of Hog) as a one feature, but not the dimensions within it. You need to track which feature used with which base learner. Be careful about the overfitting problem
                  $endgroup$
                  – Bashar Haddad
                  Apr 13 '16 at 16:15










                • $begingroup$
                  If you give more details about the database , number of classes, number of samples , code , what things you have tried , what things you noticed, do you have data imbalance problem, noisy samples ,... Etc . All these details are important and can help in selecting the best method. Give me more details if this ok and I may help in a better way
                  $endgroup$
                  – Bashar Haddad
                  Apr 13 '16 at 16:19















                3





                +25







                $begingroup$

                The best way to combine features is through ensemble methods.
                Basically there are three different methods: bagging, boosting and stacking.
                You can either use Adabbost augmented with feature selection (in this consider both sparse and dense features) or stacking based (random feature - random subspace)
                I prefer the second option you can train a set of base learners ( decisions. Trees) by using random subsets and random feature ( keep training base learners until you cover the whole set of features)
                The next step is to test the Training set to generate the meta data. Use this meta data to train a meta classifier.
                The meta classifier will figure out which feature is more important and what kind of relationship should be utilized






                share|improve this answer









                $endgroup$












                • $begingroup$
                  Can you please share the relevant documentation? Didn't exactly get you what you meant?
                  $endgroup$
                  – Sagar Waghmode
                  Apr 13 '16 at 6:04










                • $begingroup$
                  You can read an article about staking " issues in stacking techniques, 1999" read about stackingC . It is very important to know that I am talking about the whole vector (e.g. 1x36 in case of Hog) as a one feature, but not the dimensions within it. You need to track which feature used with which base learner. Be careful about the overfitting problem
                  $endgroup$
                  – Bashar Haddad
                  Apr 13 '16 at 16:15










                • $begingroup$
                  If you give more details about the database , number of classes, number of samples , code , what things you have tried , what things you noticed, do you have data imbalance problem, noisy samples ,... Etc . All these details are important and can help in selecting the best method. Give me more details if this ok and I may help in a better way
                  $endgroup$
                  – Bashar Haddad
                  Apr 13 '16 at 16:19













                3





                +25







                3





                +25



                3




                +25



                $begingroup$

                The best way to combine features is through ensemble methods.
                Basically there are three different methods: bagging, boosting and stacking.
                You can either use Adabbost augmented with feature selection (in this consider both sparse and dense features) or stacking based (random feature - random subspace)
                I prefer the second option you can train a set of base learners ( decisions. Trees) by using random subsets and random feature ( keep training base learners until you cover the whole set of features)
                The next step is to test the Training set to generate the meta data. Use this meta data to train a meta classifier.
                The meta classifier will figure out which feature is more important and what kind of relationship should be utilized






                share|improve this answer









                $endgroup$



                The best way to combine features is through ensemble methods.
                Basically there are three different methods: bagging, boosting and stacking.
                You can either use Adabbost augmented with feature selection (in this consider both sparse and dense features) or stacking based (random feature - random subspace)
                I prefer the second option you can train a set of base learners ( decisions. Trees) by using random subsets and random feature ( keep training base learners until you cover the whole set of features)
                The next step is to test the Training set to generate the meta data. Use this meta data to train a meta classifier.
                The meta classifier will figure out which feature is more important and what kind of relationship should be utilized







                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Apr 12 '16 at 4:44









                Bashar HaddadBashar Haddad

                1,2721413




                1,2721413











                • $begingroup$
                  Can you please share the relevant documentation? Didn't exactly get you what you meant?
                  $endgroup$
                  – Sagar Waghmode
                  Apr 13 '16 at 6:04










                • $begingroup$
                  You can read an article about staking " issues in stacking techniques, 1999" read about stackingC . It is very important to know that I am talking about the whole vector (e.g. 1x36 in case of Hog) as a one feature, but not the dimensions within it. You need to track which feature used with which base learner. Be careful about the overfitting problem
                  $endgroup$
                  – Bashar Haddad
                  Apr 13 '16 at 16:15










                • $begingroup$
                  If you give more details about the database , number of classes, number of samples , code , what things you have tried , what things you noticed, do you have data imbalance problem, noisy samples ,... Etc . All these details are important and can help in selecting the best method. Give me more details if this ok and I may help in a better way
                  $endgroup$
                  – Bashar Haddad
                  Apr 13 '16 at 16:19
















                • $begingroup$
                  Can you please share the relevant documentation? Didn't exactly get you what you meant?
                  $endgroup$
                  – Sagar Waghmode
                  Apr 13 '16 at 6:04










                • $begingroup$
                  You can read an article about staking " issues in stacking techniques, 1999" read about stackingC . It is very important to know that I am talking about the whole vector (e.g. 1x36 in case of Hog) as a one feature, but not the dimensions within it. You need to track which feature used with which base learner. Be careful about the overfitting problem
                  $endgroup$
                  – Bashar Haddad
                  Apr 13 '16 at 16:15










                • $begingroup$
                  If you give more details about the database , number of classes, number of samples , code , what things you have tried , what things you noticed, do you have data imbalance problem, noisy samples ,... Etc . All these details are important and can help in selecting the best method. Give me more details if this ok and I may help in a better way
                  $endgroup$
                  – Bashar Haddad
                  Apr 13 '16 at 16:19















                $begingroup$
                Can you please share the relevant documentation? Didn't exactly get you what you meant?
                $endgroup$
                – Sagar Waghmode
                Apr 13 '16 at 6:04




                $begingroup$
                Can you please share the relevant documentation? Didn't exactly get you what you meant?
                $endgroup$
                – Sagar Waghmode
                Apr 13 '16 at 6:04












                $begingroup$
                You can read an article about staking " issues in stacking techniques, 1999" read about stackingC . It is very important to know that I am talking about the whole vector (e.g. 1x36 in case of Hog) as a one feature, but not the dimensions within it. You need to track which feature used with which base learner. Be careful about the overfitting problem
                $endgroup$
                – Bashar Haddad
                Apr 13 '16 at 16:15




                $begingroup$
                You can read an article about staking " issues in stacking techniques, 1999" read about stackingC . It is very important to know that I am talking about the whole vector (e.g. 1x36 in case of Hog) as a one feature, but not the dimensions within it. You need to track which feature used with which base learner. Be careful about the overfitting problem
                $endgroup$
                – Bashar Haddad
                Apr 13 '16 at 16:15












                $begingroup$
                If you give more details about the database , number of classes, number of samples , code , what things you have tried , what things you noticed, do you have data imbalance problem, noisy samples ,... Etc . All these details are important and can help in selecting the best method. Give me more details if this ok and I may help in a better way
                $endgroup$
                – Bashar Haddad
                Apr 13 '16 at 16:19




                $begingroup$
                If you give more details about the database , number of classes, number of samples , code , what things you have tried , what things you noticed, do you have data imbalance problem, noisy samples ,... Etc . All these details are important and can help in selecting the best method. Give me more details if this ok and I may help in a better way
                $endgroup$
                – Bashar Haddad
                Apr 13 '16 at 16:19











                1












                $begingroup$

                The variable groups may be multicollinear or the conversion between sparse and dense might go wrong. Have you thought about using a voting classifier/ ensemble classification? http://scikit-learn.org/stable/modules/ensemble.html
                That way you could deal with both above problems.






                share|improve this answer









                $endgroup$












                • $begingroup$
                  I have already tried out the ensemble techniques as well as voting classifiers. Still no luck.
                  $endgroup$
                  – Sagar Waghmode
                  Apr 12 '16 at 8:15










                • $begingroup$
                  So do you see a lot of overlap then between the predictions from the two datasets? May be there indeed is no new information? I.e. the data tells the same story.
                  $endgroup$
                  – Diego
                  Apr 12 '16 at 9:20










                • $begingroup$
                  yes, I have done exactly that. Though the predictions are not entirely different, the number of samples where predictions differ are quite high (around 15-20%) of the data. For these samples model with sparse features performs better than that of model with dense features. My point is if sparse features perform better, why don't they come as important features in any of the models which I have tried so far.
                  $endgroup$
                  – Sagar Waghmode
                  Apr 12 '16 at 9:31










                • $begingroup$
                  What predictor algorithm do you use?
                  $endgroup$
                  – Diego
                  Apr 12 '16 at 12:21










                • $begingroup$
                  I have tried out quite a few algorithms and settled on Gradient Boosted Model, also I do use Random Forests quite a lot for my problem.
                  $endgroup$
                  – Sagar Waghmode
                  Apr 12 '16 at 17:27















                1












                $begingroup$

                The variable groups may be multicollinear or the conversion between sparse and dense might go wrong. Have you thought about using a voting classifier/ ensemble classification? http://scikit-learn.org/stable/modules/ensemble.html
                That way you could deal with both above problems.






                share|improve this answer









                $endgroup$












                • $begingroup$
                  I have already tried out the ensemble techniques as well as voting classifiers. Still no luck.
                  $endgroup$
                  – Sagar Waghmode
                  Apr 12 '16 at 8:15










                • $begingroup$
                  So do you see a lot of overlap then between the predictions from the two datasets? May be there indeed is no new information? I.e. the data tells the same story.
                  $endgroup$
                  – Diego
                  Apr 12 '16 at 9:20










                • $begingroup$
                  yes, I have done exactly that. Though the predictions are not entirely different, the number of samples where predictions differ are quite high (around 15-20%) of the data. For these samples model with sparse features performs better than that of model with dense features. My point is if sparse features perform better, why don't they come as important features in any of the models which I have tried so far.
                  $endgroup$
                  – Sagar Waghmode
                  Apr 12 '16 at 9:31










                • $begingroup$
                  What predictor algorithm do you use?
                  $endgroup$
                  – Diego
                  Apr 12 '16 at 12:21










                • $begingroup$
                  I have tried out quite a few algorithms and settled on Gradient Boosted Model, also I do use Random Forests quite a lot for my problem.
                  $endgroup$
                  – Sagar Waghmode
                  Apr 12 '16 at 17:27













                1












                1








                1





                $begingroup$

                The variable groups may be multicollinear or the conversion between sparse and dense might go wrong. Have you thought about using a voting classifier/ ensemble classification? http://scikit-learn.org/stable/modules/ensemble.html
                That way you could deal with both above problems.






                share|improve this answer









                $endgroup$



                The variable groups may be multicollinear or the conversion between sparse and dense might go wrong. Have you thought about using a voting classifier/ ensemble classification? http://scikit-learn.org/stable/modules/ensemble.html
                That way you could deal with both above problems.







                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Apr 12 '16 at 4:30









                DiegoDiego

                52528




                52528











                • $begingroup$
                  I have already tried out the ensemble techniques as well as voting classifiers. Still no luck.
                  $endgroup$
                  – Sagar Waghmode
                  Apr 12 '16 at 8:15










                • $begingroup$
                  So do you see a lot of overlap then between the predictions from the two datasets? May be there indeed is no new information? I.e. the data tells the same story.
                  $endgroup$
                  – Diego
                  Apr 12 '16 at 9:20










                • $begingroup$
                  yes, I have done exactly that. Though the predictions are not entirely different, the number of samples where predictions differ are quite high (around 15-20%) of the data. For these samples model with sparse features performs better than that of model with dense features. My point is if sparse features perform better, why don't they come as important features in any of the models which I have tried so far.
                  $endgroup$
                  – Sagar Waghmode
                  Apr 12 '16 at 9:31










                • $begingroup$
                  What predictor algorithm do you use?
                  $endgroup$
                  – Diego
                  Apr 12 '16 at 12:21










                • $begingroup$
                  I have tried out quite a few algorithms and settled on Gradient Boosted Model, also I do use Random Forests quite a lot for my problem.
                  $endgroup$
                  – Sagar Waghmode
                  Apr 12 '16 at 17:27
















                • $begingroup$
                  I have already tried out the ensemble techniques as well as voting classifiers. Still no luck.
                  $endgroup$
                  – Sagar Waghmode
                  Apr 12 '16 at 8:15










                • $begingroup$
                  So do you see a lot of overlap then between the predictions from the two datasets? May be there indeed is no new information? I.e. the data tells the same story.
                  $endgroup$
                  – Diego
                  Apr 12 '16 at 9:20










                • $begingroup$
                  yes, I have done exactly that. Though the predictions are not entirely different, the number of samples where predictions differ are quite high (around 15-20%) of the data. For these samples model with sparse features performs better than that of model with dense features. My point is if sparse features perform better, why don't they come as important features in any of the models which I have tried so far.
                  $endgroup$
                  – Sagar Waghmode
                  Apr 12 '16 at 9:31










                • $begingroup$
                  What predictor algorithm do you use?
                  $endgroup$
                  – Diego
                  Apr 12 '16 at 12:21










                • $begingroup$
                  I have tried out quite a few algorithms and settled on Gradient Boosted Model, also I do use Random Forests quite a lot for my problem.
                  $endgroup$
                  – Sagar Waghmode
                  Apr 12 '16 at 17:27















                $begingroup$
                I have already tried out the ensemble techniques as well as voting classifiers. Still no luck.
                $endgroup$
                – Sagar Waghmode
                Apr 12 '16 at 8:15




                $begingroup$
                I have already tried out the ensemble techniques as well as voting classifiers. Still no luck.
                $endgroup$
                – Sagar Waghmode
                Apr 12 '16 at 8:15












                $begingroup$
                So do you see a lot of overlap then between the predictions from the two datasets? May be there indeed is no new information? I.e. the data tells the same story.
                $endgroup$
                – Diego
                Apr 12 '16 at 9:20




                $begingroup$
                So do you see a lot of overlap then between the predictions from the two datasets? May be there indeed is no new information? I.e. the data tells the same story.
                $endgroup$
                – Diego
                Apr 12 '16 at 9:20












                $begingroup$
                yes, I have done exactly that. Though the predictions are not entirely different, the number of samples where predictions differ are quite high (around 15-20%) of the data. For these samples model with sparse features performs better than that of model with dense features. My point is if sparse features perform better, why don't they come as important features in any of the models which I have tried so far.
                $endgroup$
                – Sagar Waghmode
                Apr 12 '16 at 9:31




                $begingroup$
                yes, I have done exactly that. Though the predictions are not entirely different, the number of samples where predictions differ are quite high (around 15-20%) of the data. For these samples model with sparse features performs better than that of model with dense features. My point is if sparse features perform better, why don't they come as important features in any of the models which I have tried so far.
                $endgroup$
                – Sagar Waghmode
                Apr 12 '16 at 9:31












                $begingroup$
                What predictor algorithm do you use?
                $endgroup$
                – Diego
                Apr 12 '16 at 12:21




                $begingroup$
                What predictor algorithm do you use?
                $endgroup$
                – Diego
                Apr 12 '16 at 12:21












                $begingroup$
                I have tried out quite a few algorithms and settled on Gradient Boosted Model, also I do use Random Forests quite a lot for my problem.
                $endgroup$
                – Sagar Waghmode
                Apr 12 '16 at 17:27




                $begingroup$
                I have tried out quite a few algorithms and settled on Gradient Boosted Model, also I do use Random Forests quite a lot for my problem.
                $endgroup$
                – Sagar Waghmode
                Apr 12 '16 at 17:27











                1












                $begingroup$

                In addition to some of the suggestions above, I would recommend using a two-step modeling approach.



                1. Use the sparse features first and develop the best model.

                2. Calculate the predicted probability from that model.

                3. Feed that probability estimate into the second model (as an input feature), which would incorporate the dense features. In other words, use all dense features and the probability estimate for building the second model.

                4. The final classification will then be based on the second model.





                share|improve this answer









                $endgroup$

















                  1












                  $begingroup$

                  In addition to some of the suggestions above, I would recommend using a two-step modeling approach.



                  1. Use the sparse features first and develop the best model.

                  2. Calculate the predicted probability from that model.

                  3. Feed that probability estimate into the second model (as an input feature), which would incorporate the dense features. In other words, use all dense features and the probability estimate for building the second model.

                  4. The final classification will then be based on the second model.





                  share|improve this answer









                  $endgroup$















                    1












                    1








                    1





                    $begingroup$

                    In addition to some of the suggestions above, I would recommend using a two-step modeling approach.



                    1. Use the sparse features first and develop the best model.

                    2. Calculate the predicted probability from that model.

                    3. Feed that probability estimate into the second model (as an input feature), which would incorporate the dense features. In other words, use all dense features and the probability estimate for building the second model.

                    4. The final classification will then be based on the second model.





                    share|improve this answer









                    $endgroup$



                    In addition to some of the suggestions above, I would recommend using a two-step modeling approach.



                    1. Use the sparse features first and develop the best model.

                    2. Calculate the predicted probability from that model.

                    3. Feed that probability estimate into the second model (as an input feature), which would incorporate the dense features. In other words, use all dense features and the probability estimate for building the second model.

                    4. The final classification will then be based on the second model.






                    share|improve this answer












                    share|improve this answer



                    share|improve this answer










                    answered Apr 13 '16 at 17:24









                    VishalVishal

                    1634




                    1634





















                        0












                        $begingroup$

                        Try PCA only on sparse features, and combine PCA output with dense features.



                        So you'll get dense set of (original) features + dense set of features (which were originally sparse).



                        +1 for the question. Please update us with the results.






answered Apr 18 '16 at 6:11 by Tagar











• Wow, this has actually brought down the AUC :( I'm not sure what that means; I need to check the feature importances. My thinking is that, out of around 2.3k sparse features, I kept 1k components which explained a 0.97 variance ratio, and that loss of information may have brought the AUC down.
  – Sagar Waghmode
  Apr 18 '16 at 10:17

• Interesting, thanks for sharing. We have a very similar dataset to yours (1k-2k sparse features). Just out of curiosity, how many principal components did you generate? If that number is too low, it may explain why the AUC went down.
  – Tagar
  Apr 18 '16 at 15:22

• As I said already, I generated 1k principal components, which explained 0.97 of the variance.
  – Sagar Waghmode
  Apr 18 '16 at 17:55
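
As a small, hedged illustration of the point discussed in these comments (how the number of retained components relates to explained variance), one can fit TruncatedSVD once with a generous component count and then read off the smallest k that crosses a target ratio such as 0.97. The matrix below is a placeholder, not the asker's data.

    import numpy as np
    from scipy.sparse import random as sparse_random
    from sklearn.decomposition import TruncatedSVD

    # Placeholder sparse matrix, standing in for the ~2.3k sparse features.
    X_sparse = sparse_random(1000, 2300, density=0.01, format="csr", random_state=0)

    svd = TruncatedSVD(n_components=500, random_state=0).fit(X_sparse)
    cum_var = np.cumsum(svd.explained_variance_ratio_)

    target = 0.97
    k = int(np.searchsorted(cum_var, target)) + 1   # smallest k reaching the target
    if k <= len(cum_var):
        print(f"{k} components retain {cum_var[k - 1]:.3f} of the variance")
    else:
        print(f"even {len(cum_var)} components retain only {cum_var[-1]:.3f}")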
















I ran into the same problem. Simply putting dense and sparse features into a single model may not be a good choice; you could try a wide-and-deep model instead, with the wide part for the sparse features and the deep part for the dense features. If you try this method, please share your results.
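
A minimal wide-and-deep sketch in tf.keras, purely for illustration (the layer sizes, feature counts and optimizer are assumptions, not something the answerer specified): a linear "wide" path over the sparse block and a small MLP "deep" path over the dense block, summed and squashed to a probability.

    import tensorflow as tf

    n_sparse, n_dense = 2300, 20   # placeholder feature counts

    wide_in = tf.keras.Input(shape=(n_sparse,), name="sparse_features")
    deep_in = tf.keras.Input(shape=(n_dense,), name="dense_features")

    # Wide part: a single linear layer, roughly a logistic regression
    # on the sparse block.
    wide = tf.keras.layers.Dense(1)(wide_in)

    # Deep part: a small MLP on the dense block.
    deep = tf.keras.layers.Dense(64, activation="relu")(deep_in)
    deep = tf.keras.layers.Dense(32, activation="relu")(deep)
    deep = tf.keras.layers.Dense(1)(deep)

    # Sum both paths and squash to a probability.
    out = tf.keras.layers.Activation("sigmoid")(tf.keras.layers.Add()([wide, deep]))

    model = tf.keras.Model(inputs=[wide_in, deep_in], outputs=out)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC()])
    # model.fit([X_sparse.toarray(), X_dense], y, epochs=5)  # placeholder data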







answered 35 mins ago by Jianye Ji (new contributor)


























