How to obtain original feature names after using one-hot encoding


This question is about an implementation aspect of sklearn's DecisionTreeClassifier.

How do I get the feature names ranked in descending order from the feature_importances_ returned by the sklearn DecisionTreeClassifier?

The problem is that the input features to the classifier are not the original ones: they are numerically encoded columns produced by pandas get_dummies.

For example, take the mushroom dataset from the UCI repository. Features in the dataset include cap_shape, cap_surface, cap_color, odor, etc.

pandas get_dummies encodes each of these into multiple features, one per value of the original feature. Say cap_shape has the values b, c, f, k, ...; after encoding, the new columns are cap_shape_b, cap_shape_c, cap_shape_f, and so on. A similar transformation happens for the other features.

After training, the classifier tells me that the top features are:
cap_shape_b, cap_shape_c, cap_shape_f, odor_a, odor_c, odor_f, odor_l.
From this result, I want my function to return the original features, that is, cap_shape and odor.

Tags: feature-selection, decision-trees, dummy-variables

asked Apr 29 '18 at 14:22 by S Datta; edited 46 mins ago by Stephen Rauch

3 Answers

Consider using the one-hot encoder in the category_encoders module for your encoding. It has an inverse_transform method, which I believe will transform your one-hot encoded data back to its original form.
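
To make the round trip concrete, here is a minimal sketch. Note that it does not use category_encoders: it substitutes scikit-learn's OneHotEncoder, which also exposes an inverse_transform method. The toy rows are hypothetical stand-ins for two mushroom columns. Keep in mind that inverse_transform restores the data values; it does not rank the original features by itself.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy rows; the two columns stand for (cap_shape, odor).
X = np.array(
    [["b", "a"], ["c", "f"], ["f", "l"], ["b", "c"]],
    dtype=object,
)

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; make it dense for clarity.
X_encoded = encoder.fit_transform(X).toarray()

# categories_ holds the learned values of each original column, in column
# order, so it tells you how many encoded columns belong to each one.
n_encoded_per_column = [len(values) for values in encoder.categories_]

# Map the one-hot rows back to the original categorical values.
X_restored = encoder.inverse_transform(X_encoded)
```

Here the encoder's categories_ attribute is what links a block of encoded columns back to the original column it came from.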







answered Jun 29 '18 at 14:38 by bradS

See the "Classification" section of the scikit-learn docs: http://scikit-learn.org/stable/modules/tree.html#tips-on-practical-use

You can export your tree using graphviz (the docs state that you have to install the graphviz package, too). This way you can visualize the tree built by the algorithm.

As for the input features being transformed from the original ones, that is a problem the algorithm can't help you with, but you should be able to manage it yourself, since you made the transformations yourself.

If you have any further doubts, leave a comment.
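
A minimal sketch of the export step, assuming a tiny made-up data frame in place of the real mushroom data. It only produces the DOT source as a string; rendering it to an image still requires the graphviz package mentioned above. Passing the get_dummies column names as feature_names makes the encoded names (for example odor_f) show up in the node labels.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Hypothetical toy data: the label is "p" exactly when odor == "f".
df = pd.DataFrame(
    {
        "cap_shape": ["b", "c", "f", "b", "c", "f"],
        "odor": ["a", "f", "l", "f", "a", "l"],
    }
)
y = ["e", "p", "e", "p", "e", "e"]

# One-hot encode; cast to int so the dtype is the same across pandas versions.
X = pd.get_dummies(df).astype(int)

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# out_file=None makes export_graphviz return the DOT source as a string.
dot_source = export_graphviz(
    clf,
    out_file=None,
    feature_names=list(X.columns),
    filled=True,
)
```

The resulting string can be written to a .dot file and rendered with the graphviz command-line tools.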







answered Apr 29 '18 at 17:35 by Felipe Bormann

• Thank you for your reply. I have provided an example in the question. Hope this helps clarify what I am looking for. – S Datta, Apr 30 '18 at 13:21

• I saw your edit. If you build a mapping of the dummy variables you've created, you can write a function that returns the original values. But again, the classifier can't predict from the original values, only from the transformed features you've fed it. – Felipe Bormann, Apr 30 '18 at 13:45
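
One way to build such a mapping, sketched under the assumption that the encoding was done with pandas get_dummies on a data frame of categorical columns (the frame below is a made-up stand-in). Encoding one original column at a time records exactly which dummy columns it produces, so no name parsing is needed.

```python
import pandas as pd

# Hypothetical stand-in for the original (un-encoded) data frame.
df = pd.DataFrame(
    {
        "cap_shape": ["b", "c", "f"],
        "odor": ["a", "f", "l"],
    }
)

# Encode each original column separately and remember which dummy
# columns came from it.
dummy_to_original = {}
for column in df.columns:
    for dummy_name in pd.get_dummies(df[[column]]).columns:
        dummy_to_original[dummy_name] = column


def original_feature(dummy_name):
    """Return the original column that a dummy column was derived from."""
    return dummy_to_original[dummy_name]
```

For example, original_feature("odor_f") looks up the column that odor_f was derived from, even though the name cap_shape itself contains an underscore.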
















If you just need the names of the original features, you can use a regex to parse them out. You can choose a naming convention for the transformed features (using the prefix parameter of get_dummies). After getting the scores, traverse the list of features in ascending or descending order of importance, parse the column names with the regex, and store the results in an ordered dict.

If you need the whole dataset transformed back, go with the inverse_transform method mentioned in the other answers.
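
A minimal sketch of the regex approach. Two things here are assumptions rather than part of the answer: the columns are taken to have been created with pd.get_dummies(df, prefix_sep="__"), so that the separator cannot be confused with the underscores inside names like cap_shape, and the importances of all dummy columns belonging to one original feature are summed to get a single score per feature. The names and scores below are made up; in practice they would come from X.columns and clf.feature_importances_.

```python
import re
from collections import OrderedDict

# Assumed naming convention: "<original>__<value>", e.g. "cap_shape__b".
DUMMY_NAME = re.compile(r"^(?P<original>.+)__(?P<value>[^_]+)$")


def rank_original_features(encoded_names, importances):
    """Sum importances per original feature and sort in descending order."""
    totals = {}
    for name, score in zip(encoded_names, importances):
        match = DUMMY_NAME.match(name)
        # Columns that were never encoded keep their own name.
        original = match.group("original") if match else name
        totals[original] = totals.get(original, 0.0) + score
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return OrderedDict(ranked)


# Made-up example values, chosen to add up exactly in floating point.
encoded_names = ["cap_shape__b", "cap_shape__c", "cap_color__n", "odor__a", "odor__f"]
importances = [0.125, 0.0625, 0.0625, 0.25, 0.5]

ranking = rank_original_features(encoded_names, importances)
```

The keys of ranking are the original feature names, most important first. Summing is only one possible aggregation; taking the maximum per feature would be a small change to the loop.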







answered Oct 27 '18 at 17:35 by Himanshu Misra; edited Oct 27 '18 at 17:57 by Stephen Rauch