How to obtain original feature names after using one-hot encoding


This question is about an implementation aspect of sklearn's DecisionTreeClassifier.

How do I get the feature names ranked in descending order of importance, from the feature_importances_ returned by the sklearn DecisionTreeClassifier?

The problem is that the input features to the classifier are not the original ones; they are numerically encoded columns produced by pandas get_dummies.

For example, take the mushroom dataset from the UCI repository.
Features in the dataset include cap_shape, cap_surface, cap_color, odor, etc.

pandas get_dummies encodes these into multiple features based on the values of the original features.
Say cap_shape has the values b, c, f, k, ...; after encoding, the new columns are cap_shape_b, cap_shape_c, cap_shape_f, and so on. A similar transformation happens for the other features.

After training, the classifier tells me that the top features are:
cap_shape_b, cap_shape_c, cap_shape_f, odor_a, odor_c, odor_f, odor_l.
From this result, I want my function to return the original features, that is, cap_shape and odor.

asked Apr 29 '18 at 14:22 by S Datta
edited 46 mins ago by Stephen Rauch

feature-selection decision-trees dummy-variables


3 Answers

Consider using the one-hot encoder in the category_encoders module for your encoding. It has an inverse_transform method, which I believe will transform your one-hot encoded data back to its original form.
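To illustrate the round trip, here is a minimal sketch. Note that it uses scikit-learn's OneHotEncoder as a stand-in rather than category_encoders, since scikit-learn's encoder also provides a method named inverse_transform; the rows are made-up sample values, not the real mushroom dataset.

```python
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows with two categorical features: cap_shape and odor.
rows = [["b", "a"], ["c", "f"], ["f", "l"], ["b", "f"]]

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; convert it to a dense 0/1 array.
encoded = encoder.fit_transform(rows).toarray()

# inverse_transform maps each one-hot row back to its original category values.
restored = encoder.inverse_transform(encoded).tolist()
```

This recovers the original data values, not a ranking of original feature names, so it is probably most useful when the whole dataset has to be decoded.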

answered Jun 29 '18 at 14:38 by bradS

See the "Classification" section of these docs: http://scikit-learn.org/stable/modules/tree.html#tips-on-practical-use

You can export your tree using graphviz (the docs note that you have to install the graphviz package, too). This way you can visualize the tree built by the algorithm.
As for the input features being transformed from the original ones, that is a problem the algorithm can't help you with, but you should be able to manage it yourself, since you made the transformations yourself.
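A rough sketch of the export step, assuming a toy dataset with made-up dummy columns (the column names and values are hypothetical, chosen so that the label depends only on odor). Passing feature_names makes the dummy column names appear in the exported tree.

```python
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Hypothetical dummy-encoded columns and a toy dataset in which
# the label depends only on odor.
feature_names = ["cap_shape_b", "cap_shape_c", "odor_a", "odor_f"]
X = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 1],
]
y = [0, 0, 1, 1]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# With out_file=None, export_graphviz returns the DOT source as a string;
# rendering it to an image requires the separate graphviz tooling.
dot_source = export_graphviz(clf, out_file=None, feature_names=feature_names)
```

The split labels in the DOT source use the encoded names (for example an odor_ column here), which is why a mapping back to the original features still has to be done separately.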

If you have any further doubts, leave a comment.

answered Apr 29 '18 at 17:35 by Felipe Bormann

Thank you for your reply. I have provided an example in the question. Hope this helps clarify what I am looking for.
– S Datta
Apr 30 '18 at 13:21

I saw your edit. If you build a mapping of the dummy variables you've created, you can write a function to return the original values, but again, the classifier won't be able to predict based on the original values, only on the transformed features you've fed it.
– Felipe Bormann
Apr 30 '18 at 13:45
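The mapping idea from the comment above could look roughly like this. It is a sketch under assumptions: the categories dict and the original_features helper are hypothetical names, the category values are made up, and the dummy names follow the default "<column>_<value>" pattern described in the question.

```python
# Hypothetical original columns and the category values seen in each.
categories = {
    "cap_shape": ["b", "c", "f"],
    "odor": ["a", "c", "f", "l"],
    "cap_color": ["n", "w"],
}

# Record which original feature each dummy column came from,
# using the naming pattern "<column>_<value>".
dummy_to_original = {
    f"{column}_{value}": column
    for column, values in categories.items()
    for value in values
}


def original_features(dummy_columns):
    """Return the original feature names for the given dummy columns, in first-seen order."""
    seen = []
    for name in dummy_columns:
        original = dummy_to_original[name]
        if original not in seen:
            seen.append(original)
    return seen


top_dummies = ["cap_shape_b", "cap_shape_c", "cap_shape_f",
               "odor_a", "odor_c", "odor_f", "odor_l"]
top_originals = original_features(top_dummies)  # ["cap_shape", "odor"]
```

Building the mapping at encoding time avoids guessing where the original name ends, which matters when the original names themselves contain underscores, as cap_shape does.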


If you just need the names of the original features, you can use a regex to parse them out. You can easily choose a naming convention for the transformed features (using the prefix parameter in get_dummies). After getting the scores, traverse the list of features in ascending or descending order, parse the column names with the regex, and use an ordered dict to store the results.
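A minimal sketch of that approach, assuming the dummy columns were named with a double-underscore separator (as get_dummies' prefix_sep parameter would allow) and using made-up importance scores in place of a fitted classifier's feature_importances_:

```python
import re
from collections import OrderedDict

# Hypothetical importances keyed by dummy column name; "__" is an assumed
# separator, as if get_dummies had been called with prefix_sep="__".
importances = {
    "cap_shape__b": 0.05,
    "cap_shape__c": 0.10,
    "odor__a": 0.30,
    "odor__f": 0.40,
    "cap_color__n": 0.15,
}

# Everything before the last "__" is taken as the original feature name.
pattern = re.compile(r"^(.+)__[^_]+$")

ranked = OrderedDict()
# Traverse the dummy columns from most to least important.
for column, score in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    match = pattern.match(column)
    feature = match.group(1) if match else column
    # Accumulate the dummy scores under the original feature name.
    ranked[feature] = ranked.get(feature, 0.0) + score

original_ranking = list(ranked)  # ["odor", "cap_color", "cap_shape"]
```

Here the order reflects where each original feature first appears in the descending traversal. If you would rather rank by the summed score per original feature, sort ranked by value afterwards; the two orders can differ.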

If you need the whole dataset transformed back, then go with the inverse_transform method mentioned in other answers.

answered Oct 27 '18 at 17:35 by Himanshu Misra
edited Oct 27 '18 at 17:57 by Stephen Rauch
