How to obtain original feature names after using one-hot encoding
This question is about an implementation detail of scikit-learn's DecisionTreeClassifier.
How do I get the feature names ranked in descending order of the feature_importances_ returned by the sklearn DecisionTreeClassifier?
The problem is that the classifier's input features are not the original ones: they are numerically encoded with pandas get_dummies.
For example, take the mushroom dataset from the UCI repository.
Its features include cap_shape, cap_surface, cap_color, odor, etc.
pandas get_dummies encodes each of these into multiple features, one per value of the original feature.
Say cap_shape has the values b, c, f, k, ...; after encoding, the new columns are cap_shape_b, cap_shape_c, cap_shape_f, and so on. The same transformation happens for the other features.
After training, the classifier tells me that the top-ranked features are:
cap_shape_b, cap_shape_c, cap_shape_f, odor_a, odor_c, odor_f, odor_l
From this result, I want my function to return the original features, that is, cap_shape and odor.
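To make the encoding step concrete, here is a minimal reproduction, assuming pandas is installed and using a tiny made-up DataFrame (illustrative values, not the real mushroom data). It shows the default "<column>_<value>" naming and why it is awkward to reverse: the original column names already contain underscores, so splitting cap_shape_b on "_" alone does not give back cap_shape.

```python
import pandas as pd

# Tiny, made-up stand-in for two columns of the UCI mushroom data.
df = pd.DataFrame({
    "cap_shape": ["b", "c", "f", "k", "f"],
    "odor": ["a", "c", "f", "l", "n"],
})

# get_dummies replaces each categorical column with one indicator column per
# distinct value, named "<original column>_<value>" (default prefix_sep="_").
X = pd.get_dummies(df)

print(list(X.columns))
```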
feature-selection decision-trees dummy-variables
edited 46 mins ago by Stephen Rauch♦
asked Apr 29 '18 at 14:22 by S Datta
3 Answers
Consider using the OneHotEncoder from the category_encoders module for your encoding. It has an inverse_transform method which, I believe, will transform your one-hot encoded data back to its original form.
answered Jun 29 '18 at 14:38 by bradS
As shown in the scikit-learn docs (http://scikit-learn.org/stable/modules/tree.html#tips-on-practical-use, section "Classification"), you can export your tree with graphviz (you also have to install the graphviz package). This lets you visualize the tree the algorithm built.
As for the input features being transformed from the original ones, the algorithm can't help you there, but since you made the transformations yourself, you should be able to manage that mapping yourself.
If you have any further questions, leave a comment.
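A minimal sketch of the export step, assuming scikit-learn and pandas are installed and using a tiny made-up dataset. Passing the dummy column names as feature_names makes each split readable. export_text needs no extra software, while export_graphviz with out_file=None returns DOT source that you still need Graphviz to render.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_graphviz, export_text

# Toy data standing in for the mushroom set (illustrative values only).
df = pd.DataFrame({
    "cap_shape": ["b", "c", "f", "k", "b", "f"],
    "odor": ["a", "l", "n", "f", "a", "f"],
})
y = [0, 0, 1, 1, 0, 1]

X = pd.get_dummies(df)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Plain-text rendering of the fitted tree, labelled with the dummy column names.
text = export_text(clf, feature_names=list(X.columns))
print(text)

# DOT source for Graphviz; render it with the graphviz tools or package.
dot = export_graphviz(clf, out_file=None, feature_names=list(X.columns), filled=True)
```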
answered Apr 29 '18 at 17:35 by Felipe Bormann
Thank you for your reply. I have provided an example in the question. I hope this clarifies what I am looking for.
– S Datta, Apr 30 '18 at 13:21
I saw your edit. If you build a mapping of the dummy variables you created, you can write a function that returns the original features. But again, the classifier can't predict from the original values, only from the transformed features you trained it on.
– Felipe Bormann, Apr 30 '18 at 13:45
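The mapping suggested in the comment can be sketched in plain Python. This assumes the dummies were produced with get_dummies' default prefix_sep="_" and that you still have the list of original column names. The helper names are hypothetical, and summing the per-dummy importances is one reasonable way to score an original feature; it is an assumption, not something scikit-learn does for you.

```python
from collections import defaultdict


def original_feature(dummy_name, original_names):
    """Return the original column a get_dummies column came from.

    Assumes the default prefix_sep="_". Picks the longest original name that
    is followed by "_", so names that themselves contain underscores
    (cap_shape) are not cut short at their first underscore.
    """
    matches = [n for n in original_names if dummy_name.startswith(n + "_")]
    if not matches:
        return dummy_name  # not an encoded column: keep the name unchanged
    return max(matches, key=len)


def rank_original_features(dummy_names, importances, original_names):
    """Sum per-dummy importances for each original feature, highest first."""
    totals = defaultdict(float)
    for name, score in zip(dummy_names, importances):
        totals[original_feature(name, original_names)] += float(score)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical importances; in practice use X.columns and clf.feature_importances_.
dummy_names = ["cap_shape_b", "cap_shape_c", "odor_a", "odor_c", "odor_f"]
importances = [0.10, 0.05, 0.50, 0.25, 0.10]
ranking = rank_original_features(dummy_names, importances, ["cap_shape", "odor"])
print([name for name, _ in ranking])  # ['odor', 'cap_shape']
```

With a fitted model the call would be rank_original_features(X.columns, clf.feature_importances_, df.columns). Taking the maximum per group instead of the sum would rank each original feature by its single most useful value.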
If you just need the names of the original features, you can parse them out with a regex. You can easily choose a naming convention for the transformed features (using the prefix parameter of get_dummies). After getting the scores, traverse the list of features in descending (or ascending) order, parse each column name with the regex, and store the results in an ordered dict.
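A sketch of that approach, assuming the data was encoded with pd.get_dummies(df, prefix_sep="__"). prefix_sep is a real get_dummies parameter, but the "__" separator is only a hypothetical convention, chosen because it does not occur in the mushroom column names. The regex captures everything before the first separator, and the OrderedDict keeps each original feature at the position of its highest-scoring dummy column.

```python
import re
from collections import OrderedDict

# Hypothetical convention: get_dummies(df, prefix_sep="__"), assuming "__"
# never occurs inside an original column name.
SEP = "__"
PATTERN = re.compile(r"^(?P<feature>.+?)" + re.escape(SEP))


def original_names_ranked(dummy_names, scores):
    """Map scored dummy columns to original feature names, best first.

    Returns an OrderedDict of original name -> score of its best dummy column.
    """
    ranked = sorted(zip(dummy_names, scores), key=lambda kv: kv[1], reverse=True)
    result = OrderedDict()
    for name, score in ranked:
        match = PATTERN.match(name)
        feature = match.group("feature") if match else name
        result.setdefault(feature, score)  # first hit is the highest-ranked one
    return result


names = ["cap_shape__b", "odor__a", "cap_shape__c", "odor__f"]
scores = [0.3, 0.4, 0.2, 0.1]
print(list(original_names_ranked(names, scores)))  # ['odor', 'cap_shape']
```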
If you need the whole dataset transformed back, use the inverse_transform method mentioned in the other answers.
edited Oct 27 '18 at 17:57 by Stephen Rauch♦
answered Oct 27 '18 at 17:35 by Himanshu Misra