EDA for analysis of nominal variable with high cardinality


I have a nominal variable (car model) with very high cardinality (~8500 labels) and I would like to analyse its relation with a binary target variable. While I can create logical groups and compare the distribution of target variable for each of the groups, can anyone suggest if there are any superior techniques/visualization tools for this type of analysis?

asked Mar 1 at 6:06

Rohit Gavval

617


categorical-data data-analysis


1 Answer

You can calculate the mean of the target for each level of the categorical variable and compare the values.
In pandas this is easy to do: df.groupby('categorical_feature').target.mean()
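With thousands of levels, the per-level mean is easier to read next to the number of rows behind it, since a rate computed from one or two rows says little. The following is a minimal sketch of that idea, not the answerer's own code: the column names `car_model` and `target`, the toy data, and the `min_rows` threshold are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data: 'car_model' stands in for the high-cardinality
# feature and 'target' for the binary outcome coded as 0/1.
df = pd.DataFrame({
    "car_model": ["a", "a", "a", "b", "b", "c"],
    "target":    [1,   0,   1,   0,   0,   1],
})

# Mean of the target and number of rows for every level,
# with the most frequent levels first.
summary = (
    df.groupby("car_model")["target"]
      .agg(rate="mean", n="count")
      .sort_values("n", ascending=False)
)

# Keep only levels with enough rows for the rate to be meaningful.
# The threshold of 2 is arbitrary and only suits this toy example.
min_rows = 2
reliable = summary[summary["n"] >= min_rows]
print(reliable)
```

On real data, a larger `min_rows` would likely be needed, and the levels that fall below it could be inspected separately or pooled.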

Then you can plot a histogram of these means to compare the levels. Seaborn also has catplot, which (with kind="bar") does the same as above in bar plot form, showing the mean of the target variable for each level of the categorical one.
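A bar per level becomes unreadable with thousands of levels, so one option is to bin the per-level means and count how many levels land in each bin. The sketch below does this numerically with `numpy.histogram`. The data and column names are hypothetical, and the choice of four equal-width bins is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with four levels of four rows each.
df = pd.DataFrame({
    "car_model": list("aaaabbbbccccdddd"),
    "target":    [1, 1, 1, 0,  0, 0, 0, 1,  1, 1, 1, 1,  0, 0, 0, 0],
})

# Per-level mean of the binary target (the share of 1s in each level).
rates = df.groupby("car_model")["target"].mean()

# Count how many levels fall into each target-rate bin on [0, 1].
counts, edges = np.histogram(rates.to_numpy(), bins=4, range=(0.0, 1.0))

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"rate in [{left:.2f}, {right:.2f}): {count} level(s)")
```

Note that `numpy.histogram` treats the last bin as closed on the right, so a rate of exactly 1.0 is counted there even though the printed label shows a half-open interval. The same `rates` series could be handed to a plotting library to draw the histogram.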

answered Mar 1 at 13:09

Victor Oliveira

3407

My target variable is dichotomous, so taking the mean is not an option. Maybe I can take counts, but the real problem is that I have around 8000 levels in one categorical attribute. How can I study that?
– Rohit Gavval
Mar 7 at 9:43

@RohitGavval, if you have a binary variable, you can still calculate the mean. It will be a value such as 0.333 or 0.67, i.e. the proportion of positive cases in that group; that is the point. Look at my answer to this question, where I put links with more explanation of the mentioned methods: datascience.stackexchange.com/questions/46780/…
– Victor Oliveira
Mar 7 at 11:23
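The point made in the reply can be checked directly: for a target coded as 0/1, the group mean is the number of positives divided by the group size. This is a small sketch with made-up data and hypothetical column names.

```python
import pandas as pd

# Hypothetical toy data: binary target coded as 0/1.
df = pd.DataFrame({
    "car_model": ["a", "a", "a", "b", "b", "b"],
    "target":    [1,   0,   0,   1,   1,   0],
})

grouped = df.groupby("car_model")["target"]

# Mean of a 0/1 column within each group.
mean_rate = grouped.mean()

# The same quantity computed explicitly as positives / group size.
proportion = grouped.sum() / grouped.count()

print(mean_rate.round(3))
print(proportion.round(3))
```

Both series hold one third for level `a` and two thirds for level `b`, which are values of the kind mentioned in the reply.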

