EDA for analysis of nominal variable with high cardinality














I have a nominal variable (car model) with very high cardinality (~8500 labels), and I would like to analyse its relationship with a binary target variable. I can create logical groups and compare the distribution of the target variable across the groups, but are there better techniques or visualization tools for this type of analysis?
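The grouping approach described above could look like the following in plain Python. This is a minimal sketch, assuming the target is coded 0/1 and that a hand-made model-to-group mapping exists; `MODEL_TO_GROUP`, `rate_by_group` and the model names are hypothetical placeholders, and unmapped models fall into a shared `"other"` bucket.

```python
from collections import defaultdict

# Hypothetical mapping from car model to a coarser, hand-made group (e.g. make).
MODEL_TO_GROUP = {"civic": "honda", "accord": "honda", "corolla": "toyota"}


def rate_by_group(rows, mapping, default="other"):
    """Return {group: (positive_rate, n_rows)} for (model, target) pairs with a 0/1 target."""
    positives = defaultdict(int)
    counts = defaultdict(int)
    for model, target in rows:
        group = mapping.get(model, default)  # unmapped models share one bucket
        positives[group] += target
        counts[group] += 1
    return {g: (positives[g] / counts[g], counts[g]) for g in counts}


rows = [("civic", 1), ("accord", 0), ("corolla", 1), ("corolla", 1), ("beetle", 0)]
print(rate_by_group(rows, MODEL_TO_GROUP))
```

Reporting the row count next to each rate makes it easy to see which groups are too small for their rate to be trusted.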







Tags: categorical-data, data-analysis
















asked Mar 1 at 6:06 by Rohit Gavval



























1 Answer
You can calculate the mean of the target for each category and compare the values. In pandas this is easy: `df.groupby('categorical_feature').target.mean()`

You can then plot a histogram of these per-category means. Seaborn's `catplot` with `kind="bar"` does the same thing as a bar plot, showing the mean of the target variable for each category.
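A slightly extended version of the groupby above, as a sketch: it also returns the row count per category so that categories with very few rows (common when there are thousands of levels) can be filtered out before their rates are compared. The `min_count` threshold, the function name and the toy column names `car_model`/`target` are illustrative assumptions, not part of the original answer.

```python
import pandas as pd


def target_rate_by_category(df, cat_col, target_col, min_count=1):
    """Positive rate (mean of a 0/1 target) and row count per category.

    Categories with fewer than `min_count` rows are dropped, because with
    thousands of levels many categories have too few rows for a stable mean.
    """
    stats = df.groupby(cat_col)[target_col].agg(["mean", "count"])
    stats = stats[stats["count"] >= min_count]
    return stats.sort_values("mean", ascending=False)


# Hypothetical toy data; 'car_model' and 'target' are illustrative column names.
df = pd.DataFrame({
    "car_model": ["a", "a", "a", "b", "b", "c"],
    "target":    [1, 0, 1, 0, 0, 1],
})
print(target_rate_by_category(df, "car_model", "target", min_count=2))
```

Calling `stats["mean"].hist(bins=50)` on the result (this needs matplotlib) draws the histogram of per-category rates mentioned above. A bar plot over thousands of levels is unreadable, so restricting it to the top or bottom rows of the sorted result is one option.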

















answered Mar 1 at 13:09 by Victor Oliveira











• My target variable is dichotomous, so taking the mean is not an option. Maybe I can take counts, but the real problem is that I have around 8000 levels in one categorical attribute. How can I study that? (Rohit Gavval, Mar 7 at 9:43)










• @RohitGavval, if you have a binary variable, you can calculate the mean: it will be a proportion such as 0.333 or 0.67, and that is the point. See my answer to this related question, which links to more detailed explanations of the methods mentioned: datascience.stackexchange.com/questions/46780/… (Victor Oliveira, Mar 7 at 11:23)
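To illustrate the point made in the reply: once a dichotomous target is coded as 0/1, its mean is simply the share of positive rows. A minimal sketch, assuming hypothetical "yes"/"no" labels for the rows of one category:

```python
from statistics import mean

# Hypothetical dichotomous labels for the rows of one category.
labels = ["yes", "no", "no"]

# Code the target as 1 (positive) / 0 (negative).
coded = [1 if label == "yes" else 0 for label in labels]

# The mean of a 0/1 variable equals the proportion of positives.
positive_rate = mean(coded)
print(positive_rate)
```

In pandas the same coding step could be done with `df["target"].map({"yes": 1, "no": 0})` before the groupby, again assuming those label values.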















