


customer segmentation with categorical variables


I was advised to post my question about modeling a categorical dataset in this group.
I have a customer dataset from a survey, with 1595 observations and about 200 columns (that many because most questions were multiple choice and had to be split into separate columns). Most variables are categorical or binary; I have no continuous variables at all. My task is customer segmentation through clustering. I have no initial assumptions about the segments, although since I also have the questionnaire, I can logically single out the important questions.

I face several issues with the modeling:

  1. I need to validate the choice of variables I use.

  2. I am trying to find associations (pairwise associations and trends), since I have no initial assumptions about who my segments might be.

  3. Clustering models do not work well for categorical variables, and the ones I tried, for example k-modes, ignore the associations and correlations and do not give me a clear picture.

Can you please suggest how to approach this, or where to start? I am new to data analytics and need some hints to continue the analysis. I would be grateful for some guidance, at least at a high level, on what can be done.

Thanks in advance!
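One common way to quantify the pairwise associations in point 2 is Cramér's V, which rescales the chi-squared statistic of a contingency table to a value between 0 (no association) and 1 (perfect association). This is not from the original thread; the sketch below is base R only, assumes the columns are factors or character vectors, and `cramers_v` and `cramers_v_matrix` are hypothetical helper names.

```r
# Cramér's V for two categorical vectors, from the Pearson chi-squared statistic.
cramers_v <- function(x, y) {
  ok <- !is.na(x) & !is.na(y)              # use complete pairs only
  x <- droplevels(as.factor(x[ok]))        # drop levels that no longer occur
  y <- droplevels(as.factor(y[ok]))
  tab <- table(x, y)
  k <- min(dim(tab))
  if (k < 2) return(NA_real_)              # a constant variable has no association to measure
  # correct = FALSE: plain Pearson statistic, no Yates continuity correction;
  # small-count warnings are suppressed because only the statistic is used
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  as.numeric(sqrt(chi2 / (sum(tab) * (k - 1))))
}

# Symmetric matrix of Cramér's V over all pairs of columns of a data frame.
cramers_v_matrix <- function(df) {
  p <- ncol(df)
  m <- matrix(1, p, p, dimnames = list(names(df), names(df)))
  for (i in seq_len(p - 1)) {
    for (j in (i + 1):p) {
      m[i, j] <- m[j, i] <- cramers_v(df[[i]], df[[j]])
    }
  }
  m
}
```

With about 200 columns this gives a 200 × 200 matrix that can be scanned or plotted as a heatmap. Binary columns split from the same multiple-choice question may look associated partly because of how they were split, so it can help to read those blocks separately.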










Tags: r, clustering, categorical-data






asked Mar 28 '18 at 19:55 by Sara





1 Answer


















You can use a mixture model with variable selection. In this framework, variable selection becomes a model selection problem, so both the relevant features and the number of clusters can be chosen with an information criterion such as BIC or ICL.
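To make the penalty side of BIC concrete: in the usual latent class model for categorical data, variables are assumed independent within a cluster, and under variable selection an irrelevant variable is assumed to have the same distribution in every cluster, so it contributes far fewer parameters. The sketch below counts free parameters under those assumptions and computes BIC in the larger-is-better form log L − (ν/2) log n. The log-likelihood itself must come from a fitted model, the function names are hypothetical, and this is not VarSelLCM's code (its criteria, ICL in particular, may be defined differently).

```r
# Free parameters of a latent class model with K clusters.
# n_levels[j]: number of levels of categorical variable j
# relevant[j]: TRUE if variable j's distribution differs between clusters
lcm_n_params <- function(K, n_levels, relevant = rep(TRUE, length(n_levels))) {
  proportions <- K - 1                              # mixing proportions sum to 1
  per_cluster <- K * sum(n_levels[relevant] - 1)    # one distribution per cluster
  shared      <- sum(n_levels[!relevant] - 1)       # one distribution shared by all clusters
  proportions + per_cluster + shared
}

# BIC in the "larger is better" form: log-likelihood minus (nu / 2) * log(n).
lcm_bic <- function(loglik, n, K, n_levels, relevant = rep(TRUE, length(n_levels))) {
  loglik - lcm_n_params(K, n_levels, relevant) / 2 * log(n)
}
```

For example, with 200 binary columns all treated as relevant and K = 6, `lcm_n_params(6, rep(2, 200))` returns 5 + 6 × 200 = 1205 free parameters against 1595 observations, which suggests why discarding irrelevant variables matters for this dataset.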



To perform this analysis, you can use the R package VarSelLCM. Because your variables are categorical, your dataset must be a data.frame in which each column is a factor. Here is an example script, in which your dataset is called my.data:



    library(VarSelLCM)

    ## Clustering with all variables treated as discriminative
    # Try between 1 and 6 clusters
    res.all <- VarSelCluster(my.data, 1:6, vbleSelec = FALSE)

    # MAP partition (cluster label of each observation)
    res.all@partitions@zMAP

    # Interactive Shiny application to explore the results
    VarSelShiny(res.all)


    ## Clustering with variable selection
    # Try between 1 and 6 clusters
    res.selec <- VarSelCluster(my.data, 1:6, vbleSelec = TRUE)

    # MAP partition
    res.selec@partitions@zMAP

    # Interactive Shiny application
    VarSelShiny(res.selec)
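As its name suggests, zMAP is a MAP (maximum a posteriori) partition: each observation goes to the cluster with the highest posterior membership probability. The sketch below illustrates that step for a latent class model whose parameters are already estimated. It assumes the data are coded as integers 1..m_j with no missing values, and `lcm_posterior` and `map_partition` are hypothetical names, not VarSelLCM functions.

```r
# Posterior membership probabilities t_ik for a latent class model.
# x:     n x J matrix of category codes 1..m_j (no missing values)
# prop:  length-K vector of mixing proportions
# theta: list of J matrices; theta[[j]][k, h] = P(x_j = h | cluster k)
lcm_posterior <- function(x, prop, theta) {
  n <- nrow(x)
  K <- length(prop)
  logp <- matrix(log(prop), n, K, byrow = TRUE)    # log pi_k on every row
  for (j in seq_len(ncol(x))) {
    # theta[[j]][, x[, j]] is K x n; transpose to n x K and add log-probabilities
    logp <- logp + t(log(theta[[j]][, x[, j], drop = FALSE]))
  }
  # Normalise each row with the log-sum-exp trick for numerical stability
  p <- exp(logp - apply(logp, 1, max))
  p / rowSums(p)
}

# MAP rule: index of the most probable cluster for each observation.
map_partition <- function(tik) max.col(tik, ties.method = "first")
```

A variable whose rows of `theta[[j]]` are identical across clusters adds the same constant to every column of a row and cancels in the normalisation, which is why irrelevant variables do not influence the partition.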





answered Mar 28 '18 at 20:49 by user200668











• Thanks a lot for your reply! I tried the code on my dataset. Although all my variables are factors in the data frame, I receive an error saying that at least one is not, plus some 50 additional warnings. I will try to figure it out and see what results this method gives. Thank you very much for the support!
  – Sara, Mar 29 '18 at 19:17











• About the warnings: they appear when some factor levels are not taken by any observation. About the error: maybe you can convert each column to a factor with the R code below.

      my.data <- as.data.frame(my.data)
      for (j in seq_len(ncol(my.data))) my.data[, j] <- as.factor(my.data[, j])

  – user200668, Mar 29 '18 at 19:48
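Building on the comment above, the sketch below converts every column to a factor and also drops levels that no observation takes, which, per that comment, is what triggers the warnings. `prepare_factors` is a hypothetical helper name, and whether it removes every VarSelLCM warning on the original data has not been verified.

```r
# Convert every column to a factor and drop levels no observation takes.
prepare_factors <- function(df) {
  df <- as.data.frame(df)                      # e.g. a tibble becomes a plain data.frame
  df[] <- lapply(df, function(col) droplevels(as.factor(col)))
  df
}
```

Usage would be `my.data <- prepare_factors(my.data)` before calling `VarSelCluster`. Note that `as.factor` treats each distinct value of a numeric column as its own category, which suits coded survey answers.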















