customer segmentation with categorical variables


I was advised to post in this group with my question about modeling a categorical dataset.
I have a customer dataset that comes from a survey. It has 1595 observations and about 200 columns (200 because most of the questions were multiple choice and we had to split them into columns). Most of the variables are categorical or binary; I do not have any continuous variables at all. My task is customer segmentation (clustering). I have no initial assumptions, although I also have the questionnaire, so I can logically separate out the important questions.

I face several issues with the modeling:

  1. I need to validate the choice of variables I use.

  2. I am trying to find associations, pairwise associations, and trends, as I have no initial assumptions about who my segments might be.

  3. Clustering models do not work well for categorical variables, and the ones I tried, for example k-modes, ignore the associations and correlations and do not give me a clear picture.

Can you please suggest how to approach this, or where to start? I am new to data analytics and need some hints to go on with the analysis. I would be grateful for some guidance, at least at a high level, on what can be done.

Thanks in advance!




      r clustering categorical-data






      asked Mar 28 '18 at 19:55









      Sara

          1 Answer
          You can use a mixture model with variable selection. In this framework, variable selection becomes a model-selection problem, so the relevant features and the number of clusters can both be chosen according to an information criterion (such as BIC or ICL).

          To perform this analysis, you can use the R package VarSelLCM. Because your variables are categorical, your dataset must be a data.frame in which every column is a factor. Here is an example script, in which your dataset is called "my.data".

          # Load the package (install it first with install.packages("VarSelLCM"))
          library(VarSelLCM)

          ## Clustering by considering all the variables as discriminative
          # Number of clusters is between 1 and 6
          res.all <- VarSelCluster(my.data, 1:6, vbleSelec = FALSE)

          # partition
          res.all@partitions@zMAP

          # shiny application
          VarSelShiny(res.all)


          ## Clustering with variable selection
          # Number of clusters is between 1 and 6
          res.selec <- VarSelCluster(my.data, 1:6, vbleSelec = TRUE)

          # partition
          res.selec@partitions@zMAP

          # shiny application
          VarSelShiny(res.selec)
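For reference, the two criteria named in this answer are commonly written as follows. This is a sketch under the "larger is better" convention (some texts use $-2\log\hat{L} + \nu\log n$ instead, where smaller is better), and the symbols are assumptions introduced here, not notation from the answer: $\hat{L}$ is the maximized likelihood, $\nu$ the number of free parameters, $n$ the number of observations, $K$ the number of clusters, $t_{ik}$ the estimated posterior probability that observation $i$ belongs to cluster $k$, and $\hat{z}_{ik} \in \{0, 1\}$ the corresponding hard (MAP) assignment.

$$
\mathrm{BIC} = \log \hat{L} - \frac{\nu}{2} \log n,
\qquad
\mathrm{ICL} \approx \mathrm{BIC} + \sum_{i=1}^{n} \sum_{k=1}^{K} \hat{z}_{ik} \log t_{ik}
$$

The extra term in ICL is never positive, so ICL penalizes models whose clusters overlap heavily, which tends to favor well-separated segments.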












          answered Mar 28 '18 at 20:49









          user200668











          • Thanks a lot for your reply! I tried the code on my dataset. Although all my variables are factors in the data frame, I receive an error saying that at least one is not, plus some 50 additional warnings. I will try to figure it out and see what results this method gives. Thank you very much for your support!
            – Sara
            Mar 29 '18 at 19:17











          • The warnings appear if some levels are not taken by any observation. As for the error, maybe you can convert each column to a factor; see the R code below:
            my.data <- as.data.frame(my.data)
            for (j in 1:ncol(my.data)) my.data[, j] <- as.factor(my.data[, j])
            – user200668
            Mar 29 '18 at 19:48
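The factor conversion discussed in these comments can be sketched in base R as below. This is a minimal illustration, not the commenter's exact code: the toy data frame and its column names (q1, q2, q3) are hypothetical stand-ins for the real survey data, and dropping unused levels with droplevels() is an added assumption about how one might address the warnings about levels that no observation takes.

```r
# Hypothetical toy survey data standing in for my.data
my.data <- data.frame(
  q1 = c("yes", "no", "yes", "no"),                               # character column
  q2 = c(1, 0, 1, 1),                                             # binary column stored as numeric
  q3 = factor(c("a", "b", "a", "b"), levels = c("a", "b", "c")),  # level "c" is never observed
  stringsAsFactors = FALSE
)

# Convert every column to a factor while keeping the data.frame structure
my.data[] <- lapply(my.data, as.factor)

# Drop factor levels that no observation takes
my.data <- droplevels(my.data)

# Check that every column is now a factor, and count the levels per column
print(all(vapply(my.data, is.factor, logical(1))))
print(vapply(my.data, nlevels, integer(1)))
```

Running the same two checks on the real dataset before clustering may help locate the column that triggers the "not a factor" error.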















