Customer segmentation with categorical variables
I was advised to post in this group with my question about modeling a categorical dataset.
I have a customer dataset that comes from a survey. It has 1595 observations and about 200 columns (200 because most of the questions were multiple choice and each had to be split into separate columns). The majority of the variables are categorical or binary; I do not have any continuous variables. My task is customer segmentation (clustering). There are no initial assumptions, although since I also have the questionnaire I can logically separate out the important questions.
I face several issues with the modeling:
- I need to validate the choice of variables I use.
- I am trying to find associations, pairwise associations, and trends, as I have no initial assumptions about what my segments might be.
- Clustering models do not work well for categorical variables, and the ones I tried, for example k-modes, ignore the associations and correlations and do not give me a clear picture.
Can you please suggest how to approach this, or where to start? I am new to data analytics and need some hints to move forward with the analysis. I would be grateful for some guidance, at least at a high level, on what can be done.
Thanks in advance!
r clustering categorical-data
asked Mar 28 '18 at 19:55
Sara
1 Answer
You can use a mixture model with variable selection. In this framework, variable selection is treated as a model selection problem, so both the relevant features and the number of clusters can be chosen according to an information criterion (such as BIC or ICL).
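For reference, one common textbook form of the BIC, for a model with maximized likelihood $\hat{L}$, $k$ free parameters, and $n$ observations, is shown below. This is the general definition, not something taken from a specific package; software sometimes reports a rescaled or sign-flipped version (so that higher is better), so check which convention your tool uses before comparing values.

$$\mathrm{BIC} = -2 \ln \hat{L} + k \ln n$$

Under this convention, the candidate model (here, a choice of cluster count and of relevant variables) with the lowest BIC is preferred.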
To perform this analysis, you can use the R package VarSelLCM. Because your variables are categorical, your dataset must be a data.frame and each column must be a factor. Here is an example script, in which your dataset is called "my.data".
    library(VarSelLCM)
    ## Clustering with all variables treated as discriminative
    # Number of clusters is between 1 and 6
    res.all <- VarSelCluster(my.data, 1:6, vbleSelec = FALSE)
    # Estimated partition (cluster label of each observation)
    res.all@partitions@zMAP
    # Explore the results in a Shiny application
    VarSelShiny(res.all)
    ## Clustering with variable selection
    # Number of clusters is between 1 and 6
    res.selec <- VarSelCluster(my.data, 1:6, vbleSelec = TRUE)
    # Estimated partition (cluster label of each observation)
    res.selec@partitions@zMAP
    # Explore the results in a Shiny application
    VarSelShiny(res.selec)
answered Mar 28 '18 at 20:49
user200668
Thanks a lot for your reply! I tried the code on my dataset. Although all my variables are factors in a data frame, I receive an error saying that at least one is not, plus some 50 additional warnings. I will try to figure it out and see what results this method gives. Thank you very much for your support!
– Sara, Mar 29 '18 at 19:17
Regarding the warnings: they appear when some factor levels are not taken by any observation. Regarding the error: maybe you can convert each column to a factor; see this R code: my.data <- as.data.frame(my.data); for (j in 1:ncol(my.data)) my.data[,j] <- as.factor(my.data[,j])
– user200668, Mar 29 '18 at 19:48
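The conversion suggested in the comment above can be sketched in base R as follows. This is only an illustration on a made-up toy data frame (the column names `q1`, `q2`, `q3` are hypothetical), and it assumes, as the comment suggests, that the warnings come from factor levels no observation uses; `droplevels()` removes those levels.

```r
# Hypothetical toy survey data; column names and values are illustrative only.
my.data <- data.frame(
  q1 = c("yes", "no", "yes", "no"),                               # character column
  q2 = c(1, 0, 1, 1),                                             # numeric 0/1 column
  q3 = factor(c("a", "b", "a", "b"), levels = c("a", "b", "c")),  # level "c" is unused
  stringsAsFactors = FALSE
)

# Convert every column to a factor while keeping the data.frame structure.
my.data[] <- lapply(my.data, as.factor)

# Drop factor levels that no observation takes (assumed cause of the warnings).
my.data <- droplevels(my.data)

# Sanity check: every column should now be a factor.
stopifnot(all(vapply(my.data, is.factor, logical(1))))
str(my.data)
```

Running `str(my.data)` on your own data after this step is a quick way to spot any column that is still not a factor before passing it to a clustering function.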