customer segmentation with categorical variables


I was advised to post my question about modeling a categorical dataset in this group.
I have a customer dataset from a survey: 1,595 observations and about 200 columns (200 because most questions were multiple choice, and we had to split each answer into its own column). Most variables are categorical or binary; there are no continuous variables at all. My task is customer segmentation through clustering. There are no initial assumptions, although since I also have the questionnaire, I can logically separate out the important questions.

I face several issues with the modeling:

1. I need to validate my choice of variables.
2. I am trying to find associations, pairwise associations and trends, since I have no initial assumptions about what my segments could be.
3. Clustering models do not work well for categorical variables, and the ones I tried (for example, k-modes) ignore the associations and correlations and do not return a clear picture.

Can you please suggest how to approach this, or where to start? I am new to data analytics and need some hints to continue the analysis. I would be grateful for any guidance, even at a high level, on what can be done.

Thanks in advance!

asked Mar 28 '18 at 19:55

Sara


r clustering categorical-data


1 Answer

You can use a mixture model with variable selection. In this framework, variable selection is treated as a model-selection problem, so both the relevant features and the number of clusters can be chosen with an information criterion such as BIC or ICL.
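To make the trade-off concrete, here is a rough sketch of how BIC penalises model size for a latent class model on categorical data. It assumes the usual parameterisation: K - 1 free mixing proportions and, for a variable with m levels, m - 1 free probabilities per cluster if the variable is discriminative, or m - 1 probabilities shared by all clusters if it is not. The helper names `n_params_lcm` and `bic` are hypothetical, and BIC is written in the "larger is better" form log L - (nu/2) log n; the form -2 log L + nu log n selects the same model. The exact criteria VarSelLCM computes may differ in detail, and the level counts below are made-up toy values.

```r
# Hypothetical helper: number of free parameters in a latent class model.
# K: number of clusters; n_levels: number of levels of each variable;
# relevant: TRUE if the variable's distribution differs across clusters.
n_params_lcm <- function(K, n_levels, relevant) {
  stopifnot(length(n_levels) == length(relevant))
  proportions <- K - 1                              # mixing proportions
  per_cluster <- K * sum(n_levels[relevant] - 1)    # estimated once per cluster
  shared <- sum(n_levels[!relevant] - 1)            # estimated once overall
  proportions + per_cluster + shared
}

# Hypothetical helper: BIC in the "larger is better" convention.
bic <- function(loglik, n_params, n_obs) {
  loglik - n_params / 2 * log(n_obs)
}

# Three questions with 2, 3 and 4 answer levels, K = 3 clusters
n_params_lcm(3, c(2, 3, 4), c(TRUE, TRUE, TRUE))   # 2 + 3 * 6 = 20
n_params_lcm(3, c(2, 3, 4), c(TRUE, TRUE, FALSE))  # 2 + 3 * 3 + 3 = 14
```

With n = 1595 observations as in the question, declaring the third variable non-discriminative lowers the penalty by (20 - 14) / 2 * log(1595), roughly 22.1, so the simpler model is preferred unless its log-likelihood is lower by more than that. This is how variable selection and the choice of K can both be settled by the same criterion.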

To perform this analysis, you can use the R package VarSelLCM. Because your variables are categorical, your dataset must be a data.frame in which each column is a factor. Here is an example script, where your dataset is called "my.data".

library(VarSelLCM)

## Clustering with all variables treated as discriminative
# Number of clusters is between 1 and 6
res.all <- VarSelCluster(my.data, 1:6, vbleSelec = FALSE)

# Partition: MAP cluster label of each observation
res.all@partitions@zMAP

# Shiny application for exploring the results
VarSelShiny(res.all)


## Clustering with variable selection
# Number of clusters is between 1 and 6
res.selec <- VarSelCluster(my.data, 1:6, vbleSelec = TRUE)

# Partition: MAP cluster label of each observation
res.selec@partitions@zMAP

# Shiny application for exploring the results
VarSelShiny(res.selec)

answered Mar 28 '18 at 20:49

user200668

Thanks a lot for your reply! I tried the code on my dataset. Although all my variables are factors in the data frame, I receive an error saying that at least one is not, plus about 50 additional warnings. I will try to figure it out and see what results this method gives. Thank you very much for the support!
– Sara
Mar 29 '18 at 19:17

The warnings appear when some factor levels are not taken by any observation. For the error, you could try converting each column to a factor with this R code: my.data <- as.data.frame(my.data); for (j in 1:ncol(my.data)) my.data[,j] <- as.factor(my.data[,j])
– user200668
Mar 29 '18 at 19:48
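A minimal base-R sketch of the preparation discussed in the comments, assuming the survey answers are in a data frame called my.data (replaced here by a small made-up toy data frame). It converts every column to a factor, as in the comment above, and additionally drops levels that no observation takes, since those are what trigger the warnings. `prepare_factors` is a hypothetical helper, not part of VarSelLCM.

```r
# Hypothetical helper: make every column a factor and drop unused levels.
prepare_factors <- function(df) {
  df <- as.data.frame(df)
  for (j in seq_len(ncol(df))) {
    # as.factor() leaves existing factors unchanged and converts
    # character, integer or logical columns to factors
    df[[j]] <- as.factor(df[[j]])
  }
  # droplevels() removes levels that no observation takes
  droplevels(df)
}

# Toy stand-in for the survey data (made up for illustration)
my.data <- data.frame(
  q1 = c("yes", "no", "yes", "no"),
  q2 = c(1L, 2L, 2L, 3L),
  q3 = factor(c("a", "b", "a", "b"), levels = c("a", "b", "c")),
  stringsAsFactors = FALSE
)

my.data <- prepare_factors(my.data)
str(my.data)

# Sanity check before calling VarSelCluster(): every column is a factor
all(vapply(my.data, is.factor, logical(1)))
```

If the final check returns FALSE on your own data, `names(Filter(Negate(is.factor), my.data))` lists the columns that are still not factors.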
