Fix first two levels of decision tree?
I am trying to build a regression tree with 70 attributes, where the business team wants to fix the first two levels of the tree, namely country and product type. To achieve this, I have two proposals:
1. Build a separate tree for each combination of country and product type, subset the data accordingly, and pass each subset to its respective tree for prediction (I saw this suggested in the comments here). I have 88 levels in country and 3 levels in product type, so this will generate 88 × 3 = 264 trees.
2. Build a basic tree on just the two variables country and product type, with a cp value chosen so that every combination ends up as a leaf node (264 leaves). Then build a second tree on all the remaining variables and stack the second tree beneath the first, as a single decision tree.
I don't think the first one is the right way to do it. I am also stuck on how to stack the trees in the second approach; even if it is not the right way, I would love to know how to achieve it.
Please guide me on how to approach the problem. Thanks.
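Approach 1 amounts to partitioning the data on the (country, product type) pair and keeping one model per partition; approach 2, once stacked, would route rows the same way. A minimal sketch in Python (standard library only), assuming each row is a dict with the hypothetical keys `country`, `product_type` and `y`. The per-segment "model" is only the segment mean, a placeholder for whatever regression tree you would actually fit on the remaining attributes (in R, for example, one `rpart()` call per subset). Combinations never seen in training fall back to a pooled model.

```python
from collections import defaultdict
from statistics import mean

def fit_mean_model(rows, target="y"):
    # Placeholder learner: always predicts the training mean of the target.
    # Replace with a real regression tree fitted on the other attributes.
    m = mean(r[target] for r in rows)
    return lambda row: m

def fit_segmented(rows, keys=("country", "product_type"), fit=fit_mean_model):
    """Approach 1: one model per observed (country, product_type) combination."""
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[k] for k in keys)].append(r)
    models = {seg: fit(seg_rows) for seg, seg_rows in groups.items()}
    fallback = fit(rows)  # pooled model for combinations unseen in training

    def predict(row):
        # Route the row to its segment's model, or to the pooled fallback.
        seg = tuple(row[k] for k in keys)
        return models.get(seg, fallback)(row)

    return models, predict

train = [
    {"country": "ZA", "product_type": "A", "y": 10.0},
    {"country": "ZA", "product_type": "A", "y": 12.0},
    {"country": "KR", "product_type": "B", "y": 3.0},
]
models, predict = fit_segmented(train)
print(len(models))                                      # 2 segments
print(predict({"country": "ZA", "product_type": "A"}))  # 11.0
print(predict({"country": "KR", "product_type": "C"}))  # pooled mean
```

With 88 countries and 3 product types this builds at most 264 models, one per combination that actually occurs in the data, so `len(models)` shows how many you really have to tune.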
machine-learning r predictive-modeling decision-trees
edited May 23 '17 at 12:38 by Community♦
asked Nov 1 '16 at 12:03 by Aravind
Why do you not like the first method?
– Hobbes
Nov 1 '16 at 14:58
@Hobbes It will be hard to monitor and tune the performance of each tree.
– Aravind
Nov 2 '16 at 0:46
What is the business problem? I had a similar case. We wanted the best set of prospects to target for each country/product group. The business felt that prospects in, say, South Africa for product A are very different from prospects in South Korea for product B. I could argue the merits of different marketing campaigns/messages/etc., but that is the business's decision. I did not look at it as fixing the first 2 levels of the tree or making any unnatural adjustments to an algorithm. I looked at it as how to find the best set of prospects for each country/product combination. Where I did not have enough d…
– Craig
Mar 3 '17 at 10:09
@Aravind If you are worried about tuning each tree in Approach 1, then I would caution you that you might not be on the right track. Your decision to, essentially, hard-code the first two levels should be based on some business rules. If your intent is to keep the algorithm fixed, then are you really writing an algorithm? Are you not introducing a form of bias into your overall model? I would only be comfortable proceeding if these choices were hard-coded and would rarely change. Otherwise, you need to push back on the business and make them aware of the potential bias.
– I_Play_With_Data
Oct 25 '18 at 18:02
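On the concern that many per-segment trees are hard to monitor: one low-effort option is to compute a holdout error per segment and sort by it, so that only the worst or smallest segments need manual attention. A minimal sketch, assuming you already have hypothetical `(segment, actual, predicted)` triples from a validation set; the `min_n` threshold for flagging thin segments is an arbitrary choice you would set yourself.

```python
import math
from collections import defaultdict

def per_segment_rmse(records, min_n=30):
    """records: iterable of (segment, actual, predicted).
    Returns [(segment, n, rmse, too_small)] sorted worst-first by RMSE."""
    sq_err = defaultdict(float)
    counts = defaultdict(int)
    for seg, actual, pred in records:
        sq_err[seg] += (actual - pred) ** 2
        counts[seg] += 1
    report = [
        (seg, counts[seg], math.sqrt(sq_err[seg] / counts[seg]), counts[seg] < min_n)
        for seg in counts
    ]
    # Worst segments first, so attention goes where the error is largest.
    return sorted(report, key=lambda t: t[2], reverse=True)

val = [
    (("ZA", "A"), 10.0, 11.0),
    (("ZA", "A"), 12.0, 11.0),
    (("KR", "B"), 3.0, 7.0),
]
for seg, n, rmse, small in per_segment_rmse(val, min_n=2):
    print(seg, n, round(rmse, 3), "small sample" if small else "")
```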
2 Answers
Depending on which tree algorithm you want to use, you could manually construct the first two levels of the tree. You can follow, for example, the pseudocode explained here for the C4.5 tree. Once you have done this, you can remove the two features from the data set and grow trees for the remaining part of the tree. If you want to produce an rpart object, you would need to reuse parts of the rpart source, which may be more demanding. Depending on the tree algorithm you use, you may have just a binary split at each of the two levels, so you would only need to build 4 separate trees rather than 264. Note that you may not end up with the optimal decision tree: after stepping through the first two levels, country and product type may still be variables that cause a split further down. But without seeing the data it is impossible to tell.
Side note: it may be valuable to explain to the business that country and product type are not the most sensible variables to have at the top of the decision tree. Sometimes it is better to educate the end users than to force machine learning to do something inaccurate. In my experience, end users prefer a correct solution over one that merely matches a gut feeling about how it should work.
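With binary splits, the fixed first two levels can be built by hand: choose the best two-way grouping of countries, then, within each side, the best two-way grouping of product types, giving up to four segments that each get their own tree on the remaining attributes. A minimal sketch in Python, assuming squared-error loss and rows as dicts with hypothetical keys. It relies on the CART result that, for a regression target, the optimal binary partition of a categorical variable is found by ordering its levels by mean response and scanning the ordered prefixes. The split at a fixed level is forced even if it barely reduces error, because the business rule requires it.

```python
from collections import defaultdict

def best_binary_partition(rows, feature, target="y"):
    """Best two-way grouping of a categorical feature under squared error.
    Orders levels by mean target and scans prefixes (exact for regression).
    Returns (sse_reduction, left_levels), or None if fewer than two levels."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        sums[r[feature]] += r[target]
        counts[r[feature]] += 1
    levels = sorted(sums, key=lambda lv: sums[lv] / counts[lv])
    if len(levels) < 2:
        return None
    total_s, total_n = sum(sums.values()), sum(counts.values())
    best, left_s, left_n = None, 0.0, 0
    for i, lv in enumerate(levels[:-1]):
        left_s += sums[lv]
        left_n += counts[lv]
        right_s, right_n = total_s - left_s, total_n - left_n
        # SSE(parent) - SSE(left) - SSE(right), expressed with sums only
        gain = left_s**2 / left_n + right_s**2 / right_n - total_s**2 / total_n
        if best is None or gain > best[0]:
            best = (gain, set(levels[: i + 1]))
    return best

def split_on(rows, feature, target="y"):
    """One fixed level: two row groups if the feature is splittable, else one."""
    found = best_binary_partition(rows, feature, target)
    if found is None:
        return [rows]
    left = found[1]
    return [[r for r in rows if r[feature] in left],
            [r for r in rows if r[feature] not in left]]

def fixed_top_two_levels(rows, first="country", second="product_type"):
    """Level 1 splits on `first`, level 2 splits on `second` within each side."""
    segments = []
    for side in split_on(rows, first):
        segments.extend(split_on(side, second))
    return segments  # up to 4 groups; fit one tree per group on the other attributes

rows = [
    {"country": "A", "product_type": "p", "y": 1.0},
    {"country": "A", "product_type": "q", "y": 1.0},
    {"country": "B", "product_type": "p", "y": 2.0},
    {"country": "C", "product_type": "p", "y": 10.0},
    {"country": "C", "product_type": "q", "y": 10.0},
]
print(best_binary_partition(rows, "country"))
print(len(fixed_top_two_levels(rows)))
```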
answered Nov 2 '16 at 11:47 by Stereo
I have 88 levels in country and 3 levels in product type, so it will be 264 trees. If there were only 4 separate trees, I would take the easy option, namely the first choice. I feel it will be easier to convince the end users when I have results for both what they want and the correct way of solving the problem. Can you help me find reference material for stacking two trees after they have been built completely?
– Aravind
Nov 3 '16 at 1:05
What you could do is calculate the entropy or Gini impurity for country, product type, and the first variable that gets selected by CHAID and C4.5, and educate users on these metrics. If that fails, you can always go back. Additionally, when you run a binary decision tree, the first splits will lump countries and/or products together, so at a minimum you get 4 subtrees.
– Stereo
Nov 3 '16 at 10:12
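Entropy and Gini impurity are classification criteria; for a regression tree the analogous comparison is how much squared error each candidate root variable removes. A minimal sketch of that regression analogue (a swapped-in criterion, not the one named above), scoring a multiway split on every level of each feature, as C4.5 and CHAID do for categoricals. The `channel` column is a hypothetical stand-in for whichever variable the unconstrained tree picks first. Raw reduction favours features with many levels (country has 88), which is why C4.5 normalises by split information (gain ratio), so treat the ranking as indicative only.

```python
from collections import defaultdict

def sse(values):
    # Sum of squared deviations from the mean.
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def multiway_sse_reduction(rows, feature, target="y"):
    """Drop in squared error from splitting on every level of `feature` at once."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[feature]].append(r[target])
    parent = sse([r[target] for r in rows])
    return parent - sum(sse(g) for g in groups.values())

def rank_root_candidates(rows, features, target="y"):
    scores = {f: multiway_sse_reduction(rows, f, target) for f in features}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

rows = [
    {"country": "A", "product_type": "p", "channel": "web", "y": 1.0},
    {"country": "B", "product_type": "p", "channel": "web", "y": 1.2},
    {"country": "A", "product_type": "q", "channel": "web", "y": 5.0},
    {"country": "B", "product_type": "q", "channel": "store", "y": 5.2},
]
ranking = rank_root_candidates(rows, ["country", "product_type", "channel"])
for feature, score in ranking:
    print(feature, round(score, 3))
```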
I think you could do this fairly automatically if you're open to using Python. A library called auto_ml* has a feature called categorical ensembling, where you can explicitly say "I want a model built for each level of this feature". If you made a feature that combined country and product type and used that as your category, the rest should be pretty easy.
*Disclosure: I've made minor contributions to auto_ml. It is FOSS under the MIT license.
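The key preparation step here is collapsing the two fixed variables into one categorical column. A minimal sketch of that step only, in plain Python with hypothetical column names; it does not reproduce auto_ml's own API, for which the library's documentation is the reference. Any separator works as long as it cannot occur inside the level names.

```python
from collections import Counter

def add_segment_key(rows, keys=("country", "product_type"), name="country_product"):
    """Adds one categorical column combining the fixed variables, e.g. 'ZA|A'.
    A per-level ensembling tool (or a plain per-group loop) can then key on it."""
    for r in rows:
        r[name] = "|".join(str(r[k]) for k in keys)
    return rows

rows = add_segment_key([
    {"country": "ZA", "product_type": "A", "y": 10.0},
    {"country": "ZA", "product_type": "B", "y": 4.0},
    {"country": "KR", "product_type": "A", "y": 3.0},
])
print(Counter(r["country_product"] for r in rows))
```

With 88 countries and 3 product types the new column has at most 264 distinct values; combinations that never occur in the data simply do not appear.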
answered Jun 1 '17 at 12:26 by CalZ