fix first two levels of decision tree?













3












$begingroup$


I am trying to build a regression tree with 70 attributes, where the business team wants to fix the first two levels, namely country and product type. To achieve this, I have two proposals:

1. Build a separate tree for each combination of country and product type, train each tree on the matching subset of the data, and route each new observation to its tree for prediction (I saw this suggested in the comments here). I have 88 levels in country and 3 levels in product type, so this would generate 88 × 3 = 264 trees.

2. Build a basic tree with only two variables, country and product type, with an appropriate cp value so that every combination becomes a leaf node (264 leaves). Build a second tree with all the remaining variables, and stack tree one on top of tree two to form a single decision tree.

I don't think the first one is the right way to do it. I am also stuck on how to stack the trees in the second approach; even if it is not the right way, I would love to know how to achieve this.

Please guide me on how to approach the problem. Thanks.





















$endgroup$














  • 4




    $begingroup$
    Why do you not like the first method?
    $endgroup$
    – Hobbes
    Nov 1 '16 at 14:58










  • $begingroup$
    @Hobbes It will be hard to monitor and tune the performance of each tree.
    $endgroup$
    – Aravind
    Nov 2 '16 at 0:46






  • 1




    $begingroup$
    What is the business problem? I had a similar case. We wanted the best set of prospects to target for each country/product group. The business felt that prospects in, say, South Africa for product A are very different from prospects in South Korea for product B. I could argue the merits of different marketing campaigns, messages, etc., but that is the business's decision. I did not look at it as fixing the first 2 levels of the tree or as any unnatural adjustment to an algorithm. I looked at it as how to find the best set of prospects for each country/product combination. Where I did not have enough d
    $endgroup$
    – Craig
    Mar 3 '17 at 10:09










  • $begingroup$
    @Aravind If you are worried about tuning each tree in Approach 1, then I would caution you that you might not be on the right track. Your decision to, essentially, hard-code the first two levels should be based on some business rules. If your intent is to keep the algorithm fixed, then are you really writing an algorithm? Are you not introducing a form of bias into your overall model? I would only be comfortable proceeding if these choices were hard-coded and would rarely change. Otherwise you need to push back on the business and make them aware of the potential bias.
    $endgroup$
    – I_Play_With_Data
    Oct 25 '18 at 18:02
















machine-learning r predictive-modeling decision-trees














edited May 23 '17 at 12:38 by Community

asked Nov 1 '16 at 12:03 by Aravind
















2 Answers


















0












$begingroup$

Depending on which tree algorithm you want to use, you could manually construct the first two levels of the tree. You can follow the pseudocode explained, for example, here for the C4.5 tree. Once you have done this, you can remove the two features from the data set and create trees for the remaining part of the tree. If you want to create an rpart object, you would need to take some parts of the source, and this may be a bit more demanding. Depending on which tree algorithm you use, you will just have a binary split at both levels, so you will only need to build 4 separate trees and not 264. Note that you may not end up with the optimal decision tree, since after stepping through the first two levels, country and product type may still be variables that cause a split. But without seeing the data it is impossible to tell.



Side note: it may be valuable to explain to the business that country and product type are not the most sensible variables to have at the top of the decision tree. Sometimes it is better to educate the end users than to force machine learning to do something inaccurate. In my experience, end users prefer a correct solution over one that is built a certain way only because people have a gut feeling that it should be.
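As a rough illustration of "construct the first two levels by hand, then grow trees on the remaining features", here is a minimal pure-Python sketch. It is not rpart or C4.5: the forced levels use one branch per category value (multiway splits rather than the binary splits mentioned above), the learned part is a tiny SSE-based regression tree on numeric features, and all names (`fit_forced_tree`, `fit_subtree`, `predict`, the `price` column, the toy data) are made up for the example.

```python
from collections import defaultdict
from statistics import mean


def sse(ys):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def partition(rows, ys, f, t):
    """Split (row, target) pairs on numeric feature f at threshold t."""
    lo = [(r, y) for r, y in zip(rows, ys) if r[f] <= t]
    hi = [(r, y) for r, y in zip(rows, ys) if r[f] > t]
    return lo, hi


def fit_subtree(rows, ys, features, min_leaf=2, depth=0, max_depth=3):
    """Tiny regression tree: pick the split with the lowest total SSE."""
    best = None
    if depth < max_depth and len(rows) >= 2 * min_leaf:
        for f in features:
            for t in sorted({r[f] for r in rows})[:-1]:
                lo, hi = partition(rows, ys, f, t)
                if len(lo) < min_leaf or len(hi) < min_leaf:
                    continue
                score = sse([y for _, y in lo]) + sse([y for _, y in hi])
                if best is None or score < best[0]:
                    best = (score, f, t)
    if best is None or best[0] >= sse(ys):
        return {"leaf": mean(ys)}  # no useful split: predict the mean
    _, f, t = best
    lo, hi = partition(rows, ys, f, t)
    return {
        "feature": f,
        "threshold": t,
        "left": fit_subtree([r for r, _ in lo], [y for _, y in lo],
                            features, min_leaf, depth + 1, max_depth),
        "right": fit_subtree([r for r, _ in hi], [y for _, y in hi],
                             features, min_leaf, depth + 1, max_depth),
    }


def fit_forced_tree(rows, ys, forced, features, **kw):
    """Split on the `forced` columns in order, then learn a subtree per branch."""
    if not forced:
        return fit_subtree(rows, ys, features, **kw)
    col, rest = forced[0], forced[1:]
    groups = defaultdict(lambda: ([], []))
    for r, y in zip(rows, ys):
        groups[r[col]][0].append(r)
        groups[r[col]][1].append(y)
    return {
        "forced": col,
        "default": mean(ys),  # fallback for category values never seen in training
        "children": {v: fit_forced_tree(rs, gy, rest, features, **kw)
                     for v, (rs, gy) in groups.items()},
    }


def predict(node, row):
    """Walk forced levels first, then the learned numeric splits."""
    while "leaf" not in node:
        if "forced" in node:
            child = node["children"].get(row[node["forced"]])
            if child is None:
                return node["default"]
            node = child
        else:
            go_left = row[node["feature"]] <= node["threshold"]
            node = node["left"] if go_left else node["right"]
    return node["leaf"]


# Toy data (hypothetical): two forced columns plus one numeric feature.
rows = [
    {"country": "IN", "product": "A", "price": 1.0},
    {"country": "IN", "product": "A", "price": 2.0},
    {"country": "IN", "product": "A", "price": 8.0},
    {"country": "IN", "product": "A", "price": 9.0},
    {"country": "US", "product": "B", "price": 1.0},
    {"country": "US", "product": "B", "price": 2.0},
]
ys = [10.0, 10.0, 20.0, 20.0, 50.0, 50.0]
tree = fit_forced_tree(rows, ys, ["country", "product"], ["price"])
```

The result is a single nested structure whose top two levels are always country and product type, which is closer to the "stacked" tree in the question than a loose collection of models. With 88 countries and 3 product types, many branches may hold very little data, so the `min_leaf` guard and the `default` fallback matter in practice.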















$endgroup$












  • $begingroup$
    I have 88 levels in country and 3 levels in product type, so it will be 264 trees. If there were only 4 separate trees, I would take the easy option, namely the first choice. I feel it will be easier to convince the end users when I have results for both what they want and the correct way of solving the problem. Can you help me find reference material for stacking two trees after they are completely built?
    $endgroup$
    – Aravind
    Nov 3 '16 at 1:05











  • $begingroup$
    What you could do is calculate the entropy or Gini for country, product type, and the first feature that gets selected by CHAID and C4.5. Educate users on these metrics. If that fails, you can always go back. Additionally, when you run a binary decision tree, the first splits will lump countries and/or products together, so you get at minimum 4 subtrees.
    $endgroup$
    – Stereo
    Nov 3 '16 at 10:12
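To make the metric comparison in the comment above concrete, here is a small sketch that computes the size-weighted Gini impurity left after splitting on each candidate feature; lower means a purer split. It assumes a categorical target and a multiway split per feature, and the column names (`segment`, `label`) and data are invented for illustration. For a regression target, as in the question, the variance of the target within each group would play the same role.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini_after_split(rows, feature, target):
    """Size-weighted Gini of the groups produced by splitting on `feature`."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[feature]].append(r[target])
    n = len(rows)
    return sum(len(g) / n * gini(g) for g in groups.values())


# Hypothetical toy data: `segment` separates the labels perfectly,
# `country` does not separate them at all.
rows = [
    {"country": "IN", "product": "A", "segment": "x", "label": "high"},
    {"country": "US", "product": "A", "segment": "x", "label": "high"},
    {"country": "IN", "product": "A", "segment": "y", "label": "low"},
    {"country": "US", "product": "B", "segment": "y", "label": "low"},
]

baseline = gini([r["label"] for r in rows])
scores = {f: weighted_gini_after_split(rows, f, "label")
          for f in ("country", "product", "segment")}
best_feature = min(scores, key=scores.get)
```

Showing the business a table like `scores` next to the `baseline` is one way to explain why an algorithm would or would not pick country or product type for the first split.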


















0












$begingroup$

I think you could do this fairly automatically if you're open to using Python. A library called auto_ml* has a feature called categorical ensembling, where you can explicitly say "I want a model built for each level of this feature". If you made a feature that was country-product type and used that as your category, the rest should be pretty easy.



*Disclosure: I've made minor contributions to auto_ml. It is FOSS under the MIT license.
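The idea of "a model built for each level of this feature" can be sketched without any particular library. The code below is not auto_ml's API; it is a plain-Python illustration in which `PerCategoryModel`, `MeanModel`, and the toy data are all hypothetical. The combined (country, product type) key selects which sub-model is trained and used, and a model trained on all rows serves as a fallback for unseen combinations.

```python
from collections import defaultdict


class MeanModel:
    """Stand-in learner that predicts the training mean.

    Any object with fit(X, y) returning self and predict(X) could replace it.
    """

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class PerCategoryModel:
    """Train one sub-model per distinct value of the combined key columns."""

    def __init__(self, key_columns, make_model):
        self.key_columns = key_columns
        self.make_model = make_model
        self.models = {}
        self.fallback = None

    def _key(self, row):
        return tuple(row[c] for c in self.key_columns)

    def fit(self, rows, y):
        grouped = defaultdict(lambda: ([], []))
        for row, target in zip(rows, y):
            X_g, y_g = grouped[self._key(row)]
            X_g.append(row)
            y_g.append(target)
        self.models = {k: self.make_model().fit(X_g, y_g)
                       for k, (X_g, y_g) in grouped.items()}
        # Model on all rows, used for combinations never seen in training.
        self.fallback = self.make_model().fit(rows, y)
        return self

    def predict(self, rows):
        return [self.models.get(self._key(r), self.fallback).predict([r])[0]
                for r in rows]


# Hypothetical toy data.
rows = [
    {"country": "IN", "product": "A"},
    {"country": "IN", "product": "A"},
    {"country": "US", "product": "B"},
]
y = [10.0, 20.0, 50.0]

model = PerCategoryModel(["country", "product"], MeanModel).fit(rows, y)
preds = model.predict([
    {"country": "IN", "product": "A"},
    {"country": "US", "product": "B"},
    {"country": "FR", "product": "A"},  # unseen combination -> fallback
])
```

Keeping all sub-models behind one `fit`/`predict` interface also addresses part of the monitoring concern raised in the question comments, since per-group metrics can be collected in a single loop over `model.models`.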















$endgroup$













    First answer: answered Nov 2 '16 at 11:47 by Stereo











    • $begingroup$
      I have 88 levels in country and 3 levels in product type so it will be 264 trees.if there are only 4 separate trees then i will take the easy option namely the first choice.I feel it will be easier to convince end user when i have the results for both what they want and the correct way of solving the problem. Can you help me find reference material for stacking two trees after built completely?
      $endgroup$
      – Aravind
      Nov 3 '16 at 1:05











    • $begingroup$
      What you could do is calculate the entropy or gini for country, product type and the first element that gets selected by CHAID and C4.5. Educate users on these metrics. If that fails you can always go back. Additionally when you run a binary decision tree the first splits will lump countries or/and products together so at a minimum 4 subtrees.
      $endgroup$
      – Stereo
      Nov 3 '16 at 10:12



























    0












    $begingroup$

    I think you could do this fairly automatically if you're open to using Python. A library called auto_ml* has a feature called categorical ensembling, where you can explicitly say "I want a model built for each level of this feature". If you create a single feature that combines country and product type and use that as your category, the rest should be pretty easy.



    *Disclosure: I've made minor contributions to auto_ml. It is FOSS under the MIT license.
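
To make the idea concrete without relying on auto_ml's own interface (which is not shown here), a hand-rolled equivalent with pandas and scikit-learn might look like the sketch below. It fits one `DecisionTreeClassifier` per country and product type combination. The helper `fit_tree_per_group`, the column names, and the toy data are all hypothetical.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def fit_tree_per_group(df, group_cols, feature_cols, target, **tree_kwargs):
    """Fit one decision tree per combination of the grouping columns.

    Returns a dict mapping each group key (a tuple) to its fitted tree.
    """
    models = {}
    for key, part in df.groupby(group_cols):
        tree = DecisionTreeClassifier(**tree_kwargs)
        tree.fit(part[feature_cols], part[target])
        models[key] = tree
    return models


# Hypothetical toy data: 2 countries x 2 product types, two numeric features.
df = pd.DataFrame({
    "country":      ["A", "A", "A", "A", "B", "B", "B", "B"],
    "product_type": ["x", "x", "y", "y", "x", "x", "y", "y"],
    "x1":           [0, 1, 0, 1, 0, 1, 0, 1],
    "x2":           [5, 3, 8, 1, 2, 9, 4, 7],
    "label":        [0, 1, 1, 0, 0, 1, 1, 0],
})

models = fit_tree_per_group(
    df,
    group_cols=["country", "product_type"],
    feature_cols=["x1", "x2"],
    target="label",
    max_depth=3,
    random_state=0,
)

# Predict with the tree that belongs to one specific combination.
new_rows = pd.DataFrame({"x1": [1], "x2": [3]})
print(models[("A", "x")].predict(new_rows))
```

With 88 countries and 3 product types this yields up to 264 small trees, so it is worth checking that every combination has enough rows to train on before trusting the individual models.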















    $endgroup$



























        answered Jun 1 '17 at 12:26









        CalZ

































