Do Random Forests overfit?
I have been reading about Random Forests, but I cannot really find a definitive answer about the problem of overfitting. According to Breiman's original paper, they should not overfit when increasing the number of trees in the forest, but there seems to be no consensus about this, which is causing me quite a bit of confusion.

Maybe someone with more expertise than me can give me a more concrete answer or point me in the right direction to better understand the problem.
machine-learning random-forest
asked Aug 23 '14 at 16:54 by markusian
All algorithms will overfit to some degree. It's not about picking something that doesn't overfit; it's about carefully considering the amount of overfitting and the form of the problem you're solving to maximize more relevant metrics. – indico, Aug 23 '14 at 18:16

ISTR that Breiman had a proof based on the Law of Large Numbers. Has someone discovered a flaw in that proof? – JenSCDC, Aug 28 '14 at 1:18

@AndyBlankertz ISTR = internetslang.com/ISTR-meaning-definition.asp ? – Hack-R, Nov 3 '15 at 3:15
4 Answers
Every ML algorithm with high complexity can overfit. However, the OP is asking whether an RF will not overfit when increasing the number of trees in the forest.

In general, ensemble methods reduce the prediction variance to almost nothing, improving the accuracy of the ensemble. If we define the variance of the expected generalization error of an individual randomized model as $\sigma^2(x)$, then the variance of the expected generalization error of an ensemble of $M$ such models corresponds to

$$\rho(x)\,\sigma^2(x) + \frac{1-\rho(x)}{M}\,\sigma^2(x),$$

where $\rho(x)$ is the Pearson correlation coefficient between the predictions of two randomized models trained on the same data with two independent seeds. If we increase the number of decision trees in the RF (larger $M$), the variance of the ensemble decreases as long as $\rho(x) < 1$. Therefore, the variance of the ensemble is strictly smaller than the variance of an individual model.

In a nutshell, increasing the number of individual randomized models in an ensemble will never increase the generalization error.

answered Oct 20 '14 at 9:31 by tashuhka (edited Nov 17 '15 at 16:19 by DaL)
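A minimal numerical check of the variance formula above (a sketch only: it assumes equicorrelated Gaussian predictions as a stand-in for the predictions of $M$ randomized trees at a fixed point $x$, and uses only NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
M, rho, sigma, n_draws = 100, 0.3, 1.0, 50_000

# Build M predictions sharing a common component so that corr(pred_i, pred_j) = rho
# and each individual prediction has variance sigma^2.
common = rng.normal(0.0, sigma, size=(n_draws, 1))
noise = rng.normal(0.0, sigma, size=(n_draws, M))
preds = np.sqrt(rho) * common + np.sqrt(1.0 - rho) * noise

ensemble = preds.mean(axis=1)  # averaging the M predictions = the ensemble prediction
print("empirical ensemble variance:", ensemble.var())
print("rho*s2 + (1 - rho)*s2 / M  :", rho * sigma**2 + (1 - rho) * sigma**2 / M)
```

Both numbers agree closely, and for $\rho(x) < 1$ the second term shrinks as $M$ grows, which is exactly the variance reduction the answer describes.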
That's definitely what Leo Breiman and the theory say, but empirically it seems like they definitely do overfit. For example, I currently have a model with a 10-fold CV MSE of 0.02, but when measured against the ground truth the MSE is 0.4. OTOH, if I reduce tree depth and tree number, the model performance improves significantly. – Hack-R, Feb 18 '16 at 14:41

Reducing the tree depth is a different case, because you are adding regularisation, which will decrease the overfitting. Try to plot the MSE as you increase the number of trees while keeping the rest of the parameters unchanged: MSE on the y-axis and the number of trees on the x-axis. You will see that when adding more trees the error decreases fast and then reaches a plateau, but it never increases. – tashuhka, Feb 19 '16 at 13:43
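A minimal sketch of the plot suggested in the comment above, assuming scikit-learn and a synthetic regression problem (the data and parameter values are illustrative only):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(1000, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=1000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Grow forests of increasing size with all other parameters fixed.
for n_trees in [1, 5, 10, 50, 100, 300]:
    rf = RandomForestRegressor(n_estimators=n_trees, random_state=0).fit(X_tr, y_tr)
    mse = mean_squared_error(y_te, rf.predict(X_te))
    print(f"{n_trees:4d} trees -> test MSE {mse:.4f}")  # drops quickly, then plateaus
```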
You may want to check Cross Validated, a Stack Exchange site covering many things, including machine learning.

In particular, this question (with exactly the same title) has already been answered multiple times. Check these links: https://stats.stackexchange.com/search?q=random+forest+overfit

But I can give you the short answer: yes, it does overfit, and sometimes you need to control the complexity of the trees in your forest, or even prune them when they grow too much; this depends on the library you use for building the forest. E.g. in randomForest in R you can only control the complexity.

answered Aug 24 '14 at 8:22 by Alexey Grigorev (edited Apr 13 '17 at 12:44)
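For concreteness, a minimal sketch of what "controlling the complexity" of the trees can look like, using scikit-learn rather than the R randomForest package the answer refers to (the parameter values are illustrative only):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

rf = RandomForestRegressor(
    n_estimators=200,      # more trees lowers variance but does not add tree complexity
    max_depth=6,           # cap the depth of each tree
    min_samples_leaf=5,    # require several samples in every leaf
    max_leaf_nodes=64,     # hard cap on leaves, roughly the analogue of maxnodes in R
    random_state=0,
).fit(X, y)

print(rf.score(X, y))  # training R^2; a held-out score is what you would actually tune on
```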
STRUCTURED DATASET -> MISLEADING OOB ERRORS

I've found an interesting case of RF overfitting in my work practice: when the data are structured, RF overfits on the OOB observations.

Detail:

I try to predict electricity prices on the electricity spot market for each single hour (each row of the dataset contains the price and system parameters (load, capacities, etc.) for that single hour).

Electricity prices are created in batches (24 prices created on the electricity market in one fixing, at one moment in time).

So the OOB observations for each tree are random subsets of the set of hours, but if you predict the next 24 hours you do it all at once (first you obtain all system parameters, then you predict 24 prices, then there is a fixing which produces those prices). This makes OOB predictions easier than predictions for the whole next day: OOB observations are not contained in 24-hour blocks but dispersed uniformly, and since the prediction errors are autocorrelated, it is easier to predict the price for a single missing hour than for a whole block of missing hours.

Easier to predict in case of error autocorrelation:

known, known, prediction, known, prediction - OOB case

Harder one:

known, known, known, prediction, prediction - real-world prediction case

I hope this is interesting.

answered Jul 22 '16 at 8:15 by Qbik
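A minimal sketch of the effect described above, using scikit-learn and a toy autocorrelated series with the time index as the only feature (an assumption, standing in for the hourly electricity-price setting):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 2000
y = np.zeros(n)
for t in range(1, n):                  # AR(1) process -> strongly autocorrelated target
    y[t] = 0.95 * y[t - 1] + rng.normal()
X = np.arange(n).reshape(-1, 1)        # the only feature is the time index

def rf_mse(train_idx, test_idx):
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    return mean_squared_error(y[test_idx], model.predict(X[test_idx]))

# "OOB-like" evaluation: held-out points scattered among known neighbouring hours.
idx = rng.permutation(n)
print("scattered hold-out MSE:", rf_mse(idx[200:], idx[:200]))

# "Real-world" evaluation: a whole contiguous block of future hours is held out.
print("block hold-out MSE    :", rf_mse(np.arange(n - 200), np.arange(n - 200, n)))
```

The scattered split typically looks much easier than the block split, which is why the OOB error can be misleading in this setting.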
- The Random Forest does overfit.
- The Random Forest does not increase its generalization error when more trees are added to the model. The generalization variance goes to zero as more trees are used.

I've made a very simple experiment. I generated the synthetic data:

y = 10 * x + noise

I trained two Random Forest models:

- one with full trees
- one with pruned trees

The model with full trees has lower train error but higher test error than the model with pruned trees. Comparing the responses of the two models gives clear evidence of overfitting. Then I took the hyper-parameters of the overfitted model and checked the error while adding one tree at each step: the error does not change when adding more trees, but the model is still overfitted. Here is the link for the experiment I've made.

answered 15 hours ago by pplonski
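The author's notebook is not reproduced here, but a minimal sketch of the described experiment, assuming scikit-learn and approximating the "pruned trees" with a depth limit (the original settings are not shown here), looks like this:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: y = 10 * x + noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=1000)
y = 10 * x + rng.normal(0, 5, size=1000)
X = x.reshape(-1, 1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for name, params in [("full trees   ", {}), ("shallow trees", {"max_depth": 5})]:
    rf = RandomForestRegressor(n_estimators=50, random_state=0, **params).fit(X_tr, y_tr)
    print(name,
          "train MSE:", round(mean_squared_error(y_tr, rf.predict(X_tr)), 2),
          "test MSE:", round(mean_squared_error(y_te, rf.predict(X_te)), 2))
```

With this setup the fully grown trees typically show lower train error but higher test error, matching the pattern described in the answer.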