Misclassification Rate for Random Forest Plateauing too Early


Using R, I created five random forest models with different numbers of trees (3, 10, 30, 100, 300). I wanted to compute the misclassification rate of each model and plot the rates against the number of trees, to illustrate that the misclassification rate of a random forest generally decreases as the number of trees increases.

A few colleagues ran the same model in Python, and in every case the 300-tree model reached a misclassification rate of about 0.08. When I run my models in R, however, the misclassification rate levels out at about 0.2 by the 100-tree model and does not drop any further with 300 trees. What might be causing this discrepancy? My code is below.

library(randomForest)

# Combine features and labels; keep the label as a named factor column so the
# formula can refer to it directly
madelon_train <- data.frame(madelon_train_data,
                            label = as.factor(madelon_train_labels$V1))

ntrees <- c(3, 10, 30, 100, 300)

# Fit one forest per number of trees
modellist <- lapply(ntrees, function(n)
  randomForest(label ~ ., data = madelon_train, ntree = n,
               mtry = sqrt(500), replace = FALSE))

# Use models to predict training data and compute misclassification error.
# predict() without newdata returns the out-of-bag predictions; with very few
# trees some observations are never out of bag (NA), hence na.rm = TRUE.
classerrtrain <- sapply(modellist, function(m)
  mean(predict(m) != madelon_train$label, na.rm = TRUE))

# Use models to predict test data and compute misclassification error
classerrtest <- sapply(modellist, function(m)
  mean(as.character(predict(m, newdata = madelon_valid_data)) !=
       as.character(madelon_valid_labels$V1)))

# Plot misclassification errors vs number of trees
plot(ntrees, classerrtrain, type = 'l', xlab = 'Number of Trees',
     ylab = 'Misclassification Rate', xlim = c(1, 300), ylim = c(0, 0.5),
     col = "red")
lines(ntrees, classerrtest, type = 'l', col = "blue")
legend(1, 0.1, legend = c("Train Data", "Test Data"), col = c("red", "blue"),
       lty = 1, cex = 0.8)
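The procedure above fits a separate forest for each number of trees. A possibly simpler route, sketched here under the assumption that the randomForest package is installed: a single fit with ntree = 300 already stores, in its err.rate matrix, the out-of-bag error of the forest built from the first i trees for every i, so the whole curve comes from one model (plot(rf) draws it). The data below are synthetic and hypothetical, standing in for the Madelon features and labels, and this curve is an OOB error, not a test-set error.

```r
library(randomForest)

set.seed(42)
# Hypothetical synthetic data standing in for the Madelon features and labels
n <- 500
p <- 20
x <- as.data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))
y <- factor(ifelse(x$V1 + x$V2 + rnorm(n) > 0, "pos", "neg"))

# A single forest with 300 trees
rf <- randomForest(x, y, ntree = 300)

# Row i of err.rate holds the error rates of the forest made of the first i
# trees; the "OOB" column is the overall out-of-bag misclassification rate
oob_curve <- rf$err.rate[, "OOB"]
print(oob_curve[c(3, 10, 30, 100, 300)])
```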

asked Sep 10 '18 at 22:19

user58887

r random-forest decision-trees

asked Sep 10 '18 at 22:19

user58887

asked Sep 10 '18 at 22:19

user58887

asked Sep 10 '18 at 22:19

user58887

asked Sep 10 '18 at 22:19

user58887

asked Sep 10 '18 at 22:19

user58887

bumped to the homepage by Community♦ 3 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

bumped to the homepage by Community♦ 3 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

add a comment |

2 Answers

One important parameter for Random Forest training is the number of features sampled as split candidates at each split (mtry in R's randomForest), which is generally chosen as a function of the total number of features p: the randomForest defaults are floor(sqrt(p)) for classification and max(floor(p/3), 1) for regression.

See the question "How many features to sample using Random Forests" for further details.

You chose mtry = sqrt(500); you might want to compare this with the value your colleagues used.
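As a small illustration of how the default depends on the number of features, here is a minimal sketch of the documented randomForest defaults written as a plain R function. The function name default_mtry is hypothetical; it only reproduces the default formulas and does not call the package. With p = 500, the value sqrt(500), about 22.36, is essentially the classification default; whether the Python runs used the same value depends on their own settings.

```r
# Hypothetical helper reproducing the documented randomForest defaults for
# mtry: floor(sqrt(p)) for classification, max(floor(p / 3), 1) for regression
default_mtry <- function(p, classification = TRUE) {
  if (classification) floor(sqrt(p)) else max(floor(p / 3), 1)
}

p <- 500                                  # number of Madelon features
print(sqrt(p))                            # the value passed in the question
print(default_mtry(p))                    # classification default
print(default_mtry(p, classification = FALSE))  # regression default
```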

answered Sep 11 '18 at 12:57

Elmar Macek


If you and your colleagues ran the same model on the same data, you should get roughly the same results, up to random variation. Did your colleagues use the same environment, the same packages and the same versions?

Also, adding trees does not hurt: a random forest does not overfit as the number of trees grows; instead, the error stabilizes at some point. Where that point lies (the number of trees) varies from dataset to dataset, so you cannot really determine it beforehand. If you can afford it, build more trees rather than fewer.
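Why the error stabilizes rather than falling to zero can be illustrated with a toy model of majority voting. This is a sketch under a loud assumption, not the randomForest algorithm: each observation i is given its own hypothetical probability q[i] that a single tree classifies it correctly, drawn here from a Beta(3, 1.5) distribution, and trees vote independently given q[i]. Odd tree counts are used to avoid ties. As the number of trees grows, the majority-vote error approaches the share of observations that a single tree gets wrong more often than right, instead of approaching zero.

```r
set.seed(1)

n_obs <- 5000
# Toy assumption: per-observation probability that one tree is correct
q <- rbeta(n_obs, 3, 1.5)

# Misclassification rate of a majority vote over n_trees trees
# (n_trees odd, so a vote can never be tied)
ensemble_error <- function(n_trees) {
  correct_votes <- rbinom(n_obs, size = n_trees, prob = q)
  mean(correct_votes < n_trees / 2)
}

ntrees <- c(3, 11, 31, 101, 301)
errs <- sapply(ntrees, ensemble_error)

# Limiting error: observations a single tree gets wrong more often than right
limit <- mean(q < 0.5)
print(round(errs, 3))
print(round(limit, 3))
```

In this toy model the plateau level is set by how many observations are "hard" for a typical tree, which is why more trees eventually stop helping.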

answered Sep 14 '18 at 7:43

user2974951
