Choosing k value in KNN classifier?



I'm working on a classification problem and decided to use a KNN classifier.

If k=131 gave me an AUC of 0.689 and k=71 gave me an AUC of 0.682, what should my ideal k be?

Does choosing a higher k mean more usage of computational resources? If that's the case, can I go with k=71? Or should I always use the k with the maximum score, no matter what?







machine-learning k-nn






asked 12 hours ago by user214











  • So, are you calculating AUC using cross-validation? – pythinker, 11 hours ago










  • @pythinker yes. – user214, 11 hours ago
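
A minimal sketch of how the two candidate k values could be compared with cross-validated AUC in scikit-learn; the data below is synthetic, and X, y are stand-ins for the asker's actual dataset:

    # Illustrative sketch only: compare candidate k values by cross-validated AUC.
    # X, y are synthetic placeholders for the real data.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

    for k in (71, 131):
        clf = KNeighborsClassifier(n_neighbors=k)
        scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
        print(f"k={k}: mean AUC = {scores.mean():.3f} (std {scores.std():.3f})")

Looking at the spread across folds gives a sense of whether a difference like 0.689 vs 0.682 is larger than the fold-to-fold noise.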


























2 Answers






Because KNN is a non-parametric method, the computational cost of choosing k depends heavily on the size of the training data. If the training set is small, you can freely choose the k that achieves the best AUC on the validation set. If the training set is large, choosing a large k can lead to high computational cost, which shows up as slow prediction on the test data.






answered 11 hours ago by pythinker
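
If prediction cost is the practical concern, it can simply be measured for both candidate k values; this is a rough, illustrative sketch with synthetic placeholder data (real timings depend on the dataset and on the neighbour-search algorithm scikit-learn selects):

    # Illustrative sketch only: time KNN prediction for two candidate k values.
    import time
    from sklearn.datasets import make_classification
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=20000, n_features=50, random_state=0)
    X_train, y_train, X_test = X[:18000], y[:18000], X[18000:]

    for k in (71, 131):
        clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        start = time.perf_counter()
        clf.predict(X_test)
        print(f"k={k}: prediction took {time.perf_counter() - start:.2f} s")

Most of the work is the neighbour search over the training set, so the training-set size tends to dominate the runtime, with k itself having a smaller effect.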












  • Does 100k rows and 8000 features qualify as big training data? Also, choosing high k values means we are underfitting; how can I know that I'm not underfitting when choosing high k values? – user214, 11 hours ago







  • Yes, that's actually a big training dataset. To ensure that you are not underfitting or overfitting, you should check the performance of your model on the training and validation datasets simultaneously. If the training score is low, you are underfitting. If the training score is much higher than the validation score, you are overfitting. The best case is when the training and validation scores are close enough. – pythinker, 10 hours ago
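
The train-versus-validation check suggested above can be done in one call with scikit-learn's validation_curve, which scores a whole range of k at once; again a sketch on synthetic placeholder data:

    # Illustrative sketch only: training vs. validation AUC across a range of k,
    # to spot underfitting (both scores low) or overfitting (large gap).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import validation_curve
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    k_range = np.arange(1, 152, 10)

    train_scores, val_scores = validation_curve(
        KNeighborsClassifier(), X, y,
        param_name="n_neighbors", param_range=k_range,
        cv=5, scoring="roc_auc",
    )

    for k, tr, va in zip(k_range, train_scores.mean(axis=1), val_scores.mean(axis=1)):
        print(f"k={k:3d}: train AUC={tr:.3f}, validation AUC={va:.3f}")

A large gap between the two scores points to overfitting (typically at small k), while both scores being low points to underfitting (typically at very large k).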




















I was taught that the best way is to find the error for each k, then plot the errors and look for the "elbow" in the plot.






answered 12 hours ago by Stephen Ewing (new contributor)


  • So I should go with k=131? – user214, 12 hours ago










  • It really depends. The higher your k, the higher your chance of overfitting. So if you try every k from 2 to 200 and plot the error for all of them, you use the k where the curve starts to flatten out. – Stephen Ewing, 12 hours ago
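
A sketch of the elbow approach described here, plotting cross-validated error against k on synthetic placeholder data (swap in the real X, y):

    # Illustrative sketch only: plot cross-validated error vs. k and look for
    # the point where the curve starts to flatten out (the "elbow").
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    k_values = list(range(2, 201, 5))

    errors = []
    for k in k_values:
        acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
        errors.append(1 - acc)  # misclassification error = 1 - accuracy

    plt.plot(k_values, errors, marker="o")
    plt.xlabel("k (n_neighbors)")
    plt.ylabel("cross-validated error")
    plt.show()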










