Intuition behind using non-hypercubic kernels in density estimation
Suppose that we perform density estimation in $m$-dimensional space: we estimate the value $p(a)$ for some point $a$ given observations $x_1, \dots, x_n$.
It is known that if a region $A \subset \mathbb{R}^m$ is "small" enough for the density to be considered constant on $A$, then we can make the following estimate:
$$ p(a) \approx \frac{k / n}{|A|} $$
where $k$ is the number of observations that lie in $A$ and $|A|$ is the Lebesgue measure of $A$.
Let the parameter $h$ be small enough that the density can be considered constant inside the hypercube centered at $a$ with side length $h$. The volume of this hypercube is $h^m$, and a point $x$ lies inside it iff $K\left(\frac{x-a}{h}\right) = 1$, where
$$K(u) = \begin{cases}
1, & |u^{(j)}| \le \frac{1}{2} \text{ for all } j = 1, \dots, m \\
0, & \text{otherwise}
\end{cases}$$
and $u^{(j)}$ denotes the $j$-th coordinate of $u$.
It's easy to see that the number of observations inside this hypercube equals
$$k = \sum_{i=1}^n K\left(\frac{x_i-a}{h}\right)$$
and so the estimate described above takes the following form:
$$p(a) \approx \frac{1}{n h^m} \sum_{i=1}^n K\left(\frac{x_i-a}{h}\right)$$
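To make the hypercube estimate concrete, here is a minimal NumPy sketch. The function names `box_kernel` and `box_density` and the toy data are my own choices for illustration, not part of any library.

```python
import numpy as np

def box_kernel(u):
    """Return 1.0 for each row of u inside the unit hypercube centred at 0, else 0.0."""
    u = np.atleast_2d(u)
    return np.all(np.abs(u) <= 0.5, axis=1).astype(float)

def box_density(a, x, h):
    """Hypercube estimate p(a) ~ (1 / (n h^m)) * sum_i K((x_i - a) / h)."""
    x = np.atleast_2d(x)
    n, m = x.shape
    k = box_kernel((x - a) / h).sum()  # number of observations inside the cube
    return k / (n * h ** m)

# Toy data (assumed for illustration): four observations in the plane.
x = np.array([[0.1, 0.1], [0.4, -0.3], [2.0, 2.0], [-0.2, 0.45]])
a = np.array([0.0, 0.0])
print(box_density(a, x, h=1.0))  # 3 of 4 points fall in the cube, so 0.75
```

Every point inside the cube contributes exactly 1 to the count, regardless of how far it is from $a$.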
We can interpret $K$ as a "weight" given to each observation. One drawback of the hypercubic approach is that all observations lying inside the hypercube receive equal weight, despite being at different distances from $a$. Another drawback is that the resulting estimate is not continuous. As I understand it, these are the main reasons for using non-hypercubic kernels such as the Gaussian kernel, which give more weight to points close to $a$ and yield a continuous estimate.
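The weighting idea can be sketched as follows, assuming the standard $m$-dimensional Gaussian density as the kernel. The names `gaussian_kernel` and `gaussian_density` are hypothetical helpers written for this illustration.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard m-dimensional Gaussian density evaluated at each row of u."""
    u = np.atleast_2d(u)
    m = u.shape[1]
    return np.exp(-0.5 * np.sum(u ** 2, axis=1)) / (2 * np.pi) ** (m / 2)

def gaussian_density(a, x, h):
    """Smooth estimate p(a) ~ (1 / (n h^m)) * sum_i K((x_i - a) / h)."""
    x = np.atleast_2d(x)
    n, m = x.shape
    return gaussian_kernel((x - a) / h).sum() / (n * h ** m)

# Three 1-D observations at increasing distance from a = 0 (toy data).
x = np.array([[0.0], [0.3], [1.0]])
weights = gaussian_kernel((x - 0.0) / 0.5)
print(weights)  # weights shrink as the observation moves away from a
```

Unlike the box kernel, the weights vary smoothly with distance, so the estimate is continuous in $a$.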
But I have trouble interpreting the use of such kernels. The sum $\sum_{i=1}^n K\left(\frac{x_i-a}{h}\right)$ is no longer equal to $k$, so we can't justify these kernels by the formula $p(a) \approx \frac{k/n}{|A|}$. So here are my questions: how do we justify the use of smooth kernels, and how can one interpret it?
Thank you for any ideas.
probability
asked Jan 16 '18 at 18:36
Igor
2 Answers
Histograms and other methods based on binning have a number of well-known problems. Different anchor points (bin origins), bin widths, etc. can introduce artificial patterns that make interpretation unreliable. Smooth kernels don't use a grid, so they avoid these artifacts and smooth out the noise.
This also makes it easier to get a single overall picture of the data, because the estimate at each location takes neighboring points into account and spreads probability mass into areas where no data were observed.
Smooth kernels can also be justified by their favorable statistical properties. Popular methods like fastKDE use the fact that one can find "an empirical kernel that is optimal in the sense that the integrated, squared difference between the resulting KDE and the true PDF is minimized."
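The anchor-point problem mentioned at the start of this answer can be seen in a small sketch. The data and bin edges below are made up for illustration; only `numpy.histogram` is a real library call.

```python
import numpy as np

# Toy data: three tight pairs of points.
data = np.array([0.9, 1.1, 1.9, 2.1, 2.9, 3.1])
width = 1.0

# Same data, same bin width, two different anchor points for the grid.
edges_a = np.arange(0.0, 5.0, width)    # edges 0, 1, 2, 3, 4
edges_b = np.arange(-0.5, 5.0, width)   # same grid shifted by half a bin

counts_a, _ = np.histogram(data, bins=edges_a)
counts_b, _ = np.histogram(data, bins=edges_b)

print(counts_a)  # [1 2 2 1]
print(counts_b)  # [0 2 2 2 0]
```

The first grid suggests a peak in the middle, the second a flat plateau, although the data are identical. A kernel estimate has no grid origin to choose, so this particular artifact does not arise.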
answered Jan 16 '18 at 19:28
oW_♦
If we're estimating the density of a continuous distribution, perhaps we should introduce an integral here, right? A kernel should satisfy $\int_{-\infty}^{\infty} K(x)\,dx = 1$. It is then relatively easy to see that the estimate of $f(x)$, called $\hat{f}(x) = \frac{1}{n}\sum_{j=1}^n \frac{1}{h} K\left(\frac{x - x_j}{h}\right)$ with observations $x_1, \dots, x_n$, satisfies
$$\int_{-\infty}^{\infty} \hat{f}(x)\,dx = \frac{1}{n}\sum_{j=1}^n \int_{-\infty}^{\infty} \frac{1}{h} K\left(\frac{x - x_j}{h}\right) dx = \frac{1}{n}\sum_{j=1}^n 1 = 1.$$
Since the kernel, and hence the estimate, is non-negative and integrates to 1, our hat function is itself a probability density function.
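As a quick numerical sanity check of the integral above, here is a sketch that assumes a one-dimensional Gaussian kernel and a handful of made-up samples. The helper `kde_1d` is hypothetical, and the integral is approximated with a simple trapezoid rule on a wide grid.

```python
import numpy as np

def kde_1d(grid, samples, h):
    """Gaussian KDE: f_hat(x) = (1 / (n h)) * sum_j K((x - x_j) / h)."""
    u = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(samples) * h * np.sqrt(2 * np.pi))

samples = np.array([-1.0, 0.2, 0.5, 2.0])   # toy observations
grid = np.linspace(-10.0, 10.0, 4001)       # wide enough to cover the tails
f_hat = kde_1d(grid, samples, h=0.4)

# Trapezoid rule for the integral of f_hat over the grid.
dx = grid[1] - grid[0]
area = np.sum((f_hat[1:] + f_hat[:-1]) / 2) * dx
print(area)  # very close to 1
```

The estimate is non-negative everywhere and its area is 1 up to numerical error, which is all that is needed for it to be a density.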
Now for a bit more detail: $\hat{f}(x)$ is usually derived from the definition of the derivative applied to the empirical CDF. So instead of justifying it the way you would a Parzen window, you justify it from what it means to be a pdf and what you want a good estimate of that pdf to be.
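One way to spell out that derivation in one dimension, as a sketch: write the empirical CDF as $F_n(t) = \frac{1}{n}\sum_{j=1}^n \mathbf{1}\{x_j \le t\}$ and replace the derivative by a central difference with step $h$. The half-open interval convention for the box kernel below is my own choice.

$$\hat{f}(x) = \frac{F_n\left(x + \frac{h}{2}\right) - F_n\left(x - \frac{h}{2}\right)}{h} = \frac{1}{nh}\sum_{j=1}^n \mathbf{1}\left\{x - \tfrac{h}{2} < x_j \le x + \tfrac{h}{2}\right\} = \frac{1}{nh}\sum_{j=1}^n K\left(\frac{x - x_j}{h}\right), \qquad K(u) = \mathbf{1}\left\{-\tfrac{1}{2} \le u < \tfrac{1}{2}\right\}.$$

So the box kernel falls out of differencing the empirical CDF, and replacing $K$ by any other non-negative function that integrates to 1 keeps $\hat{f}$ a valid density.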
edit: With regard to kNN and your estimator, I think it's also important to realize that for any fixed point the nearest-neighbor estimate is a kernel estimate. However, it is a different estimate for each point. The kernel estimate still remains a density because each individual term is a density, so overall it is a linear combination of densities. Furthermore, the coefficients of these terms sum to 1.
edited Jan 16 '18 at 20:24
answered Jan 16 '18 at 20:09
Tophat