Intuition behind using non-hypercubic kernels in density estimation



Suppose that we perform density estimation in $m$-dimensional space: we estimate the value $p(a)$ for some point $a$ given observations $x_1, \dots, x_n$.



It is known that if a region $A \subset \mathbb{R}^m$ is "small" enough that the density can be considered constant on $A$, then we can make the following estimate:
$$ p(a) \approx \frac{k / n}{|A|} $$
where $k$ is the number of observations that lie in $A$ and $|A|$ is the Lebesgue measure of $A$.



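Let the parameter $h$ be small enough that the density can be considered constant inside the hypercube centered at $a$ with side length $h$. The volume of this hypercube is $h^m$, and a point $x$ lies inside it iff $K\left(\frac{x-a}{h}\right) = 1$, where
$$K(u) = \begin{cases}
1, & \text{if } |u^j| \le \frac{1}{2} \text{ for all } j = 1, \dots, m \\
0, & \text{otherwise}
\end{cases}$$
and $u^j$ denotes the $j$-th coordinate of $u$, so that for $u = \frac{x-a}{h}$ the condition reads $\left|\frac{x^j - a^j}{h}\right| \le \frac{1}{2}$.
It's easy to see that the number of observations inside this hypercube equals
$$k = \sum_{i=1}^n K\left(\frac{x_i-a}{h}\right)$$
and so the estimate described above takes the following form:
$$p(a) \approx \frac{1}{n h^m} \sum_{i=1}^n K\left(\frac{x_i-a}{h}\right)$$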
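The hypercube estimate can be sketched in a few lines of plain Python. This is only an illustration: the function names `hypercube_kernel` and `parzen_estimate` and the sample points are made up here, not taken from any library.

```python
def hypercube_kernel(u):
    """Return 1 if every coordinate of u lies in [-1/2, 1/2], else 0."""
    return 1 if all(abs(c) <= 0.5 for c in u) else 0


def parzen_estimate(a, xs, h, kernel=hypercube_kernel):
    """Estimate p(a) as 1 / (n * h^m) * sum_i K((x_i - a) / h)."""
    n, m = len(xs), len(a)
    total = sum(
        kernel([(xc - ac) / h for xc, ac in zip(x, a)])
        for x in xs
    )
    return total / (n * h ** m)


# Hypothetical 2-D observations, chosen only for illustration.
observations = [(0.1, 0.1), (0.2, -0.1), (0.9, 0.9), (-0.3, 0.05)]

# Two of the four points fall inside the square of side 0.5 around the
# origin, so the estimate is (2 / 4) / 0.5**2.
estimate = parzen_estimate((0.0, 0.0), observations, h=0.5)
print(estimate)
```

Passing a different `kernel` function is all it takes to move from the hypercube to a smooth kernel, as long as that kernel also integrates to 1.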



We can interpret $K$ as a "weight" given to particular observations. One drawback of the hypercubic approach is that all observations lying inside the hypercube get equal weight despite being at different distances from $a$. Another drawback is that the resulting estimate is not continuous. That's what I understand to be the main reason for using non-hypercubic kernels such as the Gaussian kernel, which give more weight to points close to $a$ and yield a continuous estimate.
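To make the "weight" reading concrete, here is a small one-dimensional sketch comparing the weights the two kernels assign at a few scaled distances $u = (x - a)/h$. The function names and the chosen distances are illustrative assumptions.

```python
import math


def box_kernel(u):
    """Box kernel in one dimension: weight 1 inside [-1/2, 1/2], else 0."""
    return 1.0 if abs(u) <= 0.5 else 0.0


def gaussian_kernel(u):
    """Standard Gaussian kernel: the weight decays smoothly with |u|."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


# Hypothetical scaled distances (x - a) / h.
scaled_distances = [0.0, 0.4, 0.6, 1.0]
box_weights = [box_kernel(u) for u in scaled_distances]
gauss_weights = [gaussian_kernel(u) for u in scaled_distances]

for u, b, g in zip(scaled_distances, box_weights, gauss_weights):
    print(f"u={u:.1f}  box={b:.1f}  gaussian={g:.4f}")
```

The box kernel treats 0.0 and 0.4 identically and then drops abruptly to zero, whereas the Gaussian weights decrease gradually with distance.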



But I have trouble interpreting the use of such kernels. The sum $\sum_{i=1}^n K\left(\frac{x_i-a}{h}\right)$ is no longer equal to $k$, so we can't justify the use of these kernels by the formula $p(a) \approx \frac{k / n}{|A|}$. Finally, here are my questions: how do we justify the use of smooth kernels? How can one interpret this usage?



      Thank you for any ideas.







      probability
















asked Jan 16 '18 at 18:36 by Igor



























2 Answers


















Histograms and other methods based on binning have a number of well-known problems. Different anchor points (bin origins) etc. can introduce artificial patterns that make interpretation unreliable. Smooth kernels don't use a grid and thus smooth out the noise.

This also makes it easier to get a single overall picture of the data, because the estimate takes neighboring points into account and smooths the data into areas where no data is observed.

Smooth kernels can also be justified by their favorable statistical properties. Popular methods like fastKDE use the fact that one can find "an empirical kernel that is optimal in the sense that the integrated, squared difference between the resulting KDE and the true PDF is minimized."
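The anchor-point issue can be illustrated with a small plain-Python sketch. The data, the bin width, and the function names below are assumptions made up for this example.

```python
import math


def histogram_density(a, xs, width, origin):
    """Density at a from a histogram with the given bin width and origin."""
    bin_index = math.floor((a - origin) / width)
    left = origin + bin_index * width
    count = sum(1 for x in xs if left <= x < left + width)
    return count / (len(xs) * width)


def gaussian_kde(a, xs, h):
    """Gaussian kernel density estimate at a with bandwidth h (no grid)."""
    norm = 1.0 / math.sqrt(2.0 * math.pi)
    total = sum(norm * math.exp(-0.5 * ((a - x) / h) ** 2) for x in xs)
    return total / (len(xs) * h)


data = [0.6, 0.9, 1.1, 1.2, 1.8]  # hypothetical 1-D sample

# Same data, same bin width, same query point: only the bin origin changes.
at_origin_0 = histogram_density(1.0, data, width=1.0, origin=0.0)
at_origin_half = histogram_density(1.0, data, width=1.0, origin=0.5)

# The kernel estimate has no origin to choose.
smooth = gaussian_kde(1.0, data, h=0.5)

print(at_origin_0, at_origin_half, smooth)
```

With the origin at 0 the query point shares a bin with three observations, and with the origin at 0.5 it shares a bin with four, so the histogram estimate changes although the data did not.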

















answered Jan 16 '18 at 19:28 by oW_





















If we're estimating the density of a continuous distribution, it helps to bring an integral into the picture. A kernel should satisfy $\int_{-\infty}^{\infty} K(x)\,dx = 1$. It is then relatively easy to see that the resulting estimate $\hat{f}(x)$ of $f(x)$ satisfies the following:



$\int_{-\infty}^{\infty} \hat{f}(x)\,dx = \frac{1}{n}\sum_{j=1}^n \int_{-\infty}^{\infty} \frac{1}{h} K\left(\frac{x - x_j}{h}\right) dx$
$= \frac{1}{n}\sum_{j=1}^n 1 = 1$. Since the kernel, and therefore the estimate, is non-negative, $\hat{f}$ is itself a probability density function.
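The normalization can be checked numerically. The following is a minimal sketch in plain Python, assuming a Gaussian kernel, a made-up one-dimensional sample, and a midpoint rule on a finite interval wide enough to cover the tails. None of these choices come from the answer itself.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel; it is non-negative and integrates to 1."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, data, h):
    """One-dimensional kernel estimate: 1 / (n h) * sum_j K((x - x_j) / h)."""
    return sum(gaussian_kernel((x - xj) / h) for xj in data) / (len(data) * h)


data = [-1.0, 0.2, 0.5, 2.3]  # hypothetical sample
h = 0.4

# Midpoint rule on [-10, 10]; the Gaussian tails outside are negligible here.
lo, hi, steps = -10.0, 10.0, 4000
dx = (hi - lo) / steps
integral = sum(kde(lo + (i + 0.5) * dx, data, h) for i in range(steps)) * dx

print(integral)
```

The printed value should be very close to 1, and it stays so when `h` or the sample is changed, since each of the $n$ terms contributes exactly $1/n$.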



Now for a bit more detail: $\hat{f}(x)$ is usually derived from the definition of the derivative applied to the empirical CDF. So instead of justifying it the way you would a Parzen window, you justify it from what it means to be a pdf and what you want a good estimate of that pdf to be.
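To sketch the empirical-CDF route in one dimension (the notation $F_n$ and the indicator $\mathbf{1}\{\cdot\}$ are introduced here for illustration): with $F_n(t) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{x_i \le t\}$, a symmetric difference quotient gives

$$\hat{f}(x) = \frac{F_n(x + h/2) - F_n(x - h/2)}{h} = \frac{1}{nh}\sum_{i=1}^n \mathbf{1}\left\{x - \tfrac{h}{2} < x_i \le x + \tfrac{h}{2}\right\} = \frac{1}{nh}\sum_{i=1}^n K\left(\frac{x - x_i}{h}\right),$$

where $K(u) = \mathbf{1}\{-\tfrac{1}{2} \le u < \tfrac{1}{2}\}$ is the box kernel. Replacing this box by any other non-negative $K$ that integrates to 1 keeps $\hat{f}$ a density, which is presumably the sense in which smooth kernels are justified here.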



Edit: with regard to kNN and your estimator, I think it's also important to realize that for any fixed point the nearest-neighbor estimate is a kernel estimate. However, it is a different estimate for each point. The kernel estimate still remains a density estimate because each individual term is a density, so overall it is a linear combination of densities. Furthermore, the coefficients of the $k$ estimates sum to 1.















answered Jan 16 '18 at 20:09 by Tophat, edited Jan 16 '18 at 20:24


























