Recursively updating the MLE as new observations stream in


General Question



Say we have iid data $x_1, x_2, \ldots \sim f(x\,|\,\boldsymbol{\theta})$ streaming in. We want to recursively compute the maximum likelihood estimate of $\boldsymbol{\theta}$. That is, having computed
$$\hat{\boldsymbol{\theta}}_{n-1} = \underset{\boldsymbol{\theta}\in\mathbb{R}^p}{\arg\max}\prod_{i=1}^{n-1} f(x_i\,|\,\boldsymbol{\theta}),$$
we observe a new $x_n$, and wish to somehow incrementally update our estimate
$$\hat{\boldsymbol{\theta}}_{n-1},\,x_n \to \hat{\boldsymbol{\theta}}_n$$
without having to start from scratch. Are there generic algorithms for this?



Toy Example



If $x_1, x_2, \ldots \sim N(x\,|\,\mu, 1)$, then
$$\hat{\mu}_{n-1} = \frac{1}{n-1}\sum\limits_{i=1}^{n-1} x_i \quad\text{and}\quad \hat{\mu}_n = \frac{1}{n}\sum\limits_{i=1}^{n} x_i,$$
so
$$\hat{\mu}_n = \frac{1}{n}\left[(n-1)\hat{\mu}_{n-1} + x_n\right].$$
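
The toy recursion can be written as a few lines of code. This is only a sketch; the class name `RunningMean` and its attributes are made up for illustration. It uses the algebraically equivalent form $\hat{\mu}_n = \hat{\mu}_{n-1} + (x_n - \hat{\mu}_{n-1})/n$, which avoids forming the large product $(n-1)\hat{\mu}_{n-1}$.

```python
class RunningMean:
    """Streaming MLE of mu for N(mu, 1) data, i.e. the sample mean.

    Hypothetical helper for illustration; only the count and the current
    estimate are stored, never the past observations.
    """

    def __init__(self):
        self.n = 0       # number of observations seen so far
        self.mu = 0.0    # current estimate (meaningful only once n > 0)

    def update(self, x):
        # mu_n = mu_{n-1} + (x_n - mu_{n-1}) / n, which equals
        # [(n - 1) * mu_{n-1} + x_n] / n from the formula above.
        self.n += 1
        self.mu += (x - self.mu) / self.n
        return self.mu


est = RunningMean()
for x in [2.0, 4.0, 9.0]:
    est.update(x)
print(est.mu)  # sample mean of the three values
```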







maximum-likelihood online






asked 4 hours ago (edited 3 hours ago) by bamts

  • Awesome question!
    – dlnB, 4 hours ago

  • Don't forget the inverse of this problem: updating the estimator as old observations are deleted.
    – Hong Ooi, 2 hours ago

2 Answers
See the concept of sufficiency and, in particular, minimal sufficient statistics. In many cases you need the whole sample to compute the estimate at a given sample size, with no trivial way to update from a sample one size smaller (i.e. there's no convenient general result).



If the distribution is in the exponential family (and in some other cases besides; the uniform is a neat example), there's a nice sufficient statistic that can in many cases be updated in the manner you seek (i.e. for a number of commonly used distributions there would be a fast update).
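
As a rough illustration of updating through a sufficient statistic, the sketch below assumes two specific models that the answer does not spell out: a Gaussian with unknown mean and variance (sufficient statistic: count, sum and sum of squares) and a uniform on $(0, \theta)$ (sufficient statistic: the running maximum). The class names are hypothetical.

```python
class GaussianSuffStats:
    """Streaming MLE for N(mu, sigma^2) via (n, sum x, sum x^2)."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_x2 = 0.0

    def update(self, x):
        # Each new observation only touches three running totals.
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def mle(self):
        # MLE of the variance divides by n, not n - 1.
        # Caveat: this formula can lose precision when the mean is large
        # relative to the spread; a Welford-style update is more stable.
        mean = self.sum_x / self.n
        var = self.sum_x2 / self.n - mean * mean
        return mean, var


class UniformUpperBound:
    """Streaming MLE of theta for U(0, theta): the running maximum."""

    def __init__(self):
        self.theta = float("-inf")

    def update(self, x):
        self.theta = max(self.theta, x)
        return self.theta


gauss = GaussianSuffStats()
for x in [1.0, 2.0, 3.0, 4.0]:
    gauss.update(x)
mean_hat, var_hat = gauss.mle()

unif = UniformUpperBound()
for x in [0.3, 1.7, 0.9]:
    unif.update(x)
```

In both cases the memory used is constant in the number of observations, which is what makes the update cheap.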



One example for which I'm not aware of any direct way to either calculate or update the estimate is the location of the Cauchy distribution (e.g. with unit scale, to make it a simple one-parameter problem). There may be a faster update that I simply haven't noticed, however; I can't say I've done more than glance at the updating case.



On the other hand, with MLEs that are obtained via numerical optimization, the previous estimate would in many cases be a great starting point, since it would typically be very close to the updated estimate; in that sense at least, rapid updating should often be possible. Even this isn't the general case, though: with multimodal likelihood functions (again, see the Cauchy for an example), a new observation might move the highest mode some distance from the previous one (even if the locations of the biggest few modes didn't shift much, which one is highest could well change).
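
A minimal sketch of the warm-start idea for the unit-scale Cauchy location problem follows. The optimizer is my own choice, not something the answer prescribes: an iteratively reweighted mean (the EM-type update for a t distribution with one degree of freedom), which only climbs to a nearby stationary point. All function names are hypothetical, and the full sample still has to be stored.

```python
def cauchy_score(theta, data):
    """Derivative of the unit-scale Cauchy log-likelihood in theta."""
    return sum(2.0 * (x - theta) / (1.0 + (x - theta) ** 2) for x in data)


def reweighted_mean_step(theta, data):
    # Weights shrink for points far from the current estimate.
    weights = [1.0 / (1.0 + (x - theta) ** 2) for x in data]
    return sum(w * x for w, x in zip(weights, data)) / sum(weights)


def refine(theta, data, tol=1e-10, max_iter=1000):
    """Iterate from a starting value until the estimate stops moving."""
    for _ in range(max_iter):
        new_theta = reweighted_mean_step(theta, data)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


data = []
theta = None
for x in [0.1, -0.4, 0.3, 12.0, 0.2, -0.1]:
    data.append(x)
    # Warm start: reuse the previous estimate (or the first point).
    start = x if theta is None else theta
    theta = refine(start, data)
```

Because the iteration is local, it inherits the caveat above: if a new observation makes a different mode the highest one, a warm start near the old mode will not find it, so an occasional restart from several starting values may be needed.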







answered 3 hours ago (edited 3 hours ago) by Glen_b

  • Thanks! The point about the MLE possibly switching modes midstream is particularly helpful for understanding why this would be hard in general.
    – bamts, 57 mins ago

In machine learning, this is referred to as online learning.



As @Glen_b pointed out, there are special cases in which the MLE can be updated without needing to access all the previous data. As he also points out, I don't believe there's a generic solution for updating the MLE in this way.



A fairly generic approach for finding an approximate solution is to use something like stochastic gradient descent. As each observation comes in, we compute the gradient of the log-likelihood of that individual observation with respect to the parameters and move the parameter values a very small amount in the direction that increases it. Under certain conditions, we can show that this will converge to a neighborhood of the MLE with high probability; the neighborhood gets tighter as we reduce the step size, but more data is then required for convergence. However, these stochastic methods in general require much more fiddling to obtain good performance than, say, closed-form updates.
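
The stochastic-gradient idea can be sketched as follows, using the $N(\mu, 1)$ toy model from the question as an assumed example. The function names and the step-size schedules are illustrative choices, not part of the answer.

```python
def normal_score(mu, x):
    # d/dmu of log N(x | mu, 1) is x - mu.
    return x - mu


def online_sgd(stream, score, theta0, step_size):
    """One gradient-ascent step on the log-likelihood per observation.

    `score(theta, x)` returns the per-observation gradient and
    `step_size(n)` returns the step used for the n-th observation.
    """
    theta = theta0
    for n, x in enumerate(stream, start=1):
        theta = theta + step_size(n) * score(theta, x)
    return theta


data = [2.0, 4.0, 9.0]

# With step 1/n this model's update is exactly the running mean.
mu_decay = online_sgd(data, normal_score, 0.0, lambda n: 1.0 / n)

# With a constant step the estimate is an exponentially weighted
# average that keeps some dependence on the starting value.
mu_const = online_sgd(data, normal_score, 0.0, lambda n: 0.1)
```

For this particular model the decaying step recovers the closed-form update, which is a special case; for other models the step-size schedule is the part that usually needs tuning.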







answered 2 hours ago by Cliff AB