

How to show the equivalence between regularized regressions and their constrained formulations using KKT




According to the following references (Book 1, Book 2, and a paper), there is an equivalence between the penalized formulations of regularized regression (Ridge, LASSO, and Elastic Net) and their constrained formulations.

I have also looked at Cross Validated 1 and Cross Validated 2, but I cannot find a clear answer that shows this equivalence or the logic behind it.

My question is: how can one show this equivalence using the Karush–Kuhn–Tucker (KKT) conditions?



These formulas are for Ridge regression:
$$\hat\beta^{\text{ridge}} = \underset{\beta}{\operatorname{argmin}} \sum_{i=1}^{N}\Big(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^{2} + \lambda \sum_{j=1}^{p}\beta_j^{2}$$
$$\hat\beta^{\text{ridge}} = \underset{\beta}{\operatorname{argmin}} \sum_{i=1}^{N}\Big(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^{2} \quad \text{subject to} \quad \sum_{j=1}^{p}\beta_j^{2} \le t$$



These formulas are for LASSO regression:
$$\hat\beta^{\text{lasso}} = \underset{\beta}{\operatorname{argmin}} \sum_{i=1}^{N}\Big(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^{2} + \lambda \sum_{j=1}^{p}|\beta_j|$$
$$\hat\beta^{\text{lasso}} = \underset{\beta}{\operatorname{argmin}} \sum_{i=1}^{N}\Big(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^{2} \quad \text{subject to} \quad \sum_{j=1}^{p}|\beta_j| \le t$$



These formulas are for Elastic Net regression:
$$\hat\beta^{\text{enet}} = \underset{\beta}{\operatorname{argmin}} \sum_{i=1}^{N}\Big(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^{2} + \lambda\Big((1-\alpha)\sum_{j=1}^{p}|\beta_j| + \alpha\sum_{j=1}^{p}\beta_j^{2}\Big)$$
$$\hat\beta^{\text{enet}} = \underset{\beta}{\operatorname{argmin}} \sum_{i=1}^{N}\Big(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^{2} \quad \text{subject to} \quad (1-\alpha)\sum_{j=1}^{p}|\beta_j| + \alpha\sum_{j=1}^{p}\beta_j^{2} \le t$$
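To fix the notation, the two Elastic Net formulations read as follows in code (a minimal sketch; the function names and the parameters `lam`, `alpha`, and `t` are my own, matching the symbols above):

```python
import numpy as np

def enet_penalized_objective(beta, X, y, lam, alpha):
    """Penalized form: RSS plus the weighted L1/L2 penalty."""
    rss = np.sum((y - X @ beta) ** 2)
    penalty = (1 - alpha) * np.abs(beta).sum() + alpha * np.sum(beta ** 2)
    return rss + lam * penalty

def enet_constraint_satisfied(beta, t, alpha):
    """Constrained form: the same RSS is minimized, but only over
    coefficient vectors lying in this feasible set."""
    return (1 - alpha) * np.abs(beta).sum() + alpha * np.sum(beta ** 2) <= t
```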



NOTE

This question is not homework; it is only meant to improve my understanding of this topic.










Tags: regression, optimization, lasso, ridge-regression, elastic-net






asked by jeza

1 Answer



















The more technical answer is that the constrained optimization problem can be written in terms of Lagrange multipliers. In particular, the Lagrangian associated with the constrained optimization problem is given by
$$\mathcal{L}(\beta) = \sum_{i=1}^{N}\Big(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^{2} + \mu\Big((1-\alpha)\sum_{j=1}^{p}|\beta_j| + \alpha\sum_{j=1}^{p}\beta_j^{2}\Big),$$
where $\mu$ is a multiplier chosen to satisfy the constraint of the problem. The first-order conditions for this optimization problem (which are sufficient, since you are working with nice proper convex functions) are obtained by differentiating the Lagrangian with respect to $\beta$ and setting the derivatives equal to zero. (It is a bit more nuanced than that, since the LASSO part has non-differentiable points, but convex analysis generalizes the derivative via subgradients, so the first-order condition still works.) These first-order conditions are identical to the first-order conditions of the unconstrained problem you wrote down, with the multiplier $\mu$ playing the role of the penalty parameter $\lambda$.
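To make this concrete for the ridge case, here is a small numerical sketch using synthetic data and `scipy.optimize.minimize` (one solver choice among many; the variable names are mine): solve the penalized problem in closed form, set $t$ equal to the constraint function evaluated at that solution, and check that the constrained problem recovers the same coefficients.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(size=100)

lam = 10.0  # penalty strength in the penalized (Lagrangian) form

# Penalized ridge has the closed form (X'X + lam*I)^{-1} X'y.
beta_pen = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)

# Map lambda -> t by evaluating the constraint at the penalized solution.
t = np.sum(beta_pen ** 2)

# Constrained form: minimize RSS subject to sum(beta_j^2) <= t.
rss = lambda b: np.sum((y - X @ b) ** 2)
constraint = {"type": "ineq", "fun": lambda b: t - np.sum(b ** 2)}
res = minimize(rss, x0=np.zeros(5), constraints=[constraint])

# Both formulations should recover (numerically) the same coefficients.
print("max |difference|:", np.max(np.abs(beta_pen - res.x)))
```

Because $\lambda > 0$ makes the constraint active at the optimum, the two solutions coincide up to solver tolerance.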



However, I think it is useful to see why, for these optimization problems in general, one can view the problem either through the lens of a constrained optimization problem or through the lens of an unconstrained one. More concretely, suppose we have an unconstrained optimization problem of the following form:
$$\max_x \; f(x) + \lambda g(x).$$
We can always try to solve this optimization directly, but sometimes it makes sense to break the problem into subcomponents. In particular, it is not hard to see that
$$\max_x \; f(x) + \lambda g(x) = \max_t \left(\max_{x}\; f(x) \;\; \mathrm{s.t.}\;\; g(x) = t\right) + \lambda t.$$
So for a fixed value of $\lambda$ (and assuming the functions being optimized actually achieve their optima), we can associate with it a value $t^*$ that solves the outer optimization problem. This gives us a mapping from unconstrained optimization problems to constrained ones. In your particular setting, since everything is nicely behaved for Elastic Net regression, this mapping is in fact one-to-one, so it is useful to be able to switch between the two formulations depending on which is more convenient for a particular application. In general, the relationship between constrained and unconstrained problems may be less well behaved, but it can still be useful to think about the extent to which you can move between them.
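To see this mapping numerically for the LASSO, the sketch below (using scikit-learn's `Lasso`, whose `alpha` argument is its name for $\lambda$; synthetic data again) traces how each penalty level implies a constraint radius $t(\lambda) = \lVert\hat\beta(\lambda)\rVert_1$:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = X @ rng.normal(size=8) + rng.normal(size=200)

# As the penalty lambda grows, the implied constraint radius
# t(lambda) = ||beta_hat(lambda)||_1 shrinks, tracing out the
# lambda <-> t correspondence described above.
for lam in [0.01, 0.1, 0.5, 1.0]:
    beta = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    print(f"lambda = {lam:4.2f}  ->  t = {np.abs(beta).sum():.3f}")
```

The printed $t$ values shrink as $\lambda$ grows, which is exactly the correspondence between the two formulations.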






answered by stats_model
