Optimization methods used in machine learning


I don't have much knowledge in the field of ML, but from my naive point of view it seems that some variant of gradient descent is always used when training neural networks. As such, I was wondering why more advanced methods, such as SQP algorithms or interior-point methods, don't seem to be used. Is it because training a neural net is always a simple unconstrained optimization problem, so the above-mentioned methods would be unnecessary? Any insight would be great, thanks.

asked Feb 22 '18 at 16:49

InquisitiveInquirer

1061


Because the more expensive methods don't offer enough advantage over simple gradient descent. Or maybe we do not know how to harness them well enough. Why gradient descent works as well as it does is still debated; cf. e.g. "The Marginal Value of Adaptive Gradient Methods in Machine Learning". Welcome to the site!
– Emre
Feb 22 '18 at 17:24

@Emre Thanks for your answer. Don't you think GD approaches using momentum perform so much better?
– Vaalizaadeh
Feb 22 '18 at 18:12

It has for me; momentum functions as a dampener enabling the optimizer to power through rough patches of the loss surface, but here we have a paper that questions this folk wisdom. I'll keep using it until the dust settles.
– Emre
Feb 22 '18 at 18:15

Excuse me sir, @Emre If you want to train a network from scratch based on what you have referred to, you would prefer GD over Adam?
– Vaalizaadeh
Feb 23 '18 at 13:24

I would not, because GD needs tuning, and Adam will beat untuned GD. When I hear "advanced methods" I think of (quasi) second order or natural gradients.
– Emre
Feb 23 '18 at 17:29
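
For readers following the comment thread, here is a minimal sketch of the three update rules being compared: plain gradient descent, gradient descent with (heavy-ball) momentum, and Adam. It is only an illustration under assumptions not taken from the thread: the toy loss `toy_loss` (an ill-conditioned quadratic), the start point, and all hyperparameter values are hypothetical, and a two-parameter quadratic says nothing about how these methods rank on a real neural network.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
GradFn = Callable[[Sequence[float]], Vector]


def toy_loss(p: Sequence[float]) -> float:
    """Hypothetical ill-conditioned quadratic: curvature 2 along x, 20 along y."""
    return p[0] ** 2 + 10.0 * p[1] ** 2


def toy_loss_grad(p: Sequence[float]) -> Vector:
    """Analytic gradient of toy_loss."""
    return [2.0 * p[0], 20.0 * p[1]]


def gradient_descent(grad: GradFn, start: Sequence[float],
                     lr: float = 0.05, steps: int = 500) -> Vector:
    """Plain gradient descent with a fixed, hand-picked learning rate."""
    x = list(start)
    for _ in range(steps):
        g = grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


def momentum_gd(grad: GradFn, start: Sequence[float], lr: float = 0.05,
                beta: float = 0.9, steps: int = 500) -> Vector:
    """Heavy-ball momentum: the velocity is a decaying sum of past gradient steps."""
    x = list(start)
    v = [0.0] * len(x)
    for _ in range(steps):
        g = grad(x)
        v = [beta * vi - lr * gi for vi, gi in zip(v, g)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x


def adam(grad: GradFn, start: Sequence[float], lr: float = 0.05,
         beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8,
         steps: int = 500) -> Vector:
    """Adam: per-coordinate steps scaled by running gradient moments."""
    x = list(start)
    m = [0.0] * len(x)  # running mean of gradients
    s = [0.0] * len(x)  # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad(x)
        m = [beta1 * mi + (1.0 - beta1) * gi for mi, gi in zip(m, g)]
        s = [beta2 * si + (1.0 - beta2) * gi * gi for si, gi in zip(s, g)]
        # Bias correction compensates for the zero initialisation of m and s.
        m_hat = [mi / (1.0 - beta1 ** t) for mi in m]
        s_hat = [si / (1.0 - beta2 ** t) for si in s]
        x = [xi - lr * mh / (math.sqrt(sh) + eps)
             for xi, mh, sh in zip(x, m_hat, s_hat)]
    return x


if __name__ == "__main__":
    start = [3.0, -2.0]
    for name, optimizer in [("GD", gradient_descent),
                            ("momentum", momentum_gd),
                            ("Adam", adam)]:
        print(name, toy_loss(optimizer(toy_loss_grad, start)))
```

All three need only first-order gradients, which is part of why they stay cheap per step. The learning rate for plain gradient descent here was chosen by hand for this particular quadratic, which is the kind of tuning the last comment refers to.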

machine-learning neural-network training


1 Answer

In my reply to "Does gradient descent always converge to an optimum?", I explain that standard gradient descent works well because backtracking gradient descent works well (as proven in our recent paper mentioned in that post) and, in the long run, backtracking gradient descent behaves like standard gradient descent.
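
To make "backtracking gradient descent" concrete, here is a minimal sketch using the common Armijo sufficient-decrease rule: start each iteration from a trial step size and shrink it until the loss drops enough. This is an assumption about the general technique, not necessarily the exact variant analysed in the paper mentioned above; the function names, the test objective `quad`, and all parameter values are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def backtracking_gd(
    f: Callable[[Sequence[float]], float],
    grad: Callable[[Sequence[float]], Vector],
    x0: Sequence[float],
    step0: float = 1.0,     # trial step size tried first at every iteration
    shrink: float = 0.5,    # factor applied to the step when the test fails
    c: float = 1e-4,        # Armijo sufficient-decrease constant
    tol: float = 1e-8,      # stop when the gradient norm falls below this
    max_iter: int = 10_000,
    max_backtracks: int = 60,
) -> Vector:
    """Gradient descent whose step size is chosen by Armijo backtracking."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        g_norm_sq = sum(gi * gi for gi in g)
        if g_norm_sq < tol * tol:
            break
        fx = f(x)
        step = step0
        candidate = x
        for _ in range(max_backtracks):
            candidate = [xi - step * gi for xi, gi in zip(x, g)]
            # Accept the step once f decreases by at least c * step * ||g||^2.
            if f(candidate) <= fx - c * step * g_norm_sq:
                break
            step *= shrink
        x = candidate
    return x


def quad(p: Sequence[float]) -> float:
    """Hypothetical test objective with very different curvature per axis."""
    return p[0] ** 2 + 10.0 * p[1] ** 2


def quad_grad(p: Sequence[float]) -> Vector:
    """Analytic gradient of quad."""
    return [2.0 * p[0], 20.0 * p[1]]


if __name__ == "__main__":
    print(backtracking_gd(quad, quad_grad, [3.0, -2.0]))
```

No learning rate has to be tuned by hand here: the inner loop finds an acceptable step at each iteration, at the cost of extra function evaluations.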

answered Nov 23 '18 at 13:40

Tuyen

313
