Is Adam's optimization susceptible to Local Minima?


import numpy as np

# Note: X, y, n, m and norm() are not defined in this excerpt.

# Neural Network Architecture

no_hid_layers = 1
hid = 3
no_out = 1

# Xavier Initialization of weights w

w1 = np.random.randn(hid, n+1)*np.sqrt(2/(hid+n+1))
w2 = np.random.randn(no_out, hid+1)*np.sqrt(2/(no_out+hid+1))

# Sigmoid Activation Function
def g(x):
    sig = 1/(1+np.exp(-x))
    return sig

def frwrd_prop(X, w1, w2):
    z2 = w1 @ X.T
    z2 = norm(z2, axis=0)
    a2 = np.insert(g(z2), 0, 1, axis=0)
    h = g((w2@a2))
    return (h,a2)

# Calculating Cost and Gradient

def Cost(X, y, w1, w2, lmbda=0):
    # Initializing Cost J and Gradients dw
    J = 0
    dw1 = np.zeros(w1.shape)
    dw2 = np.zeros(w2.shape)
    # Forward Propagation to calculate the value of the output
    h, a2 = frwrd_prop(X, w1, w2)
    # Calculate the Cost Function J
    J = -(np.sum(y.T*np.log(h) + (1-y).T*np.log(1-h)) - lmbda/2*(np.sum(np.sum(w1[:,1:].T@w1[:,1:])) + np.sum(w2[:,1:].T@w2[:,1:])))/m
    # Applying Back Propagation to calculate the Gradients dw
    D3 = h-y
    D2 = (w2.T@D3)*a2*(1-a2)
    dw1[:,0] = (D2[1:]@X)[:,0]/m
    dw2[:,0] = (D3@a2.T)[:,0]/m
    dw1[:, 1:] = ((D2[1:]@X)[:,1:] + lmbda*w1[:,1:])/m
    dw2[:, 1:] = ((D3@a2.T)[:,1:] + lmbda*w2[:,1:])/m
    # Gradient clipping
    if(abs(np.linalg.norm(dw1))>4.5):
        dw1 = dw1*4.5/(np.linalg.norm(dw1))
    if(abs(np.linalg.norm(dw2))>4.5):
        dw1 = dw1*4.5/(np.linalg.norm(dw2))
    return (J, dw1, dw2)

# Adam's Optimization technique for training w 

def Train(w1, w2, maxIter=50):
    # Algorithm
    a, b1, b2, e = 0.001, 0.9, 0.999, 10**(-8)
    V1 = np.zeros(w1.shape)
    V2 = np.zeros(w2.shape)
    S1 = np.zeros(w1.shape)
    S2 = np.zeros(w2.shape)
    for i in range(maxIter):
        J, dw1, dw2 = Cost(X, y, w1, w2)
        V1 = b1*V1 + (1-b1)*dw1
        S1 = b2*S1 + (1-b2)*(dw1**2)
        V2 = b1*V2 + (1-b1)*dw2
        S2 = b2*S2 + (1-b2)*(dw2**2)
        if i!=0:
            V1 = V1/(1-b1**i)
            S1 = S1/(1-b2**i)
            V2 = V2/(1-b1**i)
            S2 = S2/(1-b2**i)
        w1 = w1 - a*V1/(np.sqrt(S1)+e)*dw1
        w2 = w2 - a*V2/(np.sqrt(S2)+e)*dw2
        print("\t\t\tIteration : ", i+1, " \tCost : ", J)
    return (w1, w2)

# Training Neural Network 

w1, w2 = Train(w1,w2)
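
As a point of comparison for the gradient-clipping step in `Cost`: in the code as posted, the second branch rescales `dw1` using the norm of `dw2`. The sketch below assumes the intent was to clip each gradient by its own norm. The helper name `clip_by_norm` and the toy arrays are made up for illustration and are not part of the original kernel.

```python
import numpy as np

def clip_by_norm(grad, max_norm=4.5):
    """Return grad rescaled so that its L2 (Frobenius) norm is at most max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)
    return grad

# Toy gradients (hypothetical values, only to exercise both branches)
dw1 = np.full((3, 4), 10.0)   # large norm, gets rescaled
dw2 = np.full((1, 4), 0.1)    # small norm, returned unchanged

dw1_clipped = clip_by_norm(dw1)
dw2_clipped = clip_by_norm(dw2)
print(np.linalg.norm(dw1_clipped), np.linalg.norm(dw2_clipped))
```

Each array is scaled only by its own norm, so clipping one layer's gradient never changes the other's.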

I'm using Adam optimization to drive gradient descent toward a global minimum, but the cost becomes stagnant (stops changing) after around 15 iterations (the exact number varies). The initial cost, which results from the random initialization of the weights, changes only minutely before becoming constant. This gives a training accuracy anywhere from 45% to 70% across different runs of the exact same code. Can you help me find the reason behind this?
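
For reference, here is a minimal sketch of a single Adam step in its commonly published form, so it can be compared line by line with `Train`. The posted loop differs from it in three ways, which may or may not be intentional: it overwrites `V1`/`S1` with their bias-corrected values, it uses `i` rather than `i+1` in the correction, and it multiplies the step by `dw1` once more. The function name `adam_step` and the quadratic toy problem are assumptions for illustration only.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count."""
    m = b1 * m + (1 - b1) * grad        # running average of the gradient
    v = b2 * v + (1 - b2) * grad ** 2   # running average of the squared gradient
    m_hat = m / (1 - b1 ** t)           # bias-corrected copies; m and v themselves
    v_hat = v / (1 - b2 ** t)           # are carried forward uncorrected
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy problem (hypothetical): minimise f(w) = sum((w - 3)^2)
w = np.zeros(2)
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 2001):
    grad = 2 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.05)
print(w)
```

The learning rate of 0.05 is chosen only so that the toy problem converges in a few thousand steps; it is not a recommendation for the network above.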

asked 4 hours ago

Arka Patra

New contributor

Welcome to SE.DataScience! Adam and similar optimizers (Nesterov, Nadam, etc.) all converge to a local minimum; no global optimum is guaranteed. This high variability could be due to (1) too many parameters, (2) too few training samples, (3) bugs in the implementation, and so on. As you can see, there are many possible causes for this symptom. It would help to provide executable code with all the imports for a fast assessment.
– Esmailian
4 hours ago
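
One common way to test for the third cause listed above (bugs in the implementation) is a numerical gradient check: compare the analytic gradient with a finite-difference estimate of the same cost. The sketch below does this for a small logistic-regression cost rather than the posted network. All names (`cost_and_grad`, `numerical_grad`) and the random data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(w, X, y):
    """Mean cross-entropy of a logistic model and its analytic gradient."""
    h = sigmoid(X @ w)
    cost = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    grad = X.T @ (h - y) / len(y)
    return cost, grad

def numerical_grad(f, w, eps=1e-6):
    """Central finite-difference estimate of the gradient of f at w."""
    g = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        g[i] = (f(w + step) - f(w - step)) / (2 * eps)
    return g

# Random toy data (hypothetical), seeded so the check is repeatable
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) > 0.5).astype(float)
w = rng.normal(size=3)

_, analytic = cost_and_grad(w, X, y)
numeric = numerical_grad(lambda p: cost_and_grad(p, X, y)[0], w)
max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)
```

If the two gradients disagree by much more than rounding error, the backpropagation code is the first place to look. The check should be run with any gradient clipping disabled, since clipping deliberately changes the gradient.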

@Esmailian Hello, and thank you. Is there any way to prevent gradient descent from falling into local minima? I think Geoffrey Hinton published a paper on that, but I'm not sure which one. If that's not possible, how can I resolve the issue? Also, too few training examples or too many features is an issue for overfitting, but low training accuracy looks like underfitting. Shouldn't the training accuracy be higher with fewer features, since the weights can adjust more accurately when there are fewer training examples? P.S. I'm writing this in Python and have only imported pandas and NumPy.
– Arka Patra
2 hours ago

"Is there any way to prevent the gradient from falling into local minima?" No. One optimizer may perform better than another, but all of them can fall into local minima. The high instability in accuracy cannot yet be attributed with confidence to over- or under-fitting. Please post code that can be executed without modification.
– Esmailian
2 hours ago
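
To illustrate the point above about local minima, here is a small sketch using plain gradient descent (not Adam) on a one-dimensional non-convex function. The function `f(x) = x^4 - 3x^2 + x` and the two starting points are chosen purely for illustration. Depending on the start, the iterates settle in different minima with different final costs, which is the same kind of run-to-run variability that random weight initialization can produce.

```python
def f(x):
    return x**4 - 3 * x**2 + x

def df(x):
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=1000):
    """Plain gradient descent on f starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - lr * df(x)
    return x

x_left = gradient_descent(-2.0)   # settles in the minimum near x = -1.30
x_right = gradient_descent(2.0)   # settles in the minimum near x = 1.13

print(x_left, f(x_left))
print(x_right, f(x_right))
```

The left-hand minimum has the lower cost, so the run started at `x0 = 2.0` stops at a local minimum that is not the global one, even though the update rule is identical in both runs.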

kaggle.com/starkark31/ann-titanic-survival/code Here's a link to the kernel. @Esmailian
– Arka Patra
2 hours ago

optimization gradient-descent loss-function
