Learning capacity: Deep Learning vs Traditional (Shallow) Learning The Next CEO of Stack Overflow2019 Community Moderator ElectionWhere does the “deep learning needs big data” rule come fromDeep learning basicsDeep learning facial recognition research projectFuture of deep learning (compared to traditional machine learning)Machine Learning vs Deep LearningMatch an image from a set of images : Combine traditional Computer vision + Deep Learning/CNNRecommended papers for deep learning based classification?Watermark detection using Deep LearningDeep Learning Network decreasing in accuracyDeep learning: Training in batchesNumpy Python deep learning framework

ls Ordering[Ordering[list]] optimal?

Why doesn't a table tennis ball float on the surface? How do we calculate buoyancy here?

Increase performance creating Mandelbrot set in python

What happens if you roll doubles 3 times then land on "Go to jail?"

What is the difference in properties java.runtime.version and java.version

MAZDA 3 2006 (UK) - poor acceleration then takes off at 3250 revs

Number of words that can be made using all the letters of the word W, if Os as well as Is are separated is?

Whats the best way to handle refactoring a big file?

Is HostGator storing my password in plaintext?

What can we do to stop prior company from asking us questions?

What is the meaning of "rider"?

What does "colonia" and "interior/dpto" means in Mexico on address lines?

Grabbing quick drinks

Why use "finir par" instead of "finir de" before an infinitive?

The King's new dress

Any way to transfer all permissions from one role to another?

How do I get the green key off the shelf in the Dobby level of Lego Harry Potter 2?

Is it a good idea to use COLUMN AS (left([Another_Column],(4)) insetead of LEFT in the select?

Can "Reverse Gravity" affect Meteor Swarm?

Is it my responsibility to learn a new technology in my own time my employer wants to implement?

suction cup thing with 1/4 TRS cable?

Return of the Riley Riddles in Reverse

In place solution to remove duplicates from a sorted list

How do I go from 300 unfinished/half written blog posts, to published posts?

Learning capacity: Deep Learning vs Traditional (Shallow) Learning

The Next CEO of Stack Overflow

2019 Community Moderator ElectionWhere does the “deep learning needs big data” rule come fromDeep learning basicsDeep learning facial recognition research projectFuture of deep learning (compared to traditional machine learning)Machine Learning vs Deep LearningMatch an image from a set of images : Combine traditional Computer vision + Deep Learning/CNNRecommended papers for deep learning based classification?Watermark detection using Deep LearningDeep Learning Network decreasing in accuracyDeep learning: Training in batchesNumpy Python deep learning framework

I am currently doing a course in coursera in which Andrew Ng draws the following image:

erformance of deep learning vs traditional ML as a function of dataset size

Does anybody know any references/reasoning that justify the drawn graph? Were any experiments conducted to obtain it? If so, could you point me to them?

I have already asked in the coursera forum and sent an e-mail to the support team with no avail.

Thank you

edited 1 hour ago

Esmailian

2,023217

asked Jan 29 at 11:04

asantos

263

add a comment |

I am currently doing a course in coursera in which Andrew Ng draws the following image:

erformance of deep learning vs traditional ML as a function of dataset size

Does anybody know any references/reasoning that justify the drawn graph? Were any experiments conducted to obtain it? If so, could you point me to them?

I have already asked in the coursera forum and sent an e-mail to the support team with no avail.

Thank you

edited 1 hour ago

Esmailian

2,023217

asked Jan 29 at 11:04

asantos

263

add a comment |

I am currently doing a course in coursera in which Andrew Ng draws the following image:

erformance of deep learning vs traditional ML as a function of dataset size

Does anybody know any references/reasoning that justify the drawn graph? Were any experiments conducted to obtain it? If so, could you point me to them?

I have already asked in the coursera forum and sent an e-mail to the support team with no avail.

Thank you

edited 1 hour ago

Esmailian

2,023217

asked Jan 29 at 11:04

asantos

263

I am currently doing a course in coursera in which Andrew Ng draws the following image:

erformance of deep learning vs traditional ML as a function of dataset size

Does anybody know any references/reasoning that justify the drawn graph? Were any experiments conducted to obtain it? If so, could you point me to them?

I have already asked in the coursera forum and sent an e-mail to the support team with no avail.

Thank you

machine-learning neural-network deep-learning

edited 1 hour ago

Esmailian

2,023217

asked Jan 29 at 11:04

asantos

263

edited 1 hour ago

Esmailian

2,023217

asked Jan 29 at 11:04

asantos

263

edited 1 hour ago

Esmailian

2,023217

edited 1 hour ago

Esmailian

2,023217

edited 1 hour ago

Esmailian

2,023217

asked Jan 29 at 11:04

asantos

263

asked Jan 29 at 11:04

asantos

263

asked Jan 29 at 11:04

asantos

263

add a comment |

4 Answers
4

active

oldest

votes

This image comes from the personal experience of computer scientists working with Deep learning networks for quite sometime now. This holds true for all the deep learning work one does, where the increase in the size of the dataset and with a well defined problem, scales better than traditional ML, simply because of the capabilities a deep learning network provides.

answered Jan 29 at 14:07

Nischal Hp

48829

add a comment |

There are a number of reasons why deep learning scales better with more data than traditional machine learning, in particular in areas of computer vision and speech recognition (where deep learning has been most successful).

Many traditional machine learning models (nearest neighbors, naive bayes, tree based models, kernel machines etc.) assume that predictions of for a new point should be somewhat close to neighbors of that point or that the output can be interpolated (in some way) from those neighboring points. However, in high dimensions, there often are no (obvious) neighbors. (Often called the curse of dimensionality.)

That means that traditional machine learning often can "only" generalize well locally. If you want to generalize non-locally, you need to introduce dependencies between the regions of the input space by introducing additional assumptions into your model.

To me this is most obvious in object recognition. Throwing together a bunch of random pixel rarely (never) gives you an image of an actual object. That is because most objects we care about in object recognition only make up a tiny portion of all possible images in $mathbbR^3$. To be more efficient we should make additional assumptions about the input space.

A convolutional neural network architecture, for example, shares parameters and applies the convolutional kernels at every input pixel. That is the assumption that features/objects should be the same, independent of the location in the image. This assumption makes CNNs a lot(!) more scalable and generalize better to the task of object recognition at the same time.

RNNs make similar assumptions for sequence data.

Large neural networks can therefore scale to more data more efficiently. Traditional machine learning models, on the other hand, will be "outpaced" by the exponential challenge of the curse of dimensionality because their capacity is somewhat linearly related to the number of diverse inputs. Neural networks, given the right assumptions (i.e. the right NN for the right task), can generalize to exponentially many more regions of interest with the same number of inputs.

At the end of chapter 5 in the "Deep Learning" there is an introduction to the challenges of traditional machine learning and some references to experiments supporting these hypotheses.

answered Jan 29 at 19:11

oW_

3,261732

add a comment |

I believe this statement can be supported with the concept of VC dimension. This blogpost provides a simplified explanation of the term. VC dimension can be understood as an ability of a classifier to learn complex dependencies in the data. From other hand, models with a huge VC bound tend to overfit on small amounts of data.

However, the plot you provided here depicts what is happening when our training set grows indefinitely - the win situation for models with a large VC dimension.

The VC bound of neural networks is something like polynomial from number of weights and connections between them. While VC dimension of a SVM classifier is linear to the dimensionality of the space it operates in.

Thus, neural networks can benefit more from larger amounts of training data. And the more weights you have, the better will be the result. Of course, given that the number of training samples is at least 10 times more than number of weights (a rule of thumb), so you do not face overfitting. That is why data augmentation and regularization are so important in deep learning.

answered Jan 29 at 19:15

Dmytro Prylipko

4667

add a comment |

This plot is just a conceptual sketch. It is used to convey a message at page 10 of "2018 Machine Learning Yearning (Draft version) - Andrew Ng". I checked "2016 Deep Learning - Ian Goodfellow, et al." and many seminal papers on deep learning and did not find a real example.

A hidden assumption:

A key assumption of this sketch is the number of model parameters, if the number of parameters is kept constant and similar for deep and shallow models, this plot occurs. Otherwise, for arbitrarily large training data (right side of the plot), any universal learner, as simple as a neural net with only one hidden layer (link), can perform as well as any deep learner if the number of parameters is allowed to increase arbitrarily.

Power of deep learning:

The literature that justifies the power of deep learning is quite technical; for example this paper, and this paper. To proceed, "capacity" of a model must be understood first. Roughly speaking, capacity of a model is the most complex function that it can represent; one formalization of this concept is "VC dimension". In theory, if capacity of a shallow model with $m$ parameters is $O(c)$, capacity of its deep counterpart with $m$ parameters that are spread across $d$ layers is roughly $O(c^d)$. This is an exponential growth in capacity with respect to depth $d$. As a result, with more training data, a deep model does not reach its capacity as fast as a shallow model with the same amount of parameters. This is what the sketch tries to convey.

answered Mar 1 at 10:45

Esmailian

2,023217

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\$","\$"]]);
);
);
, "mathjax-editing");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "557"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f44768%2flearning-capacity-deep-learning-vs-traditional-shallow-learning%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

answered Jan 29 at 14:07

Nischal Hp

48829

add a comment |

answered Jan 29 at 14:07

Nischal Hp

48829

add a comment |

answered Jan 29 at 14:07

Nischal Hp

48829

answered Jan 29 at 14:07

Nischal Hp

48829

answered Jan 29 at 14:07

Nischal Hp

48829

answered Jan 29 at 14:07

Nischal Hp

48829

answered Jan 29 at 14:07

Nischal Hp

48829

add a comment |

RNNs make similar assumptions for sequence data.

At the end of chapter 5 in the "Deep Learning" there is an introduction to the challenges of traditional machine learning and some references to experiments supporting these hypotheses.

answered Jan 29 at 19:11

oW_

3,261732

add a comment |

RNNs make similar assumptions for sequence data.

At the end of chapter 5 in the "Deep Learning" there is an introduction to the challenges of traditional machine learning and some references to experiments supporting these hypotheses.

answered Jan 29 at 19:11

oW_

3,261732

add a comment |

RNNs make similar assumptions for sequence data.

At the end of chapter 5 in the "Deep Learning" there is an introduction to the challenges of traditional machine learning and some references to experiments supporting these hypotheses.

answered Jan 29 at 19:11

oW_

3,261732

RNNs make similar assumptions for sequence data.

At the end of chapter 5 in the "Deep Learning" there is an introduction to the challenges of traditional machine learning and some references to experiments supporting these hypotheses.

answered Jan 29 at 19:11

oW_

3,261732

answered Jan 29 at 19:11

oW_

3,261732

answered Jan 29 at 19:11

oW_

3,261732

answered Jan 29 at 19:11

oW_

3,261732

add a comment |

However, the plot you provided here depicts what is happening when our training set grows indefinitely - the win situation for models with a large VC dimension.

answered Jan 29 at 19:15

Dmytro Prylipko

4667

add a comment |

However, the plot you provided here depicts what is happening when our training set grows indefinitely - the win situation for models with a large VC dimension.

answered Jan 29 at 19:15

Dmytro Prylipko

4667

add a comment |

However, the plot you provided here depicts what is happening when our training set grows indefinitely - the win situation for models with a large VC dimension.

answered Jan 29 at 19:15

Dmytro Prylipko

4667

However, the plot you provided here depicts what is happening when our training set grows indefinitely - the win situation for models with a large VC dimension.

answered Jan 29 at 19:15

Dmytro Prylipko

4667

answered Jan 29 at 19:15

Dmytro Prylipko

4667

answered Jan 29 at 19:15

Dmytro Prylipko

4667

answered Jan 29 at 19:15

Dmytro Prylipko

4667

add a comment |

A hidden assumption:

Power of deep learning:

answered Mar 1 at 10:45

Esmailian

2,023217

add a comment |

A hidden assumption:

Power of deep learning:

answered Mar 1 at 10:45

Esmailian

2,023217

add a comment |

A hidden assumption:

Power of deep learning:

answered Mar 1 at 10:45

Esmailian

2,023217

A hidden assumption:

Power of deep learning:

answered Mar 1 at 10:45

Esmailian

2,023217

answered Mar 1 at 10:45

Esmailian

2,023217

answered Mar 1 at 10:45

Esmailian

2,023217

answered Mar 1 at 10:45

Esmailian

2,023217

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Data Science Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Hfrxdjt

4 Answers
4

Your Answer

Post as a guest

4 Answers
4

4 Answers
4

Post as a guest

Popular posts from this blog

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

4 Answers 4

4 Answers 4

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

4 Answers
4

4 Answers
4

4 Answers
4