What is the neural network architecture behind Facebook's StarSpace model?

Recently, Facebook released a paper describing a general-purpose neural embedding model called StarSpace.

In their paper, they explain the loss function and the training procedure of the model, but they don't say much about the model's architecture.

Does anybody know what the underlying neural network looks like?

edited Nov 19 '18 at 8:13

asked Nov 18 '18 at 12:08

ChiPlusPlus

138111


deep-learning word-embeddings embeddings

1 Answer

Calling StarSpace a neural model would be misleading, I think. You could certainly think of it as a neural network with a single layer and a linear activation function, but I don't think that would be very illuminating. They didn't discuss the architecture much in that paper for a reason: there isn't really any, in terms of layers of neurons, activation functions, latent variables, or anything else, apart from the constraint on the number of embedding dimensions.
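To make the "single linear layer" view concrete, here is a minimal sketch in plain Python. It assumes the setup described in the StarSpace paper as I understand it: every entity (a document, a user, a label, ...) is a bag of discrete features, its embedding is the sum of those features' rows in one shared embedding table, and the similarity is cosine (the paper also allows an inner product). All function names, table sizes and feature IDs here are hypothetical and are not the library's API. The point is that `embed` and `linear_layer` produce the same vector: the whole "network" is a lookup-and-sum.

```python
import math
import random


def make_embedding_table(num_features, dim, seed=0):
    """One row per discrete feature (word, user ID, label, ...), small random init."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_features)]


def embed(entity, table):
    """Entity = bag of feature IDs; its vector is the sum of the matching rows."""
    vec = [0.0] * len(table[0])
    for feature_id in entity:
        for j, x in enumerate(table[feature_id]):
            vec[j] += x
    return vec


def linear_layer(entity, table):
    """Same vector, computed as (bag-of-features count vector) x (table):
    one linear layer with no bias and no nonlinearity."""
    counts = [0] * len(table)
    for feature_id in entity:
        counts[feature_id] += 1
    dim = len(table[0])
    return [sum(counts[i] * table[i][j] for i in range(len(table))) for j in range(dim)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


# Hypothetical usage: a "document" made of word features 1, 3, 3, 7 and a label feature 9.
table = make_embedding_table(num_features=10, dim=4)
doc, label = [1, 3, 3, 7], [9]
print(cosine(embed(doc, table), embed(label, table)))
```

Everything trainable lives in `table`; there is nothing else to call an architecture.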

In fact, the most helpful way to think about StarSpace is that, at its core, like many (maybe most) popular embedding techniques across natural language, graphs, etc., it's a low-rank matrix factorization. What the sampling procedures are doing is using the data, in some way, to define the matrix that the Gram matrix of the embeddings is fit to. It doesn't appear that way initially because that is done solely through sampling. If you were to take the expectation over input/target pairs, however, you'd find that the optimization objective is maximizing the expected similarity of pairs drawn from the joint distribution of items minus the expected similarity of pairs drawn independently from the marginal distributions (the second term comes from the negative samples). Essentially, the goal is to maximize the difference between the similarity of items that are frequently sampled together and the similarity of items sampled independently from the marginal distribution.
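Here is a small numerical check of that expectation argument, as a sketch in plain Python. It is a deliberate simplification: it drops StarSpace's margin/hinge and treats the per-sample objective as sim(a, b+) - sim(a, b-), with the positive pair drawn from a joint table P and the negative b- drawn independently from the right-hand marginal q. The function names and the `joint`/`sim` toy numbers are made up for illustration. The sampled average should converge to the sum over (i, j) of (P(i,j) - p(i) q(j)) * sim(i,j), i.e. the joint term minus the independent (marginal) term.

```python
import random


def marginals(joint):
    p = [sum(row) for row in joint]                                  # left-hand marginal
    q = [sum(row[j] for row in joint) for j in range(len(joint[0]))]  # right-hand marginal
    return p, q


def expected_objective(joint, sim):
    """E_{(i,j)~P}[sim(i,j)] - E_{i~p, j~q independently}[sim(i,j)]."""
    p, q = marginals(joint)
    return sum((joint[i][j] - p[i] * q[j]) * sim[i][j]
               for i in range(len(joint)) for j in range(len(joint[0])))


def sampled_objective(joint, sim, num_samples, seed=0):
    """Monte Carlo estimate: positive pair from P, negative right-hand item from q."""
    rng = random.Random(seed)
    _, q = marginals(joint)
    pairs = [(i, j) for i in range(len(joint)) for j in range(len(joint[0]))]
    weights = [joint[i][j] for i, j in pairs]
    total = 0.0
    for _ in range(num_samples):
        i, j = rng.choices(pairs, weights=weights)[0]     # observed (positive) pair
        j_neg = rng.choices(range(len(q)), weights=q)[0]  # negative, independent of i
        total += sim[i][j] - sim[i][j_neg]
    return total / num_samples


# Toy numbers: each item co-occurs mostly with its "own" partner.
joint = [[0.4, 0.1], [0.1, 0.4]]
sim = [[1.0, -1.0], [-1.0, 1.0]]
print(expected_objective(joint, sim), sampled_objective(joint, sim, 20000))
```

Nothing in the sampling loop refers to a matrix, yet its expectation is a weighted sum over exactly the entries of P - p q^T.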

If this sounds similar to SGNS implicitly factorizing a shifted PMI matrix, or GloVe explicitly factorizing a relaxed variant of the same, good. The specifics are different, and StarSpace has significantly more flexibility in the sampling distribution it works with, but the principle is the same. "Neural Word Embedding as Implicit Matrix Factorization" (Levy and Goldberg, 2014) and "Improving Distributional Similarity with Lessons Learned from Word Embeddings" (Levy, Goldberg and Dagan, 2015) are fantastic papers that discuss the connections between neural embeddings and explicit matrix factorization techniques like PPMI-SVD and GloVe, and the principles that make them successful.
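For the word-embedding analogue, this is a sketch of the shifted positive PMI (SPPMI) matrix from Levy and Goldberg (2014), computed from a raw word-by-context co-occurrence count table. It assumes plain counts with no context-distribution smoothing or subsampling, and the function name is hypothetical. Their result is that SGNS with k negative samples implicitly factorizes PMI - log k; clipping at zero gives the explicit SPPMI matrix, which they then factorize with SVD.

```python
import math


def shifted_ppmi(counts, k):
    """SPPMI(w, c) = max(PMI(w, c) - log k, 0) from a word-by-context count table."""
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(row[c] for row in counts) for c in range(len(counts[0]))]
    result = []
    for w, row in enumerate(counts):
        out_row = []
        for c, n_wc in enumerate(row):
            if n_wc == 0:
                out_row.append(0.0)  # PMI is -inf here; the positive part clips it to 0
                continue
            pmi = math.log(n_wc * total / (row_sums[w] * col_sums[c]))
            out_row.append(max(pmi - math.log(k), 0.0))
        result.append(out_row)
    return result


# Toy counts: two words, two contexts, perfectly associated.
counts = [[10, 0], [0, 10]]
print(shifted_ppmi(counts, k=1))  # diagonal entries are log 2, off-diagonal 0.0
print(shifted_ppmi(counts, k=2))  # all zeros: log 2 - log 2 on the diagonal
```

Larger k shifts every entry down by the same log k, so more negatives push weakly associated pairs to zero.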

Similarly, "Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec" (Qiu et al., 2018) is a great discussion of how neural graph embeddings optimize the same kind of implicit objectives as the neural word embeddings.
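For reference, the closed forms from that paper, as I recall them (worth double-checking against the paper itself). With A the adjacency matrix, D the diagonal degree matrix, vol(G) the sum of all degrees, T the context window size, b the number of negative samples, and log applied element-wise, DeepWalk implicitly factorizes the first matrix below, and LINE is its T = 1 special case:

$$
M_{\text{DeepWalk}} = \log\!\left(\frac{\operatorname{vol}(G)}{T}\left(\sum_{r=1}^{T}\left(D^{-1}A\right)^{r}\right)D^{-1}\right) - \log b,
\qquad
M_{\text{LINE}} = \log\!\left(\operatorname{vol}(G)\,D^{-1}AD^{-1}\right) - \log b.
$$

Note the same "minus log b" shift as in the SGNS case: it is the footprint of negative sampling.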

In short, it doesn't sound like there's much going on in StarSpace as far as architecture goes, because there isn't. It quite literally adjusts the placement of points in the embedding space to make associated items more similar to each other than to unrelated items.
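To illustrate "adjusting the placement of points", here is a toy training loop in plain Python. It is a sketch, not StarSpace's implementation: it assumes single-feature entities in one shared table, a dot-product similarity, negatives sampled uniformly from the right-hand-side candidates, and a margin ranking (hinge) loss max(0, margin - sim(a, b) + sim(a, b-)) of the kind the paper describes. The function name, hyperparameters and toy IDs are hypothetical. There are no hidden layers: each SGD step just moves three vectors.

```python
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def train_margin_ranking(pairs, candidates, num_items, dim=8, margin=0.5,
                         num_neg=3, lr=0.05, epochs=200, seed=0):
    """pairs: (lhs_id, rhs_id) positives; candidates: RHS ids negatives are drawn from."""
    rng = random.Random(seed)
    emb = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_items)]
    pairs = list(pairs)  # don't shuffle the caller's list in place
    for _ in range(epochs):
        rng.shuffle(pairs)
        for a, b in pairs:
            for _ in range(num_neg):
                n = rng.choice(candidates)
                if n == b:
                    continue  # drew the positive itself; skip it
                ea, eb, en = emb[a], emb[b], emb[n]
                # Hinge: max(0, margin - sim(a, b) + sim(a, n)); no update once satisfied.
                if margin - dot(ea, eb) + dot(ea, en) <= 0:
                    continue
                old_a = list(ea)
                for j in range(dim):
                    ea[j] += lr * (eb[j] - en[j])  # pull a toward b, push it away from n
                    eb[j] += lr * old_a[j]         # pull b toward a
                    en[j] -= lr * old_a[j]         # push n away from a
    return emb


# Hypothetical toy data: "question" i (ids 0-3) is linked to "answer" i + 4 (ids 4-7).
questions, answers = [0, 1, 2, 3], [4, 5, 6, 7]
pairs = list(zip(questions, answers))
emb = train_margin_ranking(pairs, candidates=answers, num_items=8)
for q, a in pairs:
    best = max(answers, key=lambda c: dot(emb[q], emb[c]))
    print(q, "->", best)
```

Negatives come only from the right-hand candidates here so that every hinge constraint can be satisfied at once; once they all are, the points simply stop moving.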

answered 3 hours ago

Cameron King

211
