What is the neural network architecture behind Facebook's Starspace model?
Recently, Facebook released a paper on a general-purpose neural embedding model called StarSpace.
The paper explains the loss function and the training procedure, but it says little about the model's architecture.
Does anybody know what the neural network behind it looks like?
deep-learning word-embeddings embeddings
asked Nov 18 '18 at 12:08 by ChiPlusPlus (edited Nov 19 '18 at 8:13)
1 Answer
Calling StarSpace a neural model would be misleading, I think. You could certainly think of it as a neural network with a single layer and a linear activation function, but I don't think that would be very illuminating. They didn't discuss the architecture much in that paper for a reason: there isn't really any in terms of layers of neurons, activation functions, latent variables, or anything else except the constraint on the number of dimensions.
In fact, the most helpful way to think about StarSpace is that, at its core, like many (maybe most) popular embedding techniques across natural language, graphs, etc., it is a low-rank matrix factorization. The sampling procedures use the data to produce, implicitly, a positive definite Gram matrix. That is not obvious at first because it happens solely through sampling. If you took the expectation over each input/target pair, however, you would find that the optimization objective maximizes the expected vector similarity under a joint distribution on pairs of items minus the expected similarity under a marginal distribution (the latter comes from the negative samples). Essentially, the goal is to maximize the difference between the similarity of items that are frequently sampled together and the similarity of items sampled independently from the marginal distribution.
If this sounds similar to SGNS implicitly factorizing a shifted PPMI matrix, or GloVe explicitly factorizing a relaxed variant of the same, good. The specifics are different, and StarSpace has significantly more flexibility in the sampling distribution it works with, but the principle is the same. "Neural Word Embedding as Implicit Matrix Factorization" and "Improving Distributional Similarity with Lessons Learned from Word Embeddings" are fantastic papers by Levy and co-authors (2014 and 2015, if I recall correctly) that discuss the connections between neural embeddings and explicit matrix factorization techniques like PPMI-SVD and GloVe, and the principles that make them successful.
Similarly, "Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec" is a great discussion of the connections between neural embeddings of networks (graphs) and the same implicit objectives as the neural word embeddings.
In short, it doesn't sound like there's much going on in StarSpace as far as architecture because there isn't. It is quite literally adjusting the placement of points in the embedding space to make associated items more similar to each other than to unrelated items.
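The expectation argument in the answer can be written out roughly as follows. This is a sketch under assumptions, not the paper's own derivation: the notation is introduced here for illustration, with $F$ the embedding matrix, $e_a = \sum_{i \in a} F_i$ the embedding of an entity as the sum of its feature rows, $P(a, b)$ the distribution of sampled positive pairs, $Q(b^-)$ the negative-sampling distribution, and $\mu$ the margin. For the margin ranking loss, one sampled triple contributes

$$\ell(a, b, b^-) = \max\bigl(0,\; \mu - \mathrm{sim}(e_a, e_b) + \mathrm{sim}(e_a, e_{b^-})\bigr).$$

Since $\max(0, x) \ge x$, taking expectations over the sampling gives

$$\mathbb{E}[\ell] \;\ge\; \mu - \Bigl(\mathbb{E}_{(a,b) \sim P}\bigl[\mathrm{sim}(e_a, e_b)\bigr] - \mathbb{E}_{a \sim P,\; b^- \sim Q}\bigl[\mathrm{sim}(e_a, e_{b^-})\bigr]\Bigr),$$

with equality when every sampled triple violates the margin. In that regime, minimizing the loss is the same as maximizing the bracketed difference between similarity under the joint distribution and similarity under independent sampling, which is the quantity the answer describes.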
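To make the "single linear layer" reading in the answer concrete, here is a minimal NumPy sketch of a StarSpace-style model. It is an illustration under stated assumptions, not the reference implementation: it uses inner-product similarity, one negative per positive, and plain SGD, and the names `embed`, `hinge_loss`, `sgd_step` and `train` are hypothetical. The only parameters are the rows of the embedding matrix `F`; an entity is a bag of feature ids embedded as the sum of its rows.

```python
import numpy as np


def embed(F, bag):
    """Embed an entity as the sum of the rows of F for its feature ids."""
    return F[np.asarray(bag)].sum(axis=0)


def hinge_loss(F, a, b_pos, b_neg, margin=0.2):
    """Margin ranking loss for one (input, positive, negative) triple."""
    e_a = embed(F, a)
    pos = e_a @ embed(F, b_pos)  # inner-product similarity (an assumption)
    neg = e_a @ embed(F, b_neg)
    return max(0.0, margin - pos + neg)


def sgd_step(F, a, b_pos, b_neg, margin=0.2, lr=0.1):
    """One in-place SGD update on F; returns the loss before the update."""
    e_a, e_p, e_n = embed(F, a), embed(F, b_pos), embed(F, b_neg)
    loss = max(0.0, margin - e_a @ e_p + e_a @ e_n)
    if loss > 0.0:
        # Gradient of (margin - e_a.e_p + e_a.e_n) w.r.t. each summed embedding.
        # np.add.at accumulates correctly when a feature id repeats in a bag.
        np.add.at(F, np.asarray(a), -lr * (e_n - e_p))
        np.add.at(F, np.asarray(b_pos), lr * e_a)
        np.add.at(F, np.asarray(b_neg), -lr * e_a)
    return loss


def train(pairs, n_features, dim=8, epochs=50, seed=0):
    """Fit F on (input_bag, target_bag) pairs given as lists of feature ids."""
    rng = np.random.default_rng(seed)
    F = 0.1 * rng.standard_normal((n_features, dim))
    targets = [b for _, b in pairs]
    for _ in range(epochs):
        for a, b_pos in pairs:
            # Negative sampling: draw a target independently of the input.
            b_neg = targets[rng.integers(len(targets))]
            if b_neg != b_pos:
                sgd_step(F, a, b_pos, b_neg)
    return F
```

For example, `F = train([([0], [1]), ([2], [3])], n_features=4)` returns a 4 x 8 matrix, and `F @ F.T` then holds every pairwise score. That score matrix has rank at most `dim`, which is the low-rank factorization view from the answer: there are no hidden layers or nonlinearities to describe, only the rows of `F` being moved around.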
answered 3 hours ago by Cameron King