What is the neural network architecture behind Facebook's StarSpace model?

$begingroup$


Facebook recently released a paper describing StarSpace, a general-purpose neural embedding model.

The paper explains the loss function and the training procedure, but it says little about the model's architecture.

Does anybody know what the underlying neural network looks like?





















$endgroup$























      deep-learning word-embeddings embeddings














      edited Nov 19 '18 at 8:13







      ChiPlusPlus

















      asked Nov 18 '18 at 12:08









      ChiPlusPlus




















          1 Answer












          $begingroup$

          Calling StarSpace a neural model is somewhat misleading, I think. You could certainly view it as a neural network with a single layer and a linear activation function, but I don't think that would be very illuminating. The paper doesn't discuss the architecture much for a reason: there isn't really any in terms of layers of neurons, activation functions, latent variables, or anything else beyond the constraint on the number of embedding dimensions.
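To make the "single layer with a linear activation" reading concrete, here is a minimal NumPy sketch. It assumes (this is my reading of the paper, not something stated above) that an entity is a bag of discrete features and that its embedding is the sum of the corresponding rows of one embedding matrix. The names `E`, `embed_bag`, and `embed_linear` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, dim = 6, 3                     # toy sizes, chosen arbitrarily
E = rng.normal(size=(n_features, dim))     # the only parameters: one row per feature

def embed_bag(feature_ids, E):
    """Embed an entity as the sum of the rows for its features."""
    return E[feature_ids].sum(axis=0)

def embed_linear(feature_ids, E):
    """The same computation written as one linear layer on a count vector."""
    counts = np.bincount(feature_ids, minlength=E.shape[0]).astype(E.dtype)
    return counts @ E                      # no bias, no nonlinearity

bag = np.array([0, 2, 2, 5])               # a hypothetical bag of feature ids
assert np.allclose(embed_bag(bag, E), embed_linear(bag, E))
```

Both functions give the same vector, which is the sense in which the "network" is just a lookup table of points.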



          In fact, the most helpful way to think about StarSpace is that at its core, like many (maybe most) popular embedding techniques for natural language, graphs, and so on, it is a low-rank matrix factorization. The sampling procedures use the data to produce, implicitly, a positive semidefinite Gram matrix. This isn't obvious at first because everything is done through sampling. If you were to take the expectation over input/target pairs, however, you'd find that the optimization objective maximizes the expected vector similarity under a joint distribution over pairs of items, minus the expected similarity under the marginal distributions (the second term comes from the negative samples). Essentially, the goal is to maximize the gap between the similarity of items that are frequently sampled together and the similarity of items sampled independently from the marginal distribution.
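Written out, the expectation described in the paragraph above looks roughly like this. It is a simplified sketch: the symbols $p(a,b)$ (joint distribution over positive pairs), $p(a)$ and $p(b)$ (its marginals), $v_x$ (the embedding of item $x$), and $\mathrm{sim}$ (dot product or cosine) are notation introduced here for illustration, and the margin or softmax wrapped around the similarities in the actual loss is left out.

$$
J = \mathbb{E}_{(a,b)\sim p(a,b)}\big[\mathrm{sim}(v_a, v_b)\big] \;-\; \mathbb{E}_{a\sim p(a),\; b^-\sim p(b)}\big[\mathrm{sim}(v_a, v_{b^-})\big]
$$

The first term rewards similarity between items that actually co-occur; the second penalizes similarity between items paired independently, which is what the negative samples estimate.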



          If this sounds similar to SGNS implicitly factorizing a shifted PMI matrix, or GloVe explicitly factorizing a relaxed variant of the same, good. The specifics differ, and StarSpace has significantly more flexibility in the sampling distribution it works with, but the principle is the same. "Neural Word Embedding as Implicit Matrix Factorization" and "Improving Distributional Similarity with Lessons Learned from Word Embeddings" are fantastic papers by Levy and co-authors (2014 and 2015, if I recall correctly) that discuss the connections between neural embeddings and explicit matrix factorization techniques such as PPMI-SVD and GloVe, and the principles that make them successful.
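For comparison, the explicit route mentioned above (PPMI followed by a truncated SVD) can be sketched in a few lines of NumPy. The toy co-occurrence counts and the helper names `ppmi` and `svd_embeddings` are invented for illustration, and splitting the singular values symmetrically with a square root is just one common choice, not something the answer prescribes.

```python
import numpy as np

def ppmi(counts):
    """Positive pointwise mutual information of a co-occurrence count matrix."""
    p_joint = counts / counts.sum()
    p_row = p_joint.sum(axis=1, keepdims=True)
    p_col = p_joint.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_joint / (p_row * p_col))
    pmi[~np.isfinite(pmi)] = 0.0           # zero counts give -inf or nan; treat as 0
    return np.maximum(pmi, 0.0)            # keep only positive associations

def svd_embeddings(matrix, dim):
    """Rank-`dim` factorization; each row of the result is an item embedding."""
    U, S, _ = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :dim] * np.sqrt(S[:dim])

# Hypothetical symmetric co-occurrence counts for four items.
counts = np.array([[0, 4, 1, 0],
                   [4, 0, 0, 1],
                   [1, 0, 0, 5],
                   [0, 1, 5, 0]], dtype=float)
M = ppmi(counts)
W = svd_embeddings(M, dim=2)               # one 2-d embedding per item
```

The sampling-based methods never build `M`; they reach a comparable low-rank solution by stochastic updates on sampled pairs.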



          Similarly, "Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec" is a great discussion of how neural graph (network) embedding methods optimize the same kind of implicit objective as the neural word embeddings.



          In short, it doesn't sound like there's much going on in StarSpace as far as architecture goes because there isn't. It quite literally adjusts the placement of points in the embedding space so that associated items are more similar to each other than to unrelated items.
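"Adjusting the placement of points" can be illustrated with a single stochastic gradient step. The sketch below assumes a margin ranking (hinge) loss, dot-product similarity, one negative sample, and plain SGD; as far as I understand the paper, the real implementation differs in details (several negatives per step, a different optimizer, norm constraints), so treat this as an illustration rather than StarSpace's actual update. The function name `margin_step` and its parameters are hypothetical, and the three indices are assumed to be distinct.

```python
import numpy as np

def margin_step(E, a, b_pos, b_neg, lr=0.1, margin=1.0):
    """One SGD step on max(0, margin - sim(a, b+) + sim(a, b-)).

    Similarity is a plain dot product. E is updated in place.
    Returns the loss measured before the update.
    """
    va, vp, vn = E[a].copy(), E[b_pos].copy(), E[b_neg].copy()
    loss = max(0.0, margin - va @ vp + va @ vn)
    if loss > 0.0:
        # Gradients of the active hinge: d/dva = vn - vp, d/dvp = -va, d/dvn = va
        E[a] -= lr * (vn - vp)             # move a toward b+ and away from b-
        E[b_pos] -= lr * (-va)             # move b+ toward a
        E[b_neg] -= lr * va                # move b- away from a
    return loss

rng = np.random.default_rng(0)
E = 0.1 * rng.normal(size=(5, 4))          # five items in a 4-d embedding space
for _ in range(50):
    margin_step(E, a=0, b_pos=1, b_neg=2)
```

There are no hidden layers anywhere in this loop: the parameters being updated are the embeddings themselves.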












          $endgroup$






















                answered 3 hours ago









                Cameron King



