What's the best classification model for this recommendation engine? Announcing the arrival of Valued Associate #679: Cesar Manara Planned maintenance scheduled April 17/18, 2019 at 00:00UTC (8:00pm US/Eastern) 2019 Moderator Election Q&A - Questionnaire 2019 Community Moderator Election ResultsRecommended Language/Framework for Building a New Recommendation EngineRecommendation engine with mahoutCreating Data model for mahout recommendation engineBuilding Recommendation engine with Pythonusing movielens dataset build recommendation engineSVD for recommendation engineJob Recommendation EngineClassification model for recommender system?
How to find all the available tools in mac terminal?
2001: A Space Odyssey's use of the song "Daisy Bell" (Bicycle Built for Two); life imitates art or vice-versa?
How do I find out the mythology and history of my Fortress?
Is CEO the profession with the most psychopaths?
How could we fake a moon landing now?
Is there a kind of relay only consumes power when switching?
What does "lightly crushed" mean for cardamon pods?
Can a new player join a group only when a new campaign starts?
An adverb for when you're not exaggerating
Generate an RGB colour grid
Is it common practice to audition new musicians one-on-one before rehearsing with the entire band?
Do I really need to have a message in a novel to appeal to readers?
また usage in a dictionary
When was Kai Tak permanently closed to cargo service?
Trademark violation for app?
Did MS DOS itself ever use blinking text?
Can anything be seen from the center of the Boötes void? How dark would it be?
How to Make a Beautiful Stacked 3D Plot
How to tell that you are a giant?
Why are the trig functions versine, haversine, exsecant, etc, rarely used in modern mathematics?
What is the meaning of the simile “quick as silk”?
Can a party unilaterally change candidates in preparation for a General election?
Why do we bend a book to keep it straight?
How to compare two different files line by line in unix?
What's the best classification model for this recommendation engine?
Announcing the arrival of Valued Associate #679: Cesar Manara
Planned maintenance scheduled April 17/18, 2019 at 00:00UTC (8:00pm US/Eastern)
2019 Moderator Election Q&A - Questionnaire
2019 Community Moderator Election ResultsRecommended Language/Framework for Building a New Recommendation EngineRecommendation engine with mahoutCreating Data model for mahout recommendation engineBuilding Recommendation engine with Pythonusing movielens dataset build recommendation engineSVD for recommendation engineJob Recommendation EngineClassification model for recommender system?
$begingroup$
I'm not a data scientist but I'm trying to implement a recommendation engine on my company. My application runs on PHP but I'll use Python to process this data.
My company is an online school, with 40 online courses as of now. I have a CSV file with around 30k users preferences, and it looks like this:
0 means that user is not subscribed (I consider here that he has no interest), while 1 means subscribed (interested).
My idea is to compare one single user array such as [0,1,0,0,0,1,1...] with all this data and return a grade for each course with the probability of interest for this user.
I was thinking of using a Multinomial Logistic Regression, but as far as I know (and I don't know much) it would return me a binary result, right?
What classification model would you recommend me to use? Ideally, my result should be something like:
[0.95, 0.1, 0.54, 0.3, 0.87...]
Cheers!
python recommender-system multiclass-classification
$endgroup$
bumped to the homepage by Community♦ 22 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
add a comment |
$begingroup$
I'm not a data scientist but I'm trying to implement a recommendation engine on my company. My application runs on PHP but I'll use Python to process this data.
My company is an online school, with 40 online courses as of now. I have a CSV file with around 30k users preferences, and it looks like this:
0 means that user is not subscribed (I consider here that he has no interest), while 1 means subscribed (interested).
My idea is to compare one single user array such as [0,1,0,0,0,1,1...] with all this data and return a grade for each course with the probability of interest for this user.
I was thinking of using a Multinomial Logistic Regression, but as far as I know (and I don't know much) it would return me a binary result, right?
What classification model would you recommend me to use? Ideally, my result should be something like:
[0.95, 0.1, 0.54, 0.3, 0.87...]
Cheers!
python recommender-system multiclass-classification
$endgroup$
bumped to the homepage by Community♦ 22 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
1
$begingroup$
Formulate the problem as a Collaborative filtering task.
$endgroup$
– Fadi Bakoura
May 21 '18 at 14:52
$begingroup$
Thanks @FadiBakoura, will research on this and let you know.
$endgroup$
– grpaiva
May 21 '18 at 18:03
$begingroup$
Can you include more information about the user? (sex, age ...) An user single with 18 years old may like a course that another 50 years old do not like ...
$endgroup$
– Intruso
Aug 20 '18 at 13:50
$begingroup$
Seems like a prediction problem, not one of classification, so a neural network? Have you tried loading this data into Orange3? Seems you could test out your models pretty quickly. Orange3 uses Scikit, so once you find your workflow, you can use Python. By the way, if it is a neural network solution, TensorFlow has PHP bindings, so you could do the whole thing in PHP. Both may save you time.
$endgroup$
– davmor
Nov 18 '18 at 11:10
add a comment |
$begingroup$
I'm not a data scientist but I'm trying to implement a recommendation engine on my company. My application runs on PHP but I'll use Python to process this data.
My company is an online school, with 40 online courses as of now. I have a CSV file with around 30k users preferences, and it looks like this:
0 means that user is not subscribed (I consider here that he has no interest), while 1 means subscribed (interested).
My idea is to compare one single user array such as [0,1,0,0,0,1,1...] with all this data and return a grade for each course with the probability of interest for this user.
I was thinking of using a Multinomial Logistic Regression, but as far as I know (and I don't know much) it would return me a binary result, right?
What classification model would you recommend me to use? Ideally, my result should be something like:
[0.95, 0.1, 0.54, 0.3, 0.87...]
Cheers!
python recommender-system multiclass-classification
$endgroup$
I'm not a data scientist but I'm trying to implement a recommendation engine on my company. My application runs on PHP but I'll use Python to process this data.
My company is an online school, with 40 online courses as of now. I have a CSV file with around 30k users preferences, and it looks like this:
0 means that user is not subscribed (I consider here that he has no interest), while 1 means subscribed (interested).
My idea is to compare one single user array such as [0,1,0,0,0,1,1...] with all this data and return a grade for each course with the probability of interest for this user.
I was thinking of using a Multinomial Logistic Regression, but as far as I know (and I don't know much) it would return me a binary result, right?
What classification model would you recommend me to use? Ideally, my result should be something like:
[0.95, 0.1, 0.54, 0.3, 0.87...]
Cheers!
python recommender-system multiclass-classification
python recommender-system multiclass-classification
asked May 21 '18 at 14:34
grpaivagrpaiva
61
61
bumped to the homepage by Community♦ 22 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
bumped to the homepage by Community♦ 22 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
1
$begingroup$
Formulate the problem as a Collaborative filtering task.
$endgroup$
– Fadi Bakoura
May 21 '18 at 14:52
$begingroup$
Thanks @FadiBakoura, will research on this and let you know.
$endgroup$
– grpaiva
May 21 '18 at 18:03
$begingroup$
Can you include more information about the user? (sex, age ...) An user single with 18 years old may like a course that another 50 years old do not like ...
$endgroup$
– Intruso
Aug 20 '18 at 13:50
$begingroup$
Seems like a prediction problem, not one of classification, so a neural network? Have you tried loading this data into Orange3? Seems you could test out your models pretty quickly. Orange3 uses Scikit, so once you find your workflow, you can use Python. By the way, if it is a neural network solution, TensorFlow has PHP bindings, so you could do the whole thing in PHP. Both may save you time.
$endgroup$
– davmor
Nov 18 '18 at 11:10
add a comment |
1
$begingroup$
Formulate the problem as a Collaborative filtering task.
$endgroup$
– Fadi Bakoura
May 21 '18 at 14:52
$begingroup$
Thanks @FadiBakoura, will research on this and let you know.
$endgroup$
– grpaiva
May 21 '18 at 18:03
$begingroup$
Can you include more information about the user? (sex, age ...) An user single with 18 years old may like a course that another 50 years old do not like ...
$endgroup$
– Intruso
Aug 20 '18 at 13:50
$begingroup$
Seems like a prediction problem, not one of classification, so a neural network? Have you tried loading this data into Orange3? Seems you could test out your models pretty quickly. Orange3 uses Scikit, so once you find your workflow, you can use Python. By the way, if it is a neural network solution, TensorFlow has PHP bindings, so you could do the whole thing in PHP. Both may save you time.
$endgroup$
– davmor
Nov 18 '18 at 11:10
1
1
$begingroup$
Formulate the problem as a Collaborative filtering task.
$endgroup$
– Fadi Bakoura
May 21 '18 at 14:52
$begingroup$
Formulate the problem as a Collaborative filtering task.
$endgroup$
– Fadi Bakoura
May 21 '18 at 14:52
$begingroup$
Thanks @FadiBakoura, will research on this and let you know.
$endgroup$
– grpaiva
May 21 '18 at 18:03
$begingroup$
Thanks @FadiBakoura, will research on this and let you know.
$endgroup$
– grpaiva
May 21 '18 at 18:03
$begingroup$
Can you include more information about the user? (sex, age ...) An user single with 18 years old may like a course that another 50 years old do not like ...
$endgroup$
– Intruso
Aug 20 '18 at 13:50
$begingroup$
Can you include more information about the user? (sex, age ...) An user single with 18 years old may like a course that another 50 years old do not like ...
$endgroup$
– Intruso
Aug 20 '18 at 13:50
$begingroup$
Seems like a prediction problem, not one of classification, so a neural network? Have you tried loading this data into Orange3? Seems you could test out your models pretty quickly. Orange3 uses Scikit, so once you find your workflow, you can use Python. By the way, if it is a neural network solution, TensorFlow has PHP bindings, so you could do the whole thing in PHP. Both may save you time.
$endgroup$
– davmor
Nov 18 '18 at 11:10
$begingroup$
Seems like a prediction problem, not one of classification, so a neural network? Have you tried loading this data into Orange3? Seems you could test out your models pretty quickly. Orange3 uses Scikit, so once you find your workflow, you can use Python. By the way, if it is a neural network solution, TensorFlow has PHP bindings, so you could do the whole thing in PHP. Both may save you time.
$endgroup$
– davmor
Nov 18 '18 at 11:10
add a comment |
1 Answer
1
active
oldest
votes
$begingroup$
Without more information about your dataset, it's impossible to recommend one particular classifier over another.
If you want your classifier to return a vector of probabilities, then if you're using the sklearn library, you could use the predict_proba
method.
Here's an example:
from sklearn.datasets import load_digits
digits = load_digits(2)
from sklearn.linear_model import LogisticRegression
preds = LogisticRegression().fit(digits.data, digits.target).predict_proba(digits.data)
print([i[1] for i in preds])
$endgroup$
$begingroup$
Thanks for your answer @Lupacante! What I don't get here is that when I print digits.data.shape and digits.target.shape I get: (360, 64) and (360,). Shouldn't the target shape be something like(64,)? My dataset's shape looks like this: (27920, 46) and (46,). I'm getting an error:ValueError: Found input variables with inconsistent numbers of samples: [27920, 46]
$endgroup$
– grpaiva
May 21 '18 at 18:02
$begingroup$
The predictors and target from the training set should have the same number of rows. The first number in the tuple returned byshape
gives you the number of rows, so(360, 64)
and(360,)
is exactly what we'd expect.
$endgroup$
– marco_gorelli
May 22 '18 at 8:16
add a comment |
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "557"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f31932%2fwhats-the-best-classification-model-for-this-recommendation-engine%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
Without more information about your dataset, it's impossible to recommend one particular classifier over another.
If you want your classifier to return a vector of probabilities, then if you're using the sklearn library, you could use the predict_proba
method.
Here's an example:
from sklearn.datasets import load_digits
digits = load_digits(2)
from sklearn.linear_model import LogisticRegression
preds = LogisticRegression().fit(digits.data, digits.target).predict_proba(digits.data)
print([i[1] for i in preds])
$endgroup$
$begingroup$
Thanks for your answer @Lupacante! What I don't get here is that when I print digits.data.shape and digits.target.shape I get: (360, 64) and (360,). Shouldn't the target shape be something like(64,)? My dataset's shape looks like this: (27920, 46) and (46,). I'm getting an error:ValueError: Found input variables with inconsistent numbers of samples: [27920, 46]
$endgroup$
– grpaiva
May 21 '18 at 18:02
$begingroup$
The predictors and target from the training set should have the same number of rows. The first number in the tuple returned byshape
gives you the number of rows, so(360, 64)
and(360,)
is exactly what we'd expect.
$endgroup$
– marco_gorelli
May 22 '18 at 8:16
add a comment |
$begingroup$
Without more information about your dataset, it's impossible to recommend one particular classifier over another.
If you want your classifier to return a vector of probabilities, then if you're using the sklearn library, you could use the predict_proba
method.
Here's an example:
from sklearn.datasets import load_digits
digits = load_digits(2)
from sklearn.linear_model import LogisticRegression
preds = LogisticRegression().fit(digits.data, digits.target).predict_proba(digits.data)
print([i[1] for i in preds])
$endgroup$
$begingroup$
Thanks for your answer @Lupacante! What I don't get here is that when I print digits.data.shape and digits.target.shape I get: (360, 64) and (360,). Shouldn't the target shape be something like(64,)? My dataset's shape looks like this: (27920, 46) and (46,). I'm getting an error:ValueError: Found input variables with inconsistent numbers of samples: [27920, 46]
$endgroup$
– grpaiva
May 21 '18 at 18:02
$begingroup$
The predictors and target from the training set should have the same number of rows. The first number in the tuple returned byshape
gives you the number of rows, so(360, 64)
and(360,)
is exactly what we'd expect.
$endgroup$
– marco_gorelli
May 22 '18 at 8:16
add a comment |
$begingroup$
Without more information about your dataset, it's impossible to recommend one particular classifier over another.
If you want your classifier to return a vector of probabilities, then if you're using the sklearn library, you could use the predict_proba
method.
Here's an example:
from sklearn.datasets import load_digits
digits = load_digits(2)
from sklearn.linear_model import LogisticRegression
preds = LogisticRegression().fit(digits.data, digits.target).predict_proba(digits.data)
print([i[1] for i in preds])
$endgroup$
Without more information about your dataset, it's impossible to recommend one particular classifier over another.
If you want your classifier to return a vector of probabilities, then if you're using the sklearn library, you could use the predict_proba
method.
Here's an example:
from sklearn.datasets import load_digits
digits = load_digits(2)
from sklearn.linear_model import LogisticRegression
preds = LogisticRegression().fit(digits.data, digits.target).predict_proba(digits.data)
print([i[1] for i in preds])
answered May 21 '18 at 14:47
marco_gorellimarco_gorelli
4819
4819
$begingroup$
Thanks for your answer @Lupacante! What I don't get here is that when I print digits.data.shape and digits.target.shape I get: (360, 64) and (360,). Shouldn't the target shape be something like(64,)? My dataset's shape looks like this: (27920, 46) and (46,). I'm getting an error:ValueError: Found input variables with inconsistent numbers of samples: [27920, 46]
$endgroup$
– grpaiva
May 21 '18 at 18:02
$begingroup$
The predictors and target from the training set should have the same number of rows. The first number in the tuple returned byshape
gives you the number of rows, so(360, 64)
and(360,)
is exactly what we'd expect.
$endgroup$
– marco_gorelli
May 22 '18 at 8:16
add a comment |
$begingroup$
Thanks for your answer @Lupacante! What I don't get here is that when I print digits.data.shape and digits.target.shape I get: (360, 64) and (360,). Shouldn't the target shape be something like(64,)? My dataset's shape looks like this: (27920, 46) and (46,). I'm getting an error:ValueError: Found input variables with inconsistent numbers of samples: [27920, 46]
$endgroup$
– grpaiva
May 21 '18 at 18:02
$begingroup$
The predictors and target from the training set should have the same number of rows. The first number in the tuple returned byshape
gives you the number of rows, so(360, 64)
and(360,)
is exactly what we'd expect.
$endgroup$
– marco_gorelli
May 22 '18 at 8:16
$begingroup$
Thanks for your answer @Lupacante! What I don't get here is that when I print digits.data.shape and digits.target.shape I get: (360, 64) and (360,). Shouldn't the target shape be something like(64,)? My dataset's shape looks like this: (27920, 46) and (46,). I'm getting an error:
ValueError: Found input variables with inconsistent numbers of samples: [27920, 46]
$endgroup$
– grpaiva
May 21 '18 at 18:02
$begingroup$
Thanks for your answer @Lupacante! What I don't get here is that when I print digits.data.shape and digits.target.shape I get: (360, 64) and (360,). Shouldn't the target shape be something like(64,)? My dataset's shape looks like this: (27920, 46) and (46,). I'm getting an error:
ValueError: Found input variables with inconsistent numbers of samples: [27920, 46]
$endgroup$
– grpaiva
May 21 '18 at 18:02
$begingroup$
The predictors and target from the training set should have the same number of rows. The first number in the tuple returned by
shape
gives you the number of rows, so (360, 64)
and (360,)
is exactly what we'd expect.$endgroup$
– marco_gorelli
May 22 '18 at 8:16
$begingroup$
The predictors and target from the training set should have the same number of rows. The first number in the tuple returned by
shape
gives you the number of rows, so (360, 64)
and (360,)
is exactly what we'd expect.$endgroup$
– marco_gorelli
May 22 '18 at 8:16
add a comment |
Thanks for contributing an answer to Data Science Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f31932%2fwhats-the-best-classification-model-for-this-recommendation-engine%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
$begingroup$
Formulate the problem as a Collaborative filtering task.
$endgroup$
– Fadi Bakoura
May 21 '18 at 14:52
$begingroup$
Thanks @FadiBakoura, will research on this and let you know.
$endgroup$
– grpaiva
May 21 '18 at 18:03
$begingroup$
Can you include more information about the user? (sex, age ...) An user single with 18 years old may like a course that another 50 years old do not like ...
$endgroup$
– Intruso
Aug 20 '18 at 13:50
$begingroup$
Seems like a prediction problem, not one of classification, so a neural network? Have you tried loading this data into Orange3? Seems you could test out your models pretty quickly. Orange3 uses Scikit, so once you find your workflow, you can use Python. By the way, if it is a neural network solution, TensorFlow has PHP bindings, so you could do the whole thing in PHP. Both may save you time.
$endgroup$
– davmor
Nov 18 '18 at 11:10