What's the best classification model for this recommendation engine? Announcing the arrival of Valued Associate #679: Cesar Manara Planned maintenance scheduled April 17/18, 2019 at 00:00UTC (8:00pm US/Eastern) 2019 Moderator Election Q&A - Questionnaire 2019 Community Moderator Election ResultsRecommended Language/Framework for Building a New Recommendation EngineRecommendation engine with mahoutCreating Data model for mahout recommendation engineBuilding Recommendation engine with Pythonusing movielens dataset build recommendation engineSVD for recommendation engineJob Recommendation EngineClassification model for recommender system?

How to find all the available tools in mac terminal?

2001: A Space Odyssey's use of the song "Daisy Bell" (Bicycle Built for Two); life imitates art or vice-versa?

How do I find out the mythology and history of my Fortress?

Is CEO the profession with the most psychopaths?

How could we fake a moon landing now?

Is there a kind of relay only consumes power when switching?

What does "lightly crushed" mean for cardamon pods?

Can a new player join a group only when a new campaign starts?

An adverb for when you're not exaggerating

Generate an RGB colour grid

Is it common practice to audition new musicians one-on-one before rehearsing with the entire band?

Do I really need to have a message in a novel to appeal to readers?

また usage in a dictionary

When was Kai Tak permanently closed to cargo service?

Trademark violation for app?

Did MS DOS itself ever use blinking text?

Can anything be seen from the center of the Boötes void? How dark would it be?

How to Make a Beautiful Stacked 3D Plot

How to tell that you are a giant?

Why are the trig functions versine, haversine, exsecant, etc, rarely used in modern mathematics?

What is the meaning of the simile “quick as silk”?

Can a party unilaterally change candidates in preparation for a General election?

Why do we bend a book to keep it straight?

How to compare two different files line by line in unix?

What's the best classification model for this recommendation engine?

Announcing the arrival of Valued Associate #679: Cesar Manara

Planned maintenance scheduled April 17/18, 2019 at 00:00UTC (8:00pm US/Eastern)

2019 Moderator Election Q&A - Questionnaire

2019 Community Moderator Election ResultsRecommended Language/Framework for Building a New Recommendation EngineRecommendation engine with mahoutCreating Data model for mahout recommendation engineBuilding Recommendation engine with Pythonusing movielens dataset build recommendation engineSVD for recommendation engineJob Recommendation EngineClassification model for recommender system?

I'm not a data scientist but I'm trying to implement a recommendation engine on my company. My application runs on PHP but I'll use Python to process this data.

My company is an online school, with 40 online courses as of now. I have a CSV file with around 30k users preferences, and it looks like this:

Dataframe

0 means that user is not subscribed (I consider here that he has no interest), while 1 means subscribed (interested).

My idea is to compare one single user array such as [0,1,0,0,0,1,1...] with all this data and return a grade for each course with the probability of interest for this user.

I was thinking of using a Multinomial Logistic Regression, but as far as I know (and I don't know much) it would return me a binary result, right?

What classification model would you recommend me to use? Ideally, my result should be something like:

[0.95, 0.1, 0.54, 0.3, 0.87...]

Cheers!

asked May 21 '18 at 14:34

grpaiva

bumped to the homepage by Community♦ 22 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

1

$begingroup$
Formulate the problem as a Collaborative filtering task.
$endgroup$
– Fadi Bakoura
May 21 '18 at 14:52

$begingroup$
Thanks @FadiBakoura, will research on this and let you know.
$endgroup$
– grpaiva
May 21 '18 at 18:03

$begingroup$
Can you include more information about the user? (sex, age ...) An user single with 18 years old may like a course that another 50 years old do not like ...
$endgroup$
– Intruso
Aug 20 '18 at 13:50

$begingroup$
Seems like a prediction problem, not one of classification, so a neural network? Have you tried loading this data into Orange3? Seems you could test out your models pretty quickly. Orange3 uses Scikit, so once you find your workflow, you can use Python. By the way, if it is a neural network solution, TensorFlow has PHP bindings, so you could do the whole thing in PHP. Both may save you time.
$endgroup$
– davmor
Nov 18 '18 at 11:10

add a comment |

I'm not a data scientist but I'm trying to implement a recommendation engine on my company. My application runs on PHP but I'll use Python to process this data.

My company is an online school, with 40 online courses as of now. I have a CSV file with around 30k users preferences, and it looks like this:

Dataframe

0 means that user is not subscribed (I consider here that he has no interest), while 1 means subscribed (interested).

My idea is to compare one single user array such as [0,1,0,0,0,1,1...] with all this data and return a grade for each course with the probability of interest for this user.

I was thinking of using a Multinomial Logistic Regression, but as far as I know (and I don't know much) it would return me a binary result, right?

What classification model would you recommend me to use? Ideally, my result should be something like:

[0.95, 0.1, 0.54, 0.3, 0.87...]

Cheers!

asked May 21 '18 at 14:34

grpaiva

bumped to the homepage by Community♦ 22 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

1

$begingroup$
Formulate the problem as a Collaborative filtering task.
$endgroup$
– Fadi Bakoura
May 21 '18 at 14:52

$begingroup$
Thanks @FadiBakoura, will research on this and let you know.
$endgroup$
– grpaiva
May 21 '18 at 18:03

$begingroup$
Can you include more information about the user? (sex, age ...) An user single with 18 years old may like a course that another 50 years old do not like ...
$endgroup$
– Intruso
Aug 20 '18 at 13:50

$begingroup$
Seems like a prediction problem, not one of classification, so a neural network? Have you tried loading this data into Orange3? Seems you could test out your models pretty quickly. Orange3 uses Scikit, so once you find your workflow, you can use Python. By the way, if it is a neural network solution, TensorFlow has PHP bindings, so you could do the whole thing in PHP. Both may save you time.
$endgroup$
– davmor
Nov 18 '18 at 11:10

add a comment |

I'm not a data scientist but I'm trying to implement a recommendation engine on my company. My application runs on PHP but I'll use Python to process this data.

My company is an online school, with 40 online courses as of now. I have a CSV file with around 30k users preferences, and it looks like this:

Dataframe

0 means that user is not subscribed (I consider here that he has no interest), while 1 means subscribed (interested).

My idea is to compare one single user array such as [0,1,0,0,0,1,1...] with all this data and return a grade for each course with the probability of interest for this user.

I was thinking of using a Multinomial Logistic Regression, but as far as I know (and I don't know much) it would return me a binary result, right?

What classification model would you recommend me to use? Ideally, my result should be something like:

[0.95, 0.1, 0.54, 0.3, 0.87...]

Cheers!

asked May 21 '18 at 14:34

grpaiva

I'm not a data scientist but I'm trying to implement a recommendation engine on my company. My application runs on PHP but I'll use Python to process this data.

My company is an online school, with 40 online courses as of now. I have a CSV file with around 30k users preferences, and it looks like this:

Dataframe

0 means that user is not subscribed (I consider here that he has no interest), while 1 means subscribed (interested).

My idea is to compare one single user array such as [0,1,0,0,0,1,1...] with all this data and return a grade for each course with the probability of interest for this user.

I was thinking of using a Multinomial Logistic Regression, but as far as I know (and I don't know much) it would return me a binary result, right?

What classification model would you recommend me to use? Ideally, my result should be something like:

[0.95, 0.1, 0.54, 0.3, 0.87...]

Cheers!

python recommender-system multiclass-classification

asked May 21 '18 at 14:34

grpaiva

asked May 21 '18 at 14:34

grpaiva

asked May 21 '18 at 14:34

grpaiva

asked May 21 '18 at 14:34

grpaiva

asked May 21 '18 at 14:34

grpaiva

bumped to the homepage by Community♦ 22 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

bumped to the homepage by Community♦ 22 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

1

$begingroup$
Formulate the problem as a Collaborative filtering task.
$endgroup$
– Fadi Bakoura
May 21 '18 at 14:52

$begingroup$
Thanks @FadiBakoura, will research on this and let you know.
$endgroup$
– grpaiva
May 21 '18 at 18:03

$begingroup$
Can you include more information about the user? (sex, age ...) An user single with 18 years old may like a course that another 50 years old do not like ...
$endgroup$
– Intruso
Aug 20 '18 at 13:50

$begingroup$
Seems like a prediction problem, not one of classification, so a neural network? Have you tried loading this data into Orange3? Seems you could test out your models pretty quickly. Orange3 uses Scikit, so once you find your workflow, you can use Python. By the way, if it is a neural network solution, TensorFlow has PHP bindings, so you could do the whole thing in PHP. Both may save you time.
$endgroup$
– davmor
Nov 18 '18 at 11:10

add a comment |

1

$begingroup$
Formulate the problem as a Collaborative filtering task.
$endgroup$
– Fadi Bakoura
May 21 '18 at 14:52

$begingroup$
Thanks @FadiBakoura, will research on this and let you know.
$endgroup$
– grpaiva
May 21 '18 at 18:03

$begingroup$
Can you include more information about the user? (sex, age ...) An user single with 18 years old may like a course that another 50 years old do not like ...
$endgroup$
– Intruso
Aug 20 '18 at 13:50

$begingroup$
Seems like a prediction problem, not one of classification, so a neural network? Have you tried loading this data into Orange3? Seems you could test out your models pretty quickly. Orange3 uses Scikit, so once you find your workflow, you can use Python. By the way, if it is a neural network solution, TensorFlow has PHP bindings, so you could do the whole thing in PHP. Both may save you time.
$endgroup$
– davmor
Nov 18 '18 at 11:10

Formulate the problem as a Collaborative filtering task.

– Fadi Bakoura
May 21 '18 at 14:52

Thanks @FadiBakoura, will research on this and let you know.

– grpaiva
May 21 '18 at 18:03

Can you include more information about the user? (sex, age ...) An user single with 18 years old may like a course that another 50 years old do not like ...

– Intruso
Aug 20 '18 at 13:50

Seems like a prediction problem, not one of classification, so a neural network? Have you tried loading this data into Orange3? Seems you could test out your models pretty quickly. Orange3 uses Scikit, so once you find your workflow, you can use Python. By the way, if it is a neural network solution, TensorFlow has PHP bindings, so you could do the whole thing in PHP. Both may save you time.

– davmor
Nov 18 '18 at 11:10

add a comment |

1 Answer
1

active

oldest

votes

Without more information about your dataset, it's impossible to recommend one particular classifier over another.

If you want your classifier to return a vector of probabilities, then if you're using the sklearn library, you could use the predict_proba method.

Here's an example:

from sklearn.datasets import load_digits
digits = load_digits(2)
from sklearn.linear_model import LogisticRegression
preds = LogisticRegression().fit(digits.data, digits.target).predict_proba(digits.data)
print([i[1] for i in preds])

answered May 21 '18 at 14:47

marco_gorelli

4819

$begingroup$
Thanks for your answer @Lupacante! What I don't get here is that when I print digits.data.shape and digits.target.shape I get: (360, 64) and (360,). Shouldn't the target shape be something like(64,)? My dataset's shape looks like this: (27920, 46) and (46,). I'm getting an error: ValueError: Found input variables with inconsistent numbers of samples: [27920, 46]
$endgroup$
– grpaiva
May 21 '18 at 18:02

$begingroup$
The predictors and target from the training set should have the same number of rows. The first number in the tuple returned by shape gives you the number of rows, so (360, 64) and (360,) is exactly what we'd expect.
$endgroup$
– marco_gorelli
May 22 '18 at 8:16

add a comment |

Your Answer

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "557"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f31932%2fwhats-the-best-classification-model-for-this-recommendation-engine%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

Without more information about your dataset, it's impossible to recommend one particular classifier over another.

If you want your classifier to return a vector of probabilities, then if you're using the sklearn library, you could use the predict_proba method.

Here's an example:

from sklearn.datasets import load_digits
digits = load_digits(2)
from sklearn.linear_model import LogisticRegression
preds = LogisticRegression().fit(digits.data, digits.target).predict_proba(digits.data)
print([i[1] for i in preds])

answered May 21 '18 at 14:47

marco_gorelli

4819

$begingroup$
Thanks for your answer @Lupacante! What I don't get here is that when I print digits.data.shape and digits.target.shape I get: (360, 64) and (360,). Shouldn't the target shape be something like(64,)? My dataset's shape looks like this: (27920, 46) and (46,). I'm getting an error: ValueError: Found input variables with inconsistent numbers of samples: [27920, 46]
$endgroup$
– grpaiva
May 21 '18 at 18:02

$begingroup$
The predictors and target from the training set should have the same number of rows. The first number in the tuple returned by shape gives you the number of rows, so (360, 64) and (360,) is exactly what we'd expect.
$endgroup$
– marco_gorelli
May 22 '18 at 8:16

add a comment |

Without more information about your dataset, it's impossible to recommend one particular classifier over another.

If you want your classifier to return a vector of probabilities, then if you're using the sklearn library, you could use the predict_proba method.

Here's an example:

from sklearn.datasets import load_digits
digits = load_digits(2)
from sklearn.linear_model import LogisticRegression
preds = LogisticRegression().fit(digits.data, digits.target).predict_proba(digits.data)
print([i[1] for i in preds])

answered May 21 '18 at 14:47

marco_gorelli

4819

$begingroup$
Thanks for your answer @Lupacante! What I don't get here is that when I print digits.data.shape and digits.target.shape I get: (360, 64) and (360,). Shouldn't the target shape be something like(64,)? My dataset's shape looks like this: (27920, 46) and (46,). I'm getting an error: ValueError: Found input variables with inconsistent numbers of samples: [27920, 46]
$endgroup$
– grpaiva
May 21 '18 at 18:02

$begingroup$
The predictors and target from the training set should have the same number of rows. The first number in the tuple returned by shape gives you the number of rows, so (360, 64) and (360,) is exactly what we'd expect.
$endgroup$
– marco_gorelli
May 22 '18 at 8:16

add a comment |

Without more information about your dataset, it's impossible to recommend one particular classifier over another.

If you want your classifier to return a vector of probabilities, then if you're using the sklearn library, you could use the predict_proba method.

Here's an example:

from sklearn.datasets import load_digits
digits = load_digits(2)
from sklearn.linear_model import LogisticRegression
preds = LogisticRegression().fit(digits.data, digits.target).predict_proba(digits.data)
print([i[1] for i in preds])

answered May 21 '18 at 14:47

marco_gorelli

4819

Without more information about your dataset, it's impossible to recommend one particular classifier over another.

If you want your classifier to return a vector of probabilities, then if you're using the sklearn library, you could use the predict_proba method.

Here's an example:

from sklearn.datasets import load_digits
digits = load_digits(2)
from sklearn.linear_model import LogisticRegression
preds = LogisticRegression().fit(digits.data, digits.target).predict_proba(digits.data)
print([i[1] for i in preds])

answered May 21 '18 at 14:47

marco_gorelli

4819

answered May 21 '18 at 14:47

marco_gorelli

4819

answered May 21 '18 at 14:47

marco_gorelli

4819

answered May 21 '18 at 14:47

marco_gorelli

4819

$begingroup$
Thanks for your answer @Lupacante! What I don't get here is that when I print digits.data.shape and digits.target.shape I get: (360, 64) and (360,). Shouldn't the target shape be something like(64,)? My dataset's shape looks like this: (27920, 46) and (46,). I'm getting an error: ValueError: Found input variables with inconsistent numbers of samples: [27920, 46]
$endgroup$
– grpaiva
May 21 '18 at 18:02

$begingroup$
The predictors and target from the training set should have the same number of rows. The first number in the tuple returned by shape gives you the number of rows, so (360, 64) and (360,) is exactly what we'd expect.
$endgroup$
– marco_gorelli
May 22 '18 at 8:16

add a comment |

$begingroup$
Thanks for your answer @Lupacante! What I don't get here is that when I print digits.data.shape and digits.target.shape I get: (360, 64) and (360,). Shouldn't the target shape be something like(64,)? My dataset's shape looks like this: (27920, 46) and (46,). I'm getting an error: ValueError: Found input variables with inconsistent numbers of samples: [27920, 46]
$endgroup$
– grpaiva
May 21 '18 at 18:02

$begingroup$
The predictors and target from the training set should have the same number of rows. The first number in the tuple returned by shape gives you the number of rows, so (360, 64) and (360,) is exactly what we'd expect.
$endgroup$
– marco_gorelli
May 22 '18 at 8:16

Thanks for your answer @Lupacante! What I don't get here is that when I print digits.data.shape and digits.target.shape I get: (360, 64) and (360,). Shouldn't the target shape be something like(64,)? My dataset's shape looks like this: (27920, 46) and (46,). I'm getting an error: ValueError: Found input variables with inconsistent numbers of samples: [27920, 46]

– grpaiva
May 21 '18 at 18:02

The predictors and target from the training set should have the same number of rows. The first number in the tuple returned by shape gives you the number of rows, so (360, 64) and (360,) is exactly what we'd expect.

– marco_gorelli
May 22 '18 at 8:16

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Data Science Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Hfrxdjt

bumped to the homepage by Community♦ 22 mins ago

bumped to the homepage by Community♦ 22 mins ago

bumped to the homepage by Community♦ 22 mins ago

bumped to the homepage by Community♦ 22 mins ago

1 Answer
1

Your Answer

Post as a guest

1 Answer
1

1 Answer
1

Post as a guest

Popular posts from this blog

bumped to the homepage by Community♦ 22 mins ago

bumped to the homepage by Community♦ 22 mins ago

bumped to the homepage by Community♦ 22 mins ago

bumped to the homepage by Community♦ 22 mins ago

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

1 Answer 1

1 Answer 1

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

1 Answer
1

1 Answer
1

1 Answer
1