Found array with dim 3. Estimator expected <= 2

I am using LDA over a simple collection of documents. My goal is to extract topics and then use the extracted topics as features to evaluate my model.

I decided to use a multinomial classifier as the evaluator (MultinomialNB in the code below).

import itertools
from gensim.models import ldamodel
from nltk.tokenize import RegexpTokenizer
from nltk.stem.porter import PorterStemmer
from gensim import corpora, models
from sklearn.naive_bayes import MultinomialNB

tokenizer = RegexpTokenizer(r'\w+')

# create English stop words list
en_stop = 'a'

# Create p_stemmer of class PorterStemmer
p_stemmer = PorterStemmer()

# create sample documents
doc_a = "Brocolli is good to eat. My brother likes to eat good brocolli, but not my mother."
doc_b = "My mother spends a lot of time driving my brother around to baseball practice."
doc_c = "Some health experts suggest that driving may cause increased tension and blood pressure."
doc_d = "I often feel pressure to perform well at school, but my mother never seems to drive my brother to do better."
doc_e = "Health professionals say that brocolli is good for your health."

# compile sample documents into a list
doc_set = [doc_a, doc_b, doc_c, doc_d, doc_e]

# list for tokenized documents in loop
texts = []

# loop through document list
for i in doc_set:
    # clean and tokenize document string
    raw = i.lower()
    tokens = tokenizer.tokenize(raw)

    # remove stop words from tokens
    stopped_tokens = [i for i in tokens if i not in en_stop]

    # stem tokens
    stemmed_tokens = [p_stemmer.stem(i) for i in stopped_tokens]

    # add tokens to list
    texts.append(stemmed_tokens)

# turn our tokenized documents into a id <-> term dictionary
dictionary = corpora.Dictionary(texts)

# convert tokenized documents into a document-term matrix
corpus = [dictionary.doc2bow(text) for text in texts]

# generate LDA model
#ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=20)

id2word = corpora.Dictionary(texts)
# Creates the Bag of Word corpus.
mm = [id2word.doc2bow(text) for text in texts]

# Trains the LDA models.
lda = ldamodel.LdaModel(corpus=mm, id2word=id2word, num_topics=4,
                        update_every=1, chunksize=10000, passes=1)

# Assigns the topics to the documents in corpus
a = []
lda_corpus = lda[mm]
for i in range(len(doc_set)):
    a.append(lda_corpus[i])
    print(lda_corpus[i])
merged_list = list(itertools.chain(*lda_corpus))
print(a)
#my_list.append(my_list[i])

sv = MultinomialNB()

yvalues = [0,1,2,3]

# this call raises the error from the title
sv.fit(a, yvalues)
predictclass = sv.predict(a)

testLables = [0,1,2,3]
from sklearn import metrics, tree
#yacc=metrics.accuracy_score(testLables,predictclass)
#print (yacc)

When I run this code, it throws the error mentioned in the title.

This is the output of the LDA model (the per-document topic distribution) that I feed to the classifier:

[[(0, 0.95533888404477663), (1, 0.014775921798986477), (2, 0.015161897773308793), (3, 0.014723296382928375)],
 [(0, 0.019079556242721694), (1, 0.017932434792585779), (2, 0.94498655991579728), (3, 0.018001449048895311)],
 [(0, 0.017957955483631164), (1, 0.017900184473362918), (2, 0.018133572636989413), (3, 0.9460082874060165)],
 [(0, 0.96554611572184923), (1, 0.011407838337200715), (2, 0.011537900721487016), (3, 0.011508145219463113)],
 [(0, 0.023306931039431281), (1, 0.022823706054846005), (2, 0.93072240824085961), (3, 0.023146954664863096)]]

My labels here are: 0, 1, 2, 3
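The nesting of the LDA output in the question may explain the error message. As a minimal sketch (pure Python, with the probabilities rounded for brevity; `nested_shape` is a hypothetical helper, not part of gensim or scikit-learn), counting the levels of nesting gives three dimensions: documents, topics, and the (topic id, probability) pair.

```python
# Per-document topic distributions in the form shown in the question:
# one list of (topic_id, probability) pairs per document.
# Probabilities are rounded here for readability.
doc_topics = [
    [(0, 0.955), (1, 0.015), (2, 0.015), (3, 0.015)],
    [(0, 0.019), (1, 0.018), (2, 0.945), (3, 0.018)],
    [(0, 0.018), (1, 0.018), (2, 0.018), (3, 0.946)],
    [(0, 0.966), (1, 0.011), (2, 0.012), (3, 0.012)],
    [(0, 0.023), (1, 0.023), (2, 0.931), (3, 0.023)],
]


def nested_shape(obj):
    """Return the shape of a regularly nested list/tuple structure.

    Hypothetical helper: it only inspects the first element at each
    level, so it assumes every level has a uniform length.
    """
    shape = []
    while isinstance(obj, (list, tuple)):
        shape.append(len(obj))
        obj = obj[0]
    return tuple(shape)


shape = nested_shape(doc_topics)
print(shape)  # (5, 4, 2): documents x topics x (id, probability)
```

An estimator that expects a 2D feature matrix (rows are samples, columns are features) would see this structure as having one dimension too many.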











  • Hi Saria, was this problem resolved? I'm encountering a similar problem with sklearn.cluster.KMeans.
    – frank
    Jun 24 '18 at 22:44

  • You need to reshape the array, I guess. The error is thrown because the array is 3-dimensional; the algorithm works with 2D arrays, not 3D. Try reshaping.
    – Damini Jain
    1 hour ago

  • stackoverflow.com/questions/34972142/…
    – Damini Jain
    1 hour ago
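The reshaping suggested in the comments could look like the following minimal sketch. It assumes the goal is one row per document and one column per topic. `to_dense` is a hypothetical helper, the probabilities are rounded from the question's output, and the labels in `y` are made up for illustration: the question lists four labels for five documents, while scikit-learn expects one label per row.

```python
def to_dense(doc_topics, num_topics):
    """Turn lists of (topic_id, probability) pairs into a 2D list.

    Each document becomes one row with num_topics columns. Topics that
    are missing from a document's list keep a probability of 0.0.
    """
    rows = []
    for pairs in doc_topics:
        row = [0.0] * num_topics
        for topic_id, prob in pairs:
            row[topic_id] = prob
        rows.append(row)
    return rows


# Rounded copy of the LDA output from the question.
doc_topics = [
    [(0, 0.955), (1, 0.015), (2, 0.015), (3, 0.015)],
    [(0, 0.019), (1, 0.018), (2, 0.945), (3, 0.018)],
    [(0, 0.018), (1, 0.018), (2, 0.018), (3, 0.946)],
    [(0, 0.966), (1, 0.011), (2, 0.012), (3, 0.012)],
    [(0, 0.023), (1, 0.023), (2, 0.931), (3, 0.023)],
]

X = to_dense(doc_topics, num_topics=4)

# Hypothetical labels, one per document (assumption, not from the question).
y = [0, 2, 3, 0, 2]

print(len(X), len(X[0]))  # 5 rows (documents), 4 columns (topics)
```

With `X` in this 2D form and one label per row, `sv.fit(X, y)` should no longer be handed a 3D array. gensim also provides `gensim.matutils.corpus2dense(lda_corpus, num_terms=4)`, which returns a topics-by-documents NumPy array, so its transpose would give the same layout.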















machine-learning python classification svm lda






edited 3 mins ago
Damini Jain

asked Aug 8 '17 at 16:29
sariii










