Found array with dim 3. Estimator expected <= 2

I am using LDA over a simple collection of documents. My goal is to extract topics, then use the extracted topics as features to evaluate my model.

I decided to use a multinomial classifier as the evaluator (MultinomialNB from scikit-learn in the code below).

import itertools
from gensim.models import ldamodel
from nltk.tokenize import RegexpTokenizer
from nltk.stem.porter import PorterStemmer
from gensim import corpora, models
from sklearn.naive_bayes import MultinomialNB

tokenizer = RegexpTokenizer(r'\w+')

# create English stop words list (only 'a' in this minimal example)
en_stop = ['a']

# Create p_stemmer of class PorterStemmer
p_stemmer = PorterStemmer()

# create sample documents
doc_a = "Brocolli is good to eat. My brother likes to eat good brocolli, but not my mother."
doc_b = "My mother spends a lot of time driving my brother around to baseball practice."
doc_c = "Some health experts suggest that driving may cause increased tension and blood pressure."
doc_d = "I often feel pressure to perform well at school, but my mother never seems to drive my brother to do better."
doc_e = "Health professionals say that brocolli is good for your health."

# compile sample documents into a list
doc_set = [doc_a, doc_b, doc_c, doc_d, doc_e]

# list for tokenized documents in loop
texts = []

# loop through document list
for doc in doc_set:
    # clean and tokenize document string
    raw = doc.lower()
    tokens = tokenizer.tokenize(raw)

    # remove stop words from tokens
    stopped_tokens = [tok for tok in tokens if tok not in en_stop]

    # stem tokens
    stemmed_tokens = [p_stemmer.stem(tok) for tok in stopped_tokens]

    # add tokens to list
    texts.append(stemmed_tokens)

# turn our tokenized documents into a id <-> term dictionary
dictionary = corpora.Dictionary(texts)

# convert tokenized documents into a document-term matrix
corpus = [dictionary.doc2bow(text) for text in texts]


# generate LDA model
#ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=20)

id2word = corpora.Dictionary(texts)
# Creates the Bag of Word corpus.
mm = [id2word.doc2bow(text) for text in texts]

# Trains the LDA models.
lda = ldamodel.LdaModel(corpus=mm, id2word=id2word, num_topics=4,
 update_every=1, chunksize=10000, passes=1)


# Assigns the topics to the documents in corpus
a = []
lda_corpus = lda[mm]
for i in range(len(doc_set)):
    a.append(lda_corpus[i])
    print(lda_corpus[i])
merged_list = list(itertools.chain(*lda_corpus))
print(a)


sv=MultinomialNB()

yvalues = [0,1,2,3]

sv.fit(a,yvalues)
predictclass = sv.predict(a)

testLables=[0,1,2,3]
from sklearn import metrics, tree
#yacc=metrics.accuracy_score(testLables,predictclass)
#print (yacc)

When I run this code, it throws the error mentioned in the subject.

Also, this is the output of the LDA model (the per-document topic distribution) that I feed to the classifier:

[[(0, 0.95533888404477663), (1, 0.014775921798986477), (2, 0.015161897773308793), (3, 0.014723296382928375)], [(0, 0.019079556242721694), (1, 0.017932434792585779), (2, 0.94498655991579728), (3, 0.018001449048895311)], [(0, 0.017957955483631164), (1, 0.017900184473362918), (2, 0.018133572636989413), (3, 0.9460082874060165)], [(0, 0.96554611572184923), (1, 0.011407838337200715), (2, 0.011537900721487016), (3, 0.011508145219463113)], [(0, 0.023306931039431281), (1, 0.022823706054846005), (2, 0.93072240824085961), (3, 0.023146954664863096)]]
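A quick way to see why the estimator complains is to look at the shape NumPy infers from this structure. The sketch below is illustrative only: it uses rounded copies of the values above, and each (topic_id, probability) tuple becomes a third axis of length 2.

```python
import numpy as np

# Per-document topic distributions in the (topic_id, probability) format
# shown above; the values are rounded copies of the posted output.
a = [
    [(0, 0.955), (1, 0.015), (2, 0.015), (3, 0.015)],
    [(0, 0.019), (1, 0.018), (2, 0.945), (3, 0.018)],
    [(0, 0.018), (1, 0.018), (2, 0.018), (3, 0.946)],
    [(0, 0.966), (1, 0.011), (2, 0.012), (3, 0.011)],
    [(0, 0.023), (1, 0.023), (2, 0.931), (3, 0.023)],
]

arr = np.array(a)
print(arr.ndim)   # 3
print(arr.shape)  # (5, 4, 2): documents x topics x (id, probability)
```

scikit-learn estimators expect a 2D feature matrix (samples x features), so this input has one axis too many.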

My labels here are: 0, 1, 2, 3

edited 3 mins ago by Damini Jain

asked Aug 8 '17 at 16:29 by sariii

Hi Saria, was this problem resolved? I'm encountering a similar problem with sklearn.cluster.KMeans
– frank, Jun 24 '18 at 22:44

You need to reshape the array, I guess. The error is thrown because the array is 3-dimensional; the algorithm works with 2D arrays, not 3D. Try reshaping.
– Damini Jain, 1 hour ago

stackoverflow.com/questions/34972142/…
– Damini Jain, 1 hour ago
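One possible reading of the "reshape" suggestion, sketched under assumptions: a plain reshape to (5, 8) would interleave topic ids with probabilities, so it is probably more useful to keep only the probabilities, one column per topic. The helper name `topics_to_matrix` and the sample input are hypothetical and not from the post. Missing topics are filled with 0.0, because gensim may omit topics whose probability falls below its minimum threshold.

```python
import numpy as np

def topics_to_matrix(doc_topics, num_topics):
    """Turn a list of [(topic_id, prob), ...] lists into a 2D array.

    Rows are documents, columns are topics. Topics missing from a
    document's list are left at 0.0.
    """
    X = np.zeros((len(doc_topics), num_topics))
    for row, topics in enumerate(doc_topics):
        for topic_id, prob in topics:
            X[row, topic_id] = prob
    return X

# Hypothetical input in the same format as the output in the question;
# the second document deliberately lacks topic 1.
doc_topics = [
    [(0, 0.96), (1, 0.01), (2, 0.02), (3, 0.01)],
    [(0, 0.02), (2, 0.95), (3, 0.03)],
]
X = topics_to_matrix(doc_topics, num_topics=4)
print(X.shape)  # (2, 4)
```

With the question's variables this would be `X = topics_to_matrix(a, 4)` followed by `sv.fit(X, yvalues)`. Note that `yvalues` needs one label per row of `X`: there are five documents, but the post lists only four labels, which would raise a separate error about inconsistent numbers of samples.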

machine-learning python classification svm lda
