Increasing SpaCy max NLP limit
I'm getting this error:
[E088] Text of length 1029371 exceeds maximum of 1000000. The v2.x parser and NER models require roughly 1GB of temporary memory per 100,000 characters in the input. This means long texts may cause memory allocation errors. If you're not using the parser or NER, it's probably safe to increase the `nlp.max_length` limit. The limit is in number of characters, so you can check whether your inputs are too long by checking `len(text)`.
The weird thing is that if I reduce the number of documents being lemmatized, it still says the length exceeds 1 million. Is there a way of increasing the limit past 1 million? The error seems to suggest there is, but I'm unable to do so.
python nlp
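The error message points at two things: checking `len(text)` and raising `nlp.max_length`. Since the limit applies to each text on its own, lemmatizing fewer documents would not help if a single document (or one joined string) is still over the limit. Below is a minimal sketch of both steps. The helper names `find_oversized` and `raise_max_length`, the `headroom` margin, and the model name in the usage comment are assumptions for illustration; only the `nlp.max_length` attribute and the 1,000,000 default come from the error message.

```python
# Default per-text limit quoted in the E088 error message.
DEFAULT_MAX_LENGTH = 1_000_000


def find_oversized(texts, max_length=DEFAULT_MAX_LENGTH):
    """Return (index, length) for every text longer than max_length characters."""
    return [(i, len(t)) for i, t in enumerate(texts) if len(t) > max_length]


def raise_max_length(nlp, texts, headroom=1000):
    """Raise nlp.max_length just enough to fit the longest text.

    `nlp` is any object with a `max_length` attribute, such as a loaded
    spaCy pipeline. `headroom` is an arbitrary safety margin (an assumption).
    """
    longest = max((len(t) for t in texts), default=0)
    if longest > nlp.max_length:
        nlp.max_length = longest + headroom
    return nlp.max_length


# Usage with spaCy (not executed here; the model name is an assumption):
#   import spacy
#   nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])
#   print(find_oversized(docs))   # which documents trigger E088?
#   raise_max_length(nlp, docs)   # only sensible with parser/NER disabled
```

As the error message warns, raising the limit is probably only safe when the parser and NER are not in use, because those components need a lot of temporary memory on long inputs.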
What code exactly are you running when you get that error? Please include a sample in your post. – n1k31t4, Sep 25 '18 at 0:30
Facing the same issue. Would be nice if spaCy left it to the user how many words his/her infrastructure can process. – padmalcom, Jan 2 at 13:01
See my answer. I spent many hours trying to troubleshoot this and figured that it was just easier to split the document into smaller pieces. Initially, I thought it had to do with the amount of RAM I was running, but I think it's a character limit in the library. – D500, Jan 3 at 14:03
asked Sep 24 '18 at 23:33 by D500
1 Answer
I wasn't able to figure out how to increase the maximum character limit, so I just split my document in half. The problem is that, by default, SpaCy will not process a single text longer than 1 million characters (the `nlp.max_length` setting named in the error message). Because I ran into this problem during lemmatization, it doesn't matter whether the document is processed as one whole or in a few parts, as long as no word is cut in half.
answered Sep 25 '18 at 12:26 by D500
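The splitting approach from the answer can be sketched roughly as follows. This is not the answerer's actual code: `split_text` is a hypothetical helper, and cutting at whitespace is an assumption chosen so that no word is split. A cut may still fall in the middle of a sentence, which could affect context-dependent results near the boundary.

```python
def split_text(text, max_chars=1_000_000):
    """Split text into pieces of at most max_chars characters.

    Each cut is made just after the last whitespace character inside the
    current window when there is one, so words are not cut in half. A run
    longer than max_chars with no whitespace is cut hard.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    if not text:
        return []
    chunks = []
    start = 0
    while len(text) - start > max_chars:
        window_end = start + max_chars
        cut = -1
        # Search backwards for the last whitespace inside the window.
        for i in range(window_end - 1, start, -1):
            if text[i].isspace():
                cut = i + 1  # keep the whitespace with the left-hand chunk
                break
        if cut == -1:
            cut = window_end  # no whitespace found: hard cut
        chunks.append(text[start:cut])
        start = cut
    chunks.append(text[start:])
    return chunks


# Usage with spaCy (not executed here; `nlp` is an already loaded pipeline):
#   lemmas = [tok.lemma_ for doc in nlp.pipe(split_text(big_text)) for tok in doc]
```

The chunks concatenate back to the original text, so nothing is lost by processing them one after another.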