Increasing SpaCy max NLP limit

I'm getting this error:



[E088] Text of length 1029371 exceeds maximum of 1000000. The v2.x parser and NER models require roughly 1GB of temporary memory per 100,000 characters in the input. This means long texts may cause memory allocation errors. If you're not using the parser or NER, it's probably safe to increase the `nlp.max_length` limit. The limit is in number of characters, so you can check whether your inputs are too long by checking `len(text)`.
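The error message says the limit is checked against `len(text)` of the single string passed to `nlp`. One possible reading of the behaviour described below, which is an assumption since the question shows no code, is that the documents are joined into one string before processing. In that case, reducing the number of documents only helps once the combined length drops under the limit. Below is a minimal pure-Python sketch. The helper `find_too_long` and the sample `docs` are hypothetical, and 1,000,000 is the default limit quoted in the error.

```python
# spaCy's default nlp.max_length, counted in characters (from the E088 message).
DEFAULT_MAX_LENGTH = 1_000_000


def find_too_long(texts, max_length=DEFAULT_MAX_LENGTH):
    """Return (index, length) for every text longer than max_length characters."""
    return [(i, len(t)) for i, t in enumerate(texts) if len(t) > max_length]


# Hypothetical documents; the question does not show the real data.
docs = ["x" * 300_000, "y" * 400_000, "z" * 350_000]

# Each document on its own is under the limit...
print(find_too_long(docs))  # []

# ...but the same documents joined into one string are not.
joined = " ".join(docs)
print(len(joined), find_too_long([joined]))
```

If the documents really are independent, passing them one at a time (for example with `nlp.pipe(docs)`) keeps every call under the limit without touching `nlp.max_length`.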


The weird thing is that even if I reduce the number of documents being lemmatized, it still says the length exceeds 1 million. Is there a way to increase the limit past 1 million? The error seems to suggest there is, but I'm unable to do so.
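The error message names the setting to change: `nlp.max_length`, counted in characters. The sketch below is minimal. It assumes a blank English pipeline (tokenizer only) and a synthetic text, first hits the limit, and then raises it. With a trained model, you would load it with the parser and NER disabled, as the error advises. The `spacy.load` call in the comment is illustrative, and the model name is an assumption.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only). With a trained model
# you would load it without the memory-hungry components, e.g.
# spacy.load("en_core_web_sm", disable=["parser", "ner"]).
nlp = spacy.blank("en")

# Synthetic input of about 1.25 million characters, above the default limit.
text = " ".join(["word"] * 250_000)
print(len(text), nlp.max_length)  # both values are character counts

# With the default limit, spaCy refuses the text with error E088 (a ValueError).
try:
    nlp(text)
    hit_limit = False
except ValueError:
    hit_limit = True

# Raise the limit just enough for this text, then process it.
nlp.max_length = len(text) + 1
doc = nlp(text)
print(hit_limit, len(doc))
```

Raising `max_length` only removes the guard; it does not reduce memory use. The roughly 1 GB per 100,000 characters quoted in the error still applies whenever the parser or NER runs.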

  • What code exactly are you running when you get that error? Please include a sample in your post.
    – n1k31t4
    Sep 25 '18 at 0:30

  • Facing the same issue. It would be nice if spaCy left it to users to decide how much text their infrastructure can process.
    – padmalcom
    Jan 2 at 13:01

  • See my answer. I spent many hours trying to troubleshoot this and found it was easier to split the document into smaller pieces. Initially, I thought it had to do with the amount of RAM I had, but I think it's a character limit in the library.
    – D500
    Jan 3 at 14:03

python nlp

asked Sep 24 '18 at 23:33
D500
62

1 Answer

I wasn't able to figure out how to increase the maximum number of characters, so I simply split my document in half. The problem is that spaCy refuses texts longer than its `nlp.max_length` limit, which defaults to 1,000,000 characters. Because I only ran into this problem during lemmatization, it made little difference whether the document was processed as one whole or in a few parts.
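The splitting workaround can be extended from halves to any number of pieces. Here is a minimal sketch in plain Python. The helper `split_text` is hypothetical and not part of spaCy. It cuts at the last space or newline before the limit so that words stay intact, and you then pass each chunk to `nlp` yourself. Cutting mid-sentence may slightly change context-dependent annotations such as part-of-speech tags near the boundaries, and the lemmatizer can depend on those. Splitting at paragraph breaks, where available, is a safer choice.

```python
def split_text(text, max_chars):
    """Split text into chunks of at most max_chars characters.

    Each chunk ends at the last space or newline before the limit where
    possible, so words are not broken in half. A run without spaces or
    newlines longer than max_chars is cut hard at the limit.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Last space or newline inside the current window (-1 if none).
            cut = max(text.rfind(" ", start, end), text.rfind("\n", start, end))
            if cut > start:
                end = cut + 1  # keep the separator with the left-hand chunk
        chunks.append(text[start:end])
        start = end
    return chunks


# Usage with spaCy (assumes an already loaded pipeline called `nlp`):
#     lemmas = [tok.lemma_ for chunk in split_text(text, nlp.max_length)
#               for tok in nlp(chunk)]

sample = "the quick brown fox jumps over the lazy dog"
parts = split_text(sample, 10)
print(parts)  # ['the quick ', 'brown fox ', 'jumps ', 'over the ', 'lazy dog']
```

Because no characters are dropped, `"".join(parts)` always reproduces the original text, so character offsets can be mapped back if needed.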

        answered Sep 25 '18 at 12:26
D500
62