Advanced NLP with spaCy v3

207 points by pvpv over 3 years ago

3 comments

artembugara over 3 years ago
We've been using spaCy a lot for the past few months, mostly for non-production use cases. That said, I can say that it is the most robust framework for NLP at the moment.

V3 added support for transformers: that's a killer feature, as many models from https://huggingface.co/docs/transformers/index work great out of the box.

At the same time, I found the NER models provided by spaCy to have low accuracy on real data: we deal with news articles (https://demo.newscatcherapi.com/).

Also, while I see how much attention ML models get from the crowd, I think many problems can be solved with a rule-based approach, and spaCy is just amazing for these.

Btw, we recently wrote a blog post comparing spaCy to NLTK for a text normalization task: https://newscatcherapi.com/blog/spacy-vs-nltk-text-normalization-comparison-with-code-examples
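As an illustration of the rule-based side, here is a minimal sketch using spaCy v3's token-pattern `Matcher` on a blank English pipeline (no trained model download needed, only `pip install spacy`). The pattern name `MONEY_AMOUNT` and the `find_amounts` helper are hypothetical, chosen to fit the news-article use case; it only relies on lexical attributes such as `LIKE_NUM` and `LOWER`, not on the NER models mentioned above.

```python
import spacy
from spacy.matcher import Matcher

# A blank English pipeline has only the tokenizer and lexical attributes,
# which is enough for purely rule-based patterns like this one.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Hypothetical rule: "$" + a number-like token + a scale word, e.g. "$5 billion".
money_pattern = [
    {"TEXT": "$"},
    {"LIKE_NUM": True},
    {"LOWER": {"IN": ["million", "billion"]}},
]
matcher.add("MONEY_AMOUNT", [money_pattern])


def find_amounts(text: str) -> list:
    """Return the matched spans as strings, in document order."""
    doc = nlp(text)
    # Each match is a (match_id, start, end) tuple of token offsets.
    matches = sorted(matcher(doc), key=lambda m: m[1])
    return [doc[start:end].text for _, start, end in matches]


print(find_amounts("Acme raised $5 billion, up from $300 million last year."))
```

Rules like this are cheap to run and easy to audit, which is one reason they can complement a statistical NER model rather than replace it.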
minimaxir over 3 years ago
A relatively underdiscussed quirk of the rise of superlarge language models like GPT-3 for certain NLP tasks is that, because those models have incorporated so much real-world grammar, there's no need for advanced preprocessing: you can just YOLO it and work with the generated embeddings instead of going through spaCy's (excellent) parsing/NER features.

OpenAI recently released an Embeddings API for GPT-3 with good demos and explanations: https://beta.openai.com/docs/guides/embeddings

Hugging Face Transformers makes this easier (and free), as most models return a "last_hidden_state" of per-token vectors that can be pooled (e.g., averaged) into a single aggregated embedding. Just use DistilBERT uncased/cased (which is fast enough to run on consumer CPUs) and you're probably good to go.
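A note on the pooling step: `last_hidden_state` holds one vector per token, so getting one embedding per text needs an aggregation step. Below is a stdlib-only sketch of masked mean pooling followed by cosine similarity. The toy 3-dimensional vectors are made up and stand in for what you would get from something like `AutoModel.from_pretrained("distilbert-base-uncased")` plus the tokenizer's `attention_mask` (real DistilBERT vectors have 768 dimensions). Mean pooling is one common choice, assumed here; taking the first (`[CLS]`) token's vector is another.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(hidden_states: List[Vector], attention_mask: List[int]) -> Vector:
    """Average the token vectors whose mask is 1, skipping padding tokens."""
    kept = [vec for vec, m in zip(hidden_states, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy per-token states standing in for last_hidden_state[0] of two texts.
sent_a = [[1.0, 0.0, 1.0], [1.0, 2.0, 1.0], [0.0, 0.0, 0.0]]  # last row is padding
mask_a = [1, 1, 0]
sent_b = [[2.0, 2.0, 2.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
mask_b = [1, 0, 0]

emb_a = mean_pool(sent_a, mask_a)  # [1.0, 1.0, 1.0]
emb_b = mean_pool(sent_b, mask_b)  # [2.0, 2.0, 2.0]
print(cosine_similarity(emb_a, emb_b))
```

Masking matters: averaging the padding rows in as well would pull every embedding toward zero by an amount that depends on how much padding each text received.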
41209 over 3 years ago
I really love spaCy; it's trivial to throw up a server that handles basic NLP. No complaints here, and very happy to see it still being updated.
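To illustrate how little is needed, here is a minimal sketch of such a server using only Python's standard `http.server` plus a blank spaCy v3 pipeline. The `analyze` function, `NLPHandler` class, `serve` helper, the JSON shape, and port 8000 are all hypothetical choices for this example; it is a toy for local experiments, not a production setup.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

import spacy

# Blank English pipeline: tokenizer plus lexical attributes, no model download.
# spacy.load("en_core_web_sm") would add tagging/parsing/NER if that package is installed.
nlp = spacy.blank("en")


def analyze(text: str) -> dict:
    """Run the pipeline and return a JSON-serializable summary."""
    doc = nlp(text)
    return {
        "tokens": [token.text for token in doc],
        "numbers": [token.text for token in doc if token.like_num],
    }


class NLPHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read the raw request body and decode it as UTF-8 text.
        length = int(self.headers.get("Content-Length", 0))
        text = self.rfile.read(length).decode("utf-8")
        body = json.dumps(analyze(text)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "localhost", port: int = 8000) -> None:
    """Block forever, answering POST requests with analyze() results."""
    HTTPServer((host, port), NLPHandler).serve_forever()
```

After calling `serve()`, a request such as `curl -X POST --data "I have 3 cats." http://localhost:8000/` should return the tokens and number-like tokens as JSON.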