SpaCy 3.0

484 points by syllogism over 4 years ago

22 comments

screye over 4 years ago

I have been using the spaCy 3 nightly for a while now. This is game changing. spaCy 3 covers practically 90% of NLP use cases with near-SOTA performance. The only reason not to use it would be if you are literally pushing the boundaries of NLP or building something super specialized. Hugging Face and spaCy (also PyTorch, but duh) are saving companies around the world millions of dollars in man-hours. They've been a revelation.

stingraycharles over 4 years ago

Ok so I evaluated spaCy a few years ago, but nowadays we’re using Hugging Face’s transformers / tokenizers / etc. to train our own language models + fine-tuned models. I see there’s now transformer-based pipeline support; how do the two relate? Phrased differently, how does spaCy fit in with today’s world of transformers? Would it still be interesting for me?

ZeroCool2u over 4 years ago

SpaCy and HuggingFace fulfill practically 99% of all our needs for NLP projects at work. Really incredible bodies of work. Also, my team chat is currently filled with people being extremely stoked about the SpaCy + FastAPI support! Really hope FastAPI replaces Flask sooner rather than later.

bratao over 4 years ago

Thank you Matthew, Ines, Sofie and Adriane for spaCy. It is a fundamental piece for me, both for work in Academia and in Industry.

liminal over 4 years ago

I'm curious what sort of NLP use cases people are solving. How are people finding business value in these models and pipelines? We have looked at a number of uses and have found it hard to make a case for ROI. Wondering what's been working for folks.

magicalhippo over 4 years ago

I stumbled over SpaCy when looking for something to extract key words and numbers from sentences, but it looked a bit daunting and/or overkill. Think recipes or similar, turning "take three tablespoons of sugar" into [3, 'tablespoons', 'sugar'] or similar. Should I give it another shot, or are there libraries more suited for this than just plain regexp galore?
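
For reference, the "plain regexp" end of that spectrum might look like the sketch below. It uses only the Python standard library, not spaCy, and it assumes sentences of the form "<quantity> <unit> of <item>". The names `NUMBER_WORDS`, `UNITS` and `parse_ingredient` are hypothetical, and the lookup tables are deliberately tiny.

```python
import re

# Hypothetical lookup tables for illustration; real recipes need many more entries.
NUMBER_WORDS = {
    "a": 1, "an": 1, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}
UNITS = {
    "teaspoon", "teaspoons", "tablespoon", "tablespoons",
    "cup", "cups", "gram", "grams", "pinch",
}

# Matches "<quantity> <unit> of <item>", where the quantity is digits or a word.
PATTERN = re.compile(
    r"\b(?P<qty>\d+(?:\.\d+)?|[a-z]+)\s+(?P<unit>[a-z]+)\s+of\s+(?P<item>[a-z]+)"
)


def to_number(text):
    """Convert '3', '2.5' or 'three' to a number; return None if unknown."""
    if text in NUMBER_WORDS:
        return NUMBER_WORDS[text]
    try:
        return float(text) if "." in text else int(text)
    except ValueError:
        return None


def parse_ingredient(sentence):
    """Return [quantity, unit, item] for the first recognised phrase, else None."""
    for match in PATTERN.finditer(sentence.lower()):
        qty = to_number(match.group("qty"))
        unit = match.group("unit")
        if qty is not None and unit in UNITS:
            return [qty, unit, match.group("item")]
    return None


print(parse_ingredient("take three tablespoons of sugar"))
```

This breaks quickly on phrasing it has not seen ("a heaped spoonful", "sugar, three tablespoons"), which is roughly the point at which a tokenizer plus rule-based matching or a trained model starts to pay off.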

mark_l_watson over 4 years ago

Thanks to the SpaCy team! I spent a lot of time over about 20 years working on my own NLP tools. I stopped doing that and mostly now just use SpaCy (and sometimes Huggingface and Apple’s NLP models).

Xenoamorphous over 4 years ago

I think I read somewhere that spaCy was going to have named entity disambiguation at some point, with named entities having links to knowledge bases like Wikidata or DBpedia. That’s something that paid NER services offer but that I haven’t found in open source libs, and it would be really interesting IMO.

jcuenod over 4 years ago

I'll add my hats off to @ines and the spaCy team. It's super impressive. There's also a (free) orientation course I'd recommend at https://course.spacy.io/

pabs3 over 4 years ago

Please note that Explosion does not like redistribution of SpaCy, they expect everyone to only use the builds they produce, so it would not be a good idea to package it for your favourite distro.

nocajar over 4 years ago

@hannibal Would love to discuss how we could extend spaCy as a powerful engine to also support processing of documents with layout. Just imagine how powerful it would be if you could throw a PDF document into it and it would preprocess it to text + layout (e.g. paragraphs) and then perform the next steps, like extracting the right paragraph or a date, address, etc. I would love to provide/support that transformation. I have been doing similar things using rasa_nlu+spacy.

gillesjacobs over 4 years ago

Excited for this release and I will start integrating this in my own information extraction pipelines immediately. Thanks, Explosion team, got your stickers on my notebook! The new configuration approach looks similar to AllenNLP's and that's great. Loose coupling of model submodules with flexible config should be standard in NLP. I am happy that more libraries are integrating these concepts.
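
As a rough illustration of what loose coupling through config can mean, here is a standard-library sketch of a registry whose entries are looked up by name from a nested config. It is not spaCy's or AllenNLP's actual config system; the `@factory` key, the registry names and every function below are assumptions made up for the example.

```python
# Hypothetical registry mapping string names to factory functions.
REGISTRY = {}


def register(name):
    """Decorator that stores a factory function under the given name."""
    def decorator(func):
        REGISTRY[name] = func
        return func
    return decorator


@register("tokenizer.whitespace")
def make_whitespace_tokenizer(lowercase=False):
    def tokenize(text):
        tokens = text.split()
        return [t.lower() for t in tokens] if lowercase else tokens
    return tokenize


@register("encoder.bag_of_words")
def make_bag_of_words_encoder(tokenizer):
    def encode(text):
        counts = {}
        for token in tokenizer(text):
            counts[token] = counts.get(token, 0) + 1
        return counts
    return encode


def resolve(config):
    """Recursively build the object described by a dict with an '@factory' key."""
    if not isinstance(config, dict) or "@factory" not in config:
        return config  # plain value, pass through unchanged
    kwargs = {key: resolve(value) for key, value in config.items() if key != "@factory"}
    return REGISTRY[config["@factory"]](**kwargs)


# Swapping the tokenizer means editing this dict, not the encoder code.
CONFIG = {
    "@factory": "encoder.bag_of_words",
    "tokenizer": {"@factory": "tokenizer.whitespace", "lowercase": True},
}

encoder = resolve(CONFIG)
print(encoder("The the cat"))
```

The encoder never imports a concrete tokenizer; it only receives whatever the config names, which is the property the comment above is praising.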

datameta over 4 years ago

I wonder if it is sheer coincidence that SpaCy is pronounced the way the Russian word "спасай" is. It means "rescue" (v.)

tsujp over 4 years ago

This sort of thing could be used to create a "plain language" CLI, right? So you could have the usual flags etc. and then a separate version for less tech-literate people that allows something like "list hidden files" (I know `ls -a` isn't particularly hard to remember, I just need a contrived example).
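
A toy version of that idea, with plain keyword matching standing in for a real NLP pipeline, could look like the sketch below. The `INTENTS` table and `to_command` are hypothetical names, and nothing here uses spaCy; a real version would presumably swap the `split()` call for lemmas or a trained intent classifier.

```python
# Hypothetical intent table: required keywords mapped to the command to run.
INTENTS = [
    ({"list", "hidden", "files"}, ["ls", "-a"]),
    ({"list", "files"}, ["ls"]),
    ({"show", "disk", "usage"}, ["du", "-sh", "."]),
]


def to_command(request):
    """Return the command for the most specific matching intent, or None."""
    words = set(request.lower().split())
    # An intent matches when all of its keywords appear in the request.
    candidates = [(len(keys), command) for keys, command in INTENTS if keys <= words]
    if not candidates:
        return None
    # Prefer the intent with the most keywords, so "hidden" is not ignored.
    return max(candidates, key=lambda pair: pair[0])[1]


print(to_command("list hidden files"))
```

The command comes back as an argument list rather than a string so it could be handed to `subprocess.run` without shell quoting issues; the sketch stops short of executing anything.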

juliensalinas over 4 years ago

It's really cool to see how the accuracy of pre-trained models improves simply by switching to v3.0. I've been using the v3 nightly version for 2 months and it works like a charm. I'm now training models with v3 and using them in production without any issue. Great job!

yewenjie over 4 years ago

I am not sure if SpaCy does it, but is there some free and open-source framework capable of speech synthesis comparable to the level of AWS Polly or the like?

langitbiru over 4 years ago

So with SpaCy 3.0 and HuggingFace, do we still have a reason to use NLTK? Or do they complement each other? Right now, I've lost track of the progress in NLP.

pplonski86 over 4 years ago

Is there any framework similar to SpaCy or Hugging Face but for images?

polynomial over 4 years ago

Super excited to see improvement in NER accuracy in SpaCy 3.0.

suyash over 4 years ago

Has anyone tried it on a Raspberry Pi? Will it work well?

cambalache over 4 years ago

> spaCy is a library for advanced Natural Language Processing in Python and Cython.

The_rationalist over 4 years ago

I see that by default the trf model is roberta_base: https://spacy.io/models/en#en_core_web_trf Is there an easy way to use XLNet (from transformers) for POS tagging, dependency parsing, etc.? Btw, it would have been a smarter default, as it scores more SOTA results on paperswithcode.com
