I have been using Spacy3 nightly for a while now. This is game changing.<p>Spacy3 practically covers 90% of NLP use-cases with near SOTA performance. The only reason to not use it would be if you are literally pushing the boundaries of NLP or building something super specialized.<p>Hugging Face and Spacy (also Pytorch, but duh) are saving millions of dollars in man hours for companies around the world. They've been a revelation.
Ok so I evaluated spaCy a few years ago, but nowadays we’re using Hugging Face’s transformers / tokenizers / etc. to train our own language models and fine-tuned models. I see there’s now transformer-based pipeline support; how do the two relate?<p>Phrased differently, how does spaCy fit into today’s world of transformers? Would it still be interesting for me?
SpaCy and HuggingFace fulfill practically 99% of all our needs for NLP projects at work. Really incredible bodies of work.<p>Also, my team chat is currently filled with people being extremely stoked about the SpaCy + FastAPI support! Really hope FastAPI replaces Flask sooner rather than later.
I'm curious what sort of NLP use cases people are solving. How are people finding business value in these models and pipelines? We have looked at a number of uses and have found it hard to make a case for ROI. Wondering what's been working for folks.
I stumbled over SpaCy when looking for something to extract key words and numbers from sentences, however it looked a bit daunting and/or overkill. Think recipes or similar, turning "take three tablespoons of sugar" into [3, 'tablespoons', 'sugar'] or similar.<p>Should I give it another shot or are there libraries more suited for this than just plain regexp galore?
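For the recipe case, here is a minimal stdlib-only sketch of a rule-based baseline, to show how far a small lookup table gets before reaching for an NLP library. It does not use spaCy; the `NUMBER_WORDS` and `UNITS` tables and the `extract_quantity` helper are hypothetical names made up for illustration, and the plural handling is deliberately naive.

```python
import re

# Hypothetical lookup tables for illustration; extend them for real recipes.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
UNITS = {"teaspoon", "tablespoon", "cup", "gram", "pinch"}


def extract_quantity(sentence):
    """Return [amount, unit, ingredient] for the first match, else None."""
    # Split into lowercase words and (possibly decimal) numbers.
    tokens = re.findall(r"[a-z]+|\d+(?:\.\d+)?", sentence.lower())
    for i, tok in enumerate(tokens):
        if tok in NUMBER_WORDS:
            amount = NUMBER_WORDS[tok]
        elif tok[0].isdigit():
            amount = float(tok) if "." in tok else int(tok)
        else:
            continue
        if i + 1 >= len(tokens):
            continue
        unit = tokens[i + 1]
        # Naive plural handling: strip trailing "s" before the lookup.
        if unit.rstrip("s") not in UNITS:
            continue
        rest = tokens[i + 2:]
        if rest and rest[0] == "of":
            rest = rest[1:]  # skip the optional "of"
        if rest:
            return [amount, unit, rest[0]]
    return None


print(extract_quantity("take three tablespoons of sugar"))
```

This only takes the single word after the unit as the ingredient, so multi-word ingredients ("brown sugar") and ranges ("2-3 cups") would need more rules or a proper tokenizer and matcher.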
Thanks to the SpaCy team! I spent a lot of time over about 20 years working on my own NLP tools. I stopped doing that and mostly now just use SpaCy (and sometimes Huggingface and Apple’s NLP models).
I think I read somewhere that spaCy was going to have named entity disambiguation at some point, with named entities having links to knowledge bases like Wikidata or DBpedia. That’s something that paid NER services offer but that I haven’t found in open-source libs, and it would be really interesting IMO.
I'll add my hats off to @ines and the spaCy team. It's super impressive. There's also a (free) orientation course I'd recommend at <a href="https://course.spacy.io/" rel="nofollow">https://course.spacy.io/</a>
Please note that Explosion does not like redistribution of SpaCy, they expect everyone to only use the builds they produce, so it would not be a good idea to package it for your favourite distro.
@hannibal
Would love to discuss how we could extend spaCy as a powerful engine to also support processing documents with layout. Just imagine how powerful it would be if you could throw a PDF document into it and it would preprocess it into text + layout (e.g. paragraphs) and then perform the next steps, like extracting the right paragraph, date, address, etc.
I would love to provide/support that transformation. I have been doing similar things using rasa_nlu+spacy.
Excited for this release and I will start integrating this in my own information extraction pipelines immediately. Thanks, Explosion team, got your stickers on my notebook!<p>The new configuration approach looks similar to AllenNLP's, and that's great. Loose coupling of model submodules with flexible config should be standard in NLP. I am happy that more libraries are integrating these concepts.
This sort of thing could be used to create a "plain language" CLI, right? So you could have the usual flags etc. and then a separate version for less tech-literate people that allows something like "list hidden files" (I know `ls -a` isn't particularly hard to remember, I just need a contrived example).
It's really cool to see how the accuracy of pre-trained models improves simply by switching to v3.0.<p>I've been using the v3 nightly version for 2 months and it works like a charm. I'm now training models with v3 and using them in production without any issues.<p>Great job!
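To make the contrived example concrete, here is a rough sketch of the simplest possible version: a keyword table that maps a plain-language request to a command, with no NLP model involved. The `INTENTS` table and `to_command` function are hypothetical names for illustration only; a real tool would presumably use a parser or a trained intent classifier instead of exact keyword sets.

```python
import re

# Hypothetical intent table: every keyword in the set must appear in the request.
INTENTS = [
    ({"list", "hidden", "files"}, ["ls", "-a"]),
    ({"list", "files"}, ["ls"]),
    ({"current", "directory"}, ["pwd"]),
]


def to_command(request):
    """Return the argv list for the most specific matching intent, else None."""
    words = set(re.findall(r"[a-z]+", request.lower()))
    best = None
    for keywords, command in INTENTS:
        # Prefer the intent with the most matched keywords (most specific).
        if keywords <= words and (best is None or len(keywords) > len(best[0])):
            best = (keywords, command)
    return best[1] if best else None


print(to_command("please list hidden files"))
```

The function only returns the argument list and never executes it, which seems like the safer default when the input is free text.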
I am not sure if SpaCy does it, but is there some free and open-source framework capable of speech synthesis comparable to AWS Polly or the like?
I see that by default the trf model is roberta_base <a href="https://spacy.io/models/en#en_core_web_trf" rel="nofollow">https://spacy.io/models/en#en_core_web_trf</a><p>Is there an easy way to use xlnet (from transformers) for pos tagging, dep parsing, etc?
Btw it would have been a smarter default as it scores more sota results on paperswithcode.com