Hi all,<p>Ironic timing here! We're just preparing the 1.7 release, which has a lot of nice changes, including the option of a much smaller model for English (50 MB), to help people test faster.<p>This means that if you install the library right now, you'll have to redownload the data once the new version is released.<p>So, maybe wait until tomorrow to get started? Definitely our most ambivalent front-paging yet!
Not sure exactly why this was posted today, since spaCy has been around for at least a couple of years, but spaCy is a great tool, and I have a ton of respect for Matthew Honnibal, the main developer.<p>Coincidentally, I wrote a blog post [1] that went up just this morning that, in part, compares spaCy with the other giant in the Python NLP ecosystem, NLTK. TL;DR: I think that, right now, the majority of users are better served by spaCy than by NLTK.<p>[1] <a href="https://automatedinsights.com/blog/the-python-nlp-ccosystem-a-short-and-very-opinionated-guide" rel="nofollow">https://automatedinsights.com/blog/the-python-nlp-ccosystem-...</a>
It only supports English and German. However, you can try adding other languages by following the guide here: <a href="https://spacy.io/docs/usage/adding-languages" rel="nofollow">https://spacy.io/docs/usage/adding-languages</a>
Ask HN: Could you suggest a fast library for converting documents into a sparse matrix representation (e.g., COO or CSR), in any programming language? I'm guessing a C implementation beats most of the others? But there is also the issue of efficient n-gram hashing/indexing.
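To make the n-gram hashing/indexing part of the question concrete, here is a rough pure-Python sketch of the hashing trick feeding the three CSR arrays. It is not a library recommendation and not how spaCy does anything. The names `ngrams` and `hash_docs_to_csr`, the whitespace tokenizer, the choice of `zlib.crc32` as the hash, and the default `n_features` are all assumptions made for illustration.

```python
import zlib
from collections import Counter


def ngrams(tokens, n_max=2):
    """Yield all word n-grams of length 1..n_max as space-joined strings."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def hash_docs_to_csr(docs, n_features=2 ** 18, n_max=2):
    """Return (data, indices, indptr) for a docs-by-n_features count matrix.

    Hypothetical helper: each n-gram is mapped to a column with
    crc32(ngram) % n_features, so no vocabulary dictionary is kept.
    Distinct n-grams may collide in the same column.
    """
    data, indices, indptr = [], [], [0]
    for doc in docs:
        tokens = doc.lower().split()  # assumed naive whitespace tokenizer
        counts = Counter(
            zlib.crc32(gram.encode("utf-8")) % n_features
            for gram in ngrams(tokens, n_max)
        )
        # Store this row's non-zero columns in sorted order.
        for col in sorted(counts):
            indices.append(col)
            data.append(counts[col])
        # indptr[i + 1] marks where row i ends in data/indices.
        indptr.append(len(indices))
    return data, indices, indptr


docs = ["the cat sat", "the cat", ""]
data, indices, indptr = hash_docs_to_csr(docs)
```

If SciPy is available, the three lists can be handed to `scipy.sparse.csr_matrix((data, indices, indptr), shape=(len(docs), n_features))`. Hashing avoids building a vocabulary, at the cost of occasional column collisions and no way to map a column back to its n-gram. scikit-learn's `HashingVectorizer` is an existing implementation of the same general idea.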
Does spaCy have a C# .NET wrapper, or can it be used from other languages/frameworks through a REST API?<p>I'm using the CoreNLP C# wrapper, so I'm wondering if something similar (.NET Core compatible) is available/doable for spaCy?