
Show HN: Model2Vec: make sentence transformers 500x faster on CPU, 15x smaller

6 points by stephantul 8 months ago
Hi HN!

We (Thomas and Stéphan, hello!) recently released Model2Vec, a Python library for distilling any sentence transformer into a small set of static embeddings. This makes inference with such a model up to 500x faster and reduces model size by a factor of 15 (7.5M params, or 15/30 MB on disk depending on whether you use float16 or float32).

This reduction of course comes at a cost: distilled models are a lot worse than their parent models. Even so, they are actually a lot better than large sets of conventional static embeddings, such as GloVe or word2vec-based models, which are many times larger. In addition, the performance gap between a Model2Vec model and a sentence transformer ends up being smaller than you would expect; see https://github.com/MinishLab/model2vec/tree/main?tab=readme-ov-file#results for results. Fitting a Model2Vec does not require any data, just a sentence transformer and, possibly, a frequency-sorted vocabulary, making it an easy solution to drop into whatever workflow you have lying around. (A short sketch of the workflow follows below.)

We wrote this library because we had each separately grown frustrated with the lack of options for extremely fast CPU inference that still works well. If MiniLM isn't fast enough and you don't have access to a GPU, you're often resigned to using BPEmb, which is not flexible, or training your own GloVe/word2vec models, which requires lots of data. Model2Vec solves all of these problems and works better than specialized static embeddings trained on huge corpora.

We spent a lot of time thinking about how the library could be easy to use and integrate into common workflows. It's a tiny thing: we'd rather offer a few generic functions that work well than a ton of integrations.

Please let us know what you think. We're very interested in getting feedback from you. We're already using this in our own projects, and ultimately built it because we needed it, but we'd be happy to hear from you if you have interesting use cases or questions.

Finally, if you think this sounds a lot like WordLlama, which was featured here last week: it is! We had been working on this in "stealth mode" since May, so I guess we and the WordLlama authors came up with the same idea at about the same time. We directly compare our models to WordLlama in our experiments. In short: WordLlama does a little bit worse, and is neither unsupervised nor multilingual, so it's more difficult to adapt to new domains than Model2Vec.

Have a nice day!
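For a sense of the workflow, here is a minimal sketch of distillation and inference based on the library's README linked above; the source model name, pca_dims value, and save path are illustrative, and the exact API may have changed since this post.

    # pip install model2vec
    from model2vec import StaticModel
    from model2vec.distill import distill

    # Distill a sentence transformer from the Hugging Face hub into a set of
    # static embeddings. No training data is required; a frequency-sorted
    # vocabulary (a list of tokens) can optionally be passed via the
    # `vocabulary` argument to cover a custom domain.
    m2v_model = distill(model_name="BAAI/bge-base-en-v1.5", pca_dims=256)
    m2v_model.save_pretrained("m2v_model")

    # Inference is a per-token table lookup followed by a pooling step over
    # the tokens, which is why it is so fast on CPU.
    model = StaticModel.from_pretrained("m2v_model")
    embeddings = model.encode(["It was a dark and stormy night."])
    print(embeddings.shape)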

1 comment

billconan 8 months ago
The original models can only generate sentence embeddings, correct?

Can a token prediction model use this?
[Comment #41616494 not loaded]