Hi HN!

We (Thomas and Stéphan, hello!) recently released Model2Vec, a Python library for distilling any sentence transformer into a small set of static embeddings. This makes inference with such a model up to 500x faster, and reduces model size by a factor of 15 (7.5M parameters, or 15/30MB on disk, depending on whether you use float16 or float32).

This reduction of course comes at a cost: distilled models are a lot worse than their parent models. Even so, they are much better than large sets of conventional static embeddings, such as GloVe or word2vec-based models, which are many times larger. In addition, the performance gap between a Model2Vec model and a sentence transformer ends up being smaller than you would expect; see https://github.com/MinishLab/model2vec/tree/main?tab=readme-ov-file#results for results. Fitting a Model2Vec model does not require any data, just a sentence transformer and, optionally, a frequency-sorted vocabulary, making it an easy solution to slot into whatever workflow you have lying around (there’s a short usage sketch at the end of this post).

We wrote this library because we had each gotten a bit frustrated with the lack of options when you need extremely fast CPU inference that still works well. If MiniLM isn’t fast enough and you don’t have access to a GPU, you’re often resigned to using BPEmb, which is not flexible, or training your own GloVe/word2vec models, which requires lots of data. Model2Vec solves all of these problems, and works better than specialized static embeddings trained on huge corpora.

We spent a lot of time thinking about how the library could be easy to use and integrate into common workflows. It’s a tiny thing: we’d rather provide a few generic functions that work well than a ton of integrations.

Please let us know what you think. We’re very interested in getting feedback from you. We’re already using this in our own projects, and ultimately built it because we needed it ourselves, but we’d be happy to hear from you if you have interesting use cases or questions.

Finally, if you think this sounds a lot like WordLlama, which was featured here last week: it is! We had been working on this in “stealth mode” since May, so I guess we and the WordLlama authors came up with the same idea at about the same time. We directly compare our models to WordLlama in our experiments. In short: WordLlama does a little worse, and is neither unsupervised nor multilingual, so it’s harder to adapt to new domains than Model2Vec.

Have a nice day!
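P.S. For anyone who wants a feel for the workflow, here is roughly what distillation and inference look like. This is a sketch adapted from our README; the model names used here (BAAI/bge-base-en-v1.5 as the parent, pca_dims as a parameter) are just examples, and the README is the authoritative reference for the current API:

    # Distill a sentence transformer into a static Model2Vec model.
    # No training data is needed, just the name of a model on the Hugging Face hub.
    from model2vec.distill import distill

    m2v_model = distill(model_name="BAAI/bge-base-en-v1.5", pca_dims=256)

    # Save the distilled model to disk for later use.
    m2v_model.save_pretrained("m2v_model")

    # Inference: load the static model and encode sentences on CPU.
    from model2vec import StaticModel

    model = StaticModel.from_pretrained("m2v_model")
    embeddings = model.encode(["It's dangerous to go alone!", "Take this."])

Because the result is just a set of static embeddings, encoding a sentence is a lookup plus a mean over token vectors, which is where the CPU speedup comes from.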