
Show HN: Model2vec-Rs – Fast Static Text Embeddings in Rust

59 points by Tananon 4 days ago
Hey HN! We’ve just open-sourced model2vec-rs, a Rust crate for loading and running Model2Vec static embedding models with zero Python dependency. This lets you embed text at (very) high throughput, for example in a Rust-based microservice or CLI tool. It can be used for semantic search, retrieval, RAG, or any other text embedding use case.

Main features:

- Rust-native inference: load any Model2Vec model from Hugging Face or a local path with StaticModel::from_pretrained(...).

- Tiny footprint: the crate itself is only ~1.7 MB, with embedding models between 7 and 30 MB.

Performance:

We benchmarked single-threaded on a CPU:

- Python: ~4650 embeddings/sec

- Rust: ~8000 embeddings/sec (~1.7× speedup)

This is our first open-source project in Rust, so it would be great to get some feedback!
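For reference, here is a minimal sketch of what loading and embedding could look like with the crate. The post only confirms StaticModel::from_pretrained(...); the extra arguments, the encode method, and the model id "minishlab/potion-base-8M" are assumptions, not confirmed API:

```rust
use model2vec_rs::model::StaticModel;

fn main() {
    // Load a Model2Vec model from the Hugging Face Hub (or a local path).
    // The trailing None arguments (token / options) are assumed here.
    let model = StaticModel::from_pretrained("minishlab/potion-base-8M", None, None, None)
        .expect("failed to load model");

    // Embed a small batch of sentences.
    let sentences = vec![
        "Rust-native static embeddings".to_string(),
        "Semantic search without Python".to_string(),
    ];
    let embeddings = model.encode(&sentences);
    println!("Got {} embeddings of dim {}", embeddings.len(), embeddings[0].len());
}
```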

5 comments

gthompson512 3 days ago
How does it handle documents longer than the context length of the model? Sorry, but a ton of these get posted regularly and they don't usually think about this.

Edit: it seems like it just splits into sentences, which is a weird thing to do given that in English only ~95% agreement is even possible on what a sentence is.

```rust
// Process in batches
for batch in sentences.chunks(batch_size) {
    // Truncate each sentence to max_length * median_token_length chars
    let truncated: Vec<&str> = batch
        .iter()
        .map(|text| {
            if let Some(max_tok) = max_length {
                Self::truncate_str(text, max_tok, self.median_token_length)
            } else {
                text.as_str()
            }
        })
        .collect();
    // ... (rest of the loop body elided in the original excerpt)
}
```
noahbp 4 days ago
What is your preferred static text embedding model?

For someone looking to build a large embedding search, fast static embeddings seem like a good deal, but almost too good to be true. What quality tradeoff are you seeing with these models versus embedding models with attention mechanisms?
echelon 3 days ago
I love that you're doing this, Tananon.

We've been using Candle and Cudarc and having a fairly good time of it. We've built a real-time drawing app on a custom LCM stack, and Rust makes it feel rock solid. Python is way too flimsy for something like this.

The more the Rust ML ecosystem grows, the better. It's a little bit fledgling right now, so every little bit counts.

If llama.cpp had instead been llama.rs, I feel like we would have had a runaway success.

We'll be checking this out! Kudos, and keep it up!
Havoc 4 days ago
Surprised it is so much faster. I would have thought the Python one was C under the hood.
badmonster 3 days ago
How do I load a custom model instead of the ones on Hugging Face?
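Per the post above, StaticModel::from_pretrained also accepts a local path, so a custom model should load the same way as a Hub model. A minimal sketch; the directory path and argument list here are assumptions:

```rust
use model2vec_rs::model::StaticModel;

fn main() {
    // Point from_pretrained at a local directory instead of a Hub repo id
    // (directory layout and trailing arguments assumed, not confirmed API).
    let model = StaticModel::from_pretrained("./my-custom-model2vec", None, None, None)
        .expect("failed to load local model");

    let embeddings = model.encode(&["testing a custom model".to_string()]);
    println!("embedding dim: {}", embeddings[0].len());
}
```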