Hey HN!
We’ve just open-sourced model2vec-rs, a Rust crate for loading and running Model2Vec static embedding models with zero Python dependency. This lets you embed text at (very) high throughput, for example in a Rust-based microservice or CLI tool. It can be used for semantic search, retrieval, RAG, or any other text embedding use case.

Main features:

- Rust-native inference: load any Model2Vec model from Hugging Face or a local path with StaticModel::from_pretrained(...).

- Tiny footprint: the crate itself is only ~1.7 MB, with embedding models between 7 and 30 MB.

Performance:

We benchmarked single-threaded on a CPU:

- Python: ~4650 embeddings/sec

- Rust: ~8000 embeddings/sec (~1.7× speedup)

This is our first open-source project in Rust, so it would be great to get some feedback!
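To give a feel for the API, here's a minimal sketch. The from_pretrained call is the one mentioned above; the exact optional arguments, the encode signature, and the example repo id are illustrative rather than authoritative, so check the README for details:

```
use model2vec_rs::model::StaticModel;

fn main() -> anyhow::Result<()> {
    // Load a Model2Vec model from the Hugging Face Hub or a local path.
    // The optional arguments shown here (token, normalize, subfolder) and their
    // order are an assumption in this sketch.
    let model = StaticModel::from_pretrained(
        "minishlab/potion-base-8M", // example repo id
        None,                       // HF token (only needed for private repos)
        None,                       // normalize: fall back to the model's default
        None,                       // subfolder within the repo
    )?;

    let sentences = vec![
        "A Rust crate for static embeddings".to_string(),
        "Semantic search, retrieval, RAG".to_string(),
    ];

    // encode is assumed here to return one Vec<f32> per input sentence.
    let embeddings = model.encode(&sentences);
    println!("{} embeddings of dim {}", embeddings.len(), embeddings[0].len());
    Ok(())
}
```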
How does it handle documents longer than the model's context length? Sorry, there are a ton of these releases and they don't usually think about this.

Edit: it seems like it just splits into sentences, which is a weird thing to do given that in English only ~95% agreement is even possible on what a sentence is.
```
// Process in batches
for batch in sentences.chunks(batch_size) {
    // Truncate each sentence to max_length * median_token_length chars
    let truncated: Vec<&str> = batch
        .iter()
        .map(|text| {
            if let Some(max_tok) = max_length {
                Self::truncate_str(text, max_tok, self.median_token_length)
            } else {
                text.as_str()
            }
        })
        .collect();
    // ... (the rest of the loop then embeds the truncated batch)
}
```
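For reference, truncate_str presumably cuts each sentence down to a character budget of max_tokens * median_token_length. A rough sketch of what such a helper could look like (not the crate's actual code) is:

```
/// Rough sketch of a character-budget truncation helper like the one referenced
/// above. The real truncate_str in model2vec-rs may differ; this just illustrates
/// cutting at max_tokens * median_token_length chars while respecting UTF-8
/// char boundaries.
fn truncate_str(text: &str, max_tokens: usize, median_token_length: usize) -> &str {
    let max_chars = max_tokens.saturating_mul(median_token_length);
    match text.char_indices().nth(max_chars) {
        // Cut at the byte offset of the first char past the budget.
        Some((byte_offset, _)) => &text[..byte_offset],
        // Text is already within the budget.
        None => text,
    }
}
```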
What is your preferred static text embedding model?

For someone looking to build a large embedding search, fast static embeddings seem like a good deal, but almost too good to be true. What quality tradeoff are you seeing with these models versus embedding models with attention mechanisms?
I love that you're doing this, Tananon.

We've been using Candle and Cudarc and having a fairly good time of it. We've built a real-time drawing app on a custom LCM stack, and Rust makes it feel rock solid. Python is way too flimsy for something like this.

The more the Rust ML ecosystem grows, the better. It's a little fledgling right now, so every little bit counts.

If llama.cpp had instead been llama.rs, I feel like we would have had a runaway success.

We'll be checking this out! Kudos, and keep it up!