
TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.


© 2025 TechEcho. All rights reserved.

Ask HN: What's the best local sentence transformer?

5 points by helloplanets over 1 year ago
Basically what's in the title. There's been such a crazy amount of development in local LLMs if you look at LLaMa, Mistral, etc.

It feels like using OpenAI's Ada to get text embeddings is probably not at all the best option at this point. What would be the best / most cost-efficient way of getting text embeddings these days? Preferably open source.

2 comments

james-revisoai over 1 year ago
Like caprock says, the e5 models are the best tradeoff of model size / embedding speed for the results you get; they will be great at semantic search etc. in English.

Possibly consider cross-encoding for semantic search depending on your use case, but whenever cross-encoding is useful, generative embeddings like Ada are usually much better... There used to be embeddings useful for things like classifying whether sentences entail one another, or whether a sentence is complete, but these have basically been completely supplanted these days.

Do consider the all-MiniLM-type embeddings (the default in sentence-transformers) for speed or on-device use. They are half the size (and therefore less than half the compute for distance functions), so they are faster for large searches etc., which is useful if you run your own stuff with vector stores rather than a service.
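The size/speed tradeoff here can be made concrete: a 384-dimensional all-MiniLM embedding needs half the arithmetic per cosine-distance computation of a 768-dimensional e5-base embedding. A minimal brute-force search sketch in NumPy (the dimensions and the `cosine_search` helper are illustrative; real embeddings would come from a model like those above):

```python
import numpy as np

def cosine_search(query_vec, corpus_vecs, top_k=3):
    """Return indices of the top_k corpus vectors most similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    sims = c @ q                       # one dot product per corpus vector
    return np.argsort(-sims)[:top_k]   # highest cosine similarity first

rng = np.random.default_rng(0)
# Stand-ins for real embeddings: e5-base-sized vs all-MiniLM-sized vectors.
corpus_768 = rng.normal(size=(10_000, 768)).astype(np.float32)
corpus_384 = rng.normal(size=(10_000, 384)).astype(np.float32)

query = corpus_384[42]
print(cosine_search(query, corpus_384)[0])  # 42: the exact match ranks first
```

The `c @ q` line is where dimensionality bites: with 384 dimensions it does half the multiply-adds per document that it would with 768, which is what makes the smaller models attractive for self-hosted vector search.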
caprock over 1 year ago
The answer is dependent on the task(s) to which the embeddings will be applied. For general search in industry, the e5 models are well regarded.

A good place to start is this eval system:

https://huggingface.co/spaces/mteb/leaderboard