Ask HN: Why aren't there open-source embedding models with context length > 512?

3 points by rawsh, over 1 year ago

2 comments

james-revisoai, over 1 year ago
There are, as mentioned. But additionally, for many models you can split the content into several vectors (say, one per sentence or paragraph, depending on how the model was trained) and pool those vectors together to get a representation that spans the content well overall.

Since models trained on single sentences (like MiniLM-L6-v2, the SBERT default) perform worse at length, pooling sentence-level representations is typically more useful.

For deliberately longer representations, generative-model embeddings or document embeddings are sometimes the right answer.
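A minimal sketch of the chunk-and-pool approach this comment describes, assuming the sentence-transformers library and the all-MiniLM-L6-v2 model; the model choice, the naive sentence split, and mean pooling are illustrative assumptions, not prescribed by the comment.

    # Sketch: embed a long text by splitting it into chunks that fit the
    # model's window, encoding each chunk, and mean-pooling the vectors.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    # all-MiniLM-L6-v2 is an assumed example model (~256-token window).
    model = SentenceTransformer("all-MiniLM-L6-v2")

    def embed_long_text(text: str) -> np.ndarray:
        # Naive sentence split for illustration; a real tokenizer-aware
        # chunker would respect the model's actual token limit.
        chunks = [s.strip() for s in text.split(".") if s.strip()]
        # One vector per chunk, shape (n_chunks, dim).
        vectors = model.encode(chunks)
        # Mean-pool into a single document-level representation.
        pooled = vectors.mean(axis=0)
        # Normalize so cosine similarity behaves as expected downstream.
        return pooled / np.linalg.norm(pooled)

Weighted pooling (e.g., by chunk length) is a common variation on the same idea when chunks differ greatly in size.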
caprock, over 1 year ago
There are some: https://huggingface.co/spaces/mteb/leaderboard