You probably shouldn't use OpenAI's embeddings

71 points by diego about 2 years ago

6 comments

nico about 2 years ago
Is someone doing embeddings <> embeddings mapping?

For example, mapping embeddings of Llama to GPT-3?

That way you can see how similarly the models “understand the world”.
mustacheemperor about 2 years ago
Could anyone point me towards a relatively beginner-friendly guide to do something like

> download all my tweets (about 20k) and build a semantic searcher on top?

How can I utilize 3rd party embeddings with OpenAI's LLM API? Am I correct to understand from this article that this is possible?
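For readers with the same question, here is a minimal sketch of the shape such a searcher could take. It is not a guide to any particular library: `embed` is a hypothetical stand-in (a plain word-count vector) for whatever embedding model you pick, and `build_index` and `search` are illustrative names. With a real model, only `embed` would change.

```python
import math
import re
from collections import Counter


def embed(text):
    """Hypothetical stand-in for a real embedding model: a word-count vector."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(docs):
    """Embed every document once and keep (text, vector) pairs."""
    return [(doc, embed(doc)) for doc in docs]


def search(index, query, k=3):
    """Return the k documents most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, vec), doc) for doc, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


# Made-up example data standing in for an exported tweet archive.
tweets = [
    "Trying out vector databases for semantic search",
    "Coffee is the best debugging tool",
    "Open source embedding models keep getting better",
]
index = build_index(tweets)
results = search(index, "embedding models", k=1)
print(results)
```

Because the search step returns plain text, the retrieved tweets can be pasted into the prompt of any LLM, so the embedding model and the LLM do not have to come from the same provider.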
fzliu about 2 years ago
I've done some quick-and-dirty testing with OpenAI's embedding API + Zilliz Cloud. The 1st gen embeddings leave something to be desired (https://medium.com/@nils_reimers/openai-gpt-3-text-embeddings-really-a-new-state-of-the-art-in-dense-text-embeddings-6571fe3ec9d9), but the 2nd gen embeddings are actually fairly performant relative to many open source models with MLM loss.

I'll have to dig out the notebook that I created for this, but I'll try to post it here once I find it.
celestialcheese about 2 years ago
Very interested in this - I've been using embeddings / semantic search for information retrieval from PDFs, using ada-002, and have been impressed by the results in testing.

The reasons the article listed, namely a) lock-in and b) cost, have given me pause about embedding our whole corpus of data. I'd much rather use an open model, but I don't have much experience in evaluating these embedding models and search performance - still very new to me.

Like what you did with ada-002 vs Instruct XL, have there been any papers or prior work evaluating the different embedding models?
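Independent of published comparisons, one common way to compare embedding models on your own data is to hand-label a few queries with the document that should come back, then measure recall@k for each model. The sketch below assumes that setup. `word_embed` and `char_embed` are hypothetical stand-ins for two real models (for example a hosted one and an open one), and the documents and queries are made up for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def word_embed(text):
    """Hypothetical model A: word counts."""
    return Counter(text.lower().split())


def char_embed(text):
    """Hypothetical model B: character trigram counts."""
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def recall_at_k(embed, docs, labeled_queries, k=1):
    """Fraction of queries whose known-relevant document is in the top k."""
    doc_vecs = [embed(d) for d in docs]
    hits = 0
    for query, relevant_idx in labeled_queries:
        q = embed(query)
        ranked = sorted(
            range(len(docs)),
            key=lambda i: cosine(q, doc_vecs[i]),
            reverse=True,
        )
        if relevant_idx in ranked[:k]:
            hits += 1
    return hits / len(labeled_queries)


# Made-up corpus and labels: (query, index of the document that should match).
docs = [
    "invoice payment terms",
    "employee vacation policy",
    "server maintenance schedule",
]
labeled_queries = [
    ("payment terms", 0),
    ("vacation policy", 1),
    ("maintenance schedule", 2),
]

for name, model in [("word", word_embed), ("char", char_embed)]:
    print(name, recall_at_k(model, docs, labeled_queries, k=1))
```

A few dozen labeled queries drawn from real usage are usually more informative for this kind of decision than a general-purpose leaderboard, since retrieval quality depends heavily on the corpus.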
nomadiccoder about 2 years ago
The availability figures in the heat map trend upwards: 98.58 (Jan), 99.07 (Feb), and 99.71 (Mar).
breckenedge about 2 years ago
It’s fine to use their embeddings for a proof of concept, but since you don’t own it, you probably shouldn’t rely on it because it could go away at any time.