Voyage-code-3

111 points by fzliu, 4 months ago

8 comments

underlines, 4 months ago
For two years I built RAG apps for clients. They are not code assistants, but every time I see code assistants relying solely on embeddings to find the right pieces of context, it feels wrong to me.

Code is very well structured. Based on a starting point (the current cursor, the current file, or the results of an embedding search), you would probably fare better traversing the code tree up and down, building a tree or using Abstract Syntax Trees (ASTs) as described in this blog post [4]. It's like a tree search for the code pieces relevant to a certain task, and it imitates what human coders do. It would integrate well into an agent loop for searching relevant code.

Aren't there any open source code assistants and plugins that do this? All I see are embedding searches in the big projects such as cursor, cline or continue.

All I ever found were a few research efforts such as RepoGraph [1] and CodeGraph [2], and one codebase open sourced by Deutsche Telekom called advanced-coding-assistant [3].

[1] https://github.com/ozyyshr/RepoGraph
[2] https://arxiv.org/abs/2408.13863
[3] https://github.com/telekom/advanced-coding-assistant-backend
[4] https://cyrilsadovsky.substack.com/p/advanced-coding-chatbot-knowledge-graphs-and-asts-0c18c90373be
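
As a rough illustration of the traversal described in this comment, here is a minimal sketch using Python's standard `ast` module. It is not taken from any of the linked projects: the sample source, the function names `build_call_graph` and `related_functions`, and the `max_depth` cutoff are all hypothetical, and it only follows direct calls to plain names within a single module.

```python
import ast
from collections import deque

# Hypothetical sample module used only for this illustration.
SOURCE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def run(path):
    return parse(load(path))

def unrelated():
    return 42
'''

def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function name to the plain names it calls."""
    tree = ast.parse(source)
    graph: dict[str, set[str]] = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            calls = set()
            for child in ast.walk(node):
                # Only direct calls like foo(...); method calls are skipped.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    calls.add(child.func.id)
            graph[node.name] = calls
    return graph

def related_functions(graph: dict[str, set[str]], start: str, max_depth: int = 2) -> set[str]:
    """Breadth-first walk over callees and callers, starting from `start`."""
    # Invert the graph so the walk can also go "up" to callers.
    callers: dict[str, set[str]] = {}
    for fn, callees in graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(fn)

    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth == max_depth:
            continue
        neighbours = graph.get(name, set()) | callers.get(name, set())
        for neighbour in sorted(neighbours):
            # Keep only functions defined in this module (drops builtins).
            if neighbour in graph and neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen

graph = build_call_graph(SOURCE)
context = related_functions(graph, "run")
```

Starting from `run`, the walk collects `load` and `parse` and leaves `unrelated` out. An embedding search could supply the starting point, with the graph walk then gathering the surrounding context.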
serjester, 4 months ago
I don't understand why we have so many companies working on vector databases when the real value is in fine-tuned embeddings. Having recently evaluated vector DBs to handle 100M+ nodes, everything will run you $1-8k per month. Yet using embeddings fine-tuned on a specific use case will lower that 10X, since you can get away with way coarser vectors.

This makes so much intuitive sense. Voyage, please release an insurance-focused model.
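
To make the "coarser vectors" point concrete, here is a back-of-the-envelope sketch of raw vector storage. The dimension counts (2048 and 256) and the 4-byte float32 component size are assumptions chosen for illustration, not figures from the comment, and the result covers raw vectors only, ignoring index overhead, replication, and any provider's pricing.

```python
def index_size_gb(num_vectors: int, dims: int, bytes_per_component: int = 4) -> float:
    """Raw storage for the vectors alone, in decimal gigabytes."""
    return num_vectors * dims * bytes_per_component / 1e9

NUM_VECTORS = 100_000_000  # the "100M+ nodes" scale from the comment

# Assumed dimensions for illustration only.
full = index_size_gb(NUM_VECTORS, dims=2048)
coarse = index_size_gb(NUM_VECTORS, dims=256)

print(f"full:   {full:.1f} GB")
print(f"coarse: {coarse:.1f} GB")
print(f"ratio:  {full / coarse:.1f}x")
```

Under these assumptions the raw footprint shrinks in direct proportion to the dimension count, which is the mechanism behind the cost reduction the comment describes.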
uncomplexity_, 4 months ago
Currently using Voyage for my project for the following reasons:

1. They are recommended by Anthropic in their docs.

2. They're focused on embeddings as a service. I somehow prefer this over spread-thin large orgs like OpenAI, GoogleAI, etc.

3. They have good standing on the Hugging Face MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard
qeternity, 4 months ago
Their 1024 dimension outperforms their 2048 dimension? What am I missing here?
maCDzP, 4 months ago
I used the embeddings from Voyage for a project in Swedish this summer.

The neat thing about Voyage was the speed of the service.

I think I had 250 million tokens, and Voyage was the fastest. It took a couple of days on and off. I believe my napkin calculation showed that OpenAI would have taken months.
potatoman22, 4 months ago
What's their ranking on SWE-bench?
zelcon, 4 months ago
Release the weights or buy an ad. This doesn’t deserve front page.
doctorpangloss, 4 months ago
OpenAI is a big famous company. Why should I trust you guys with code, when I’m too lazy to clean out sensitive stuff?