科技回声

8 comments

underlines 4 months ago

For 2 years I built RAG apps for clients. They are not code assistants, but every time I see code assistants relying solely on embeddings to find the right pieces of context, it feels wrong to me:

Code is very well structured. Based on a starting point (current cursor, current file, or the results of an embedding search) you would probably fare better traversing the code tree up and down, building a tree or using Abstract Syntax Trees (ASTs) as described in this blog post [4]. It's like a tree search to get the relevant code pieces for a certain task, and it imitates what human coders do. It would integrate well into an agent loop for searching relevant code.

Aren't there any open source code assistants and plugins that do this? All I see are embedding searches in the big projects such as Cursor, Cline or Continue.

All I ever found were a few research efforts such as RepoGraph [1], CodeGraph [2] and one codebase open sourced by Deutsche Telekom called advanced-coding-assistant [3].

[1] https://github.com/ozyyshr/RepoGraph
[2] https://arxiv.org/abs/2408.13863
[3] https://github.com/telekom/advanced-coding-assistant-backend
[4] https://cyrilsadovsky.substack.com/p/advanced-coding-chatbot-knowledge-graphs-and-asts-0c18c90373be
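
As a rough illustration of the traversal idea in the comment above, here is a minimal sketch using Python's standard `ast` module. It is not how RepoGraph, CodeGraph, or advanced-coding-assistant work; the function names `build_call_graph` and `related_functions`, the `SAMPLE` source, and the hop limit are all assumptions made for this example. It handles only top-level functions called by plain name within a single file.

```python
import ast
from collections import deque


def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function name to the plain names it calls directly."""
    tree = ast.parse(source)
    graph: dict[str, set[str]] = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            calls = set()
            for child in ast.walk(node):
                # Only direct calls like foo(...); method calls are ignored here.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    calls.add(child.func.id)
            graph[node.name] = calls
    return graph


def related_functions(graph: dict[str, set[str]], start: str, max_depth: int = 2) -> dict[str, int]:
    """Breadth-first walk over callees and callers, returning hop distance per function."""
    # Invert the graph so we can walk "up" to callers as well as "down" to callees.
    callers: dict[str, set[str]] = {}
    for fn, callees in graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(fn)

    seen = {start: 0}
    queue = deque([start])
    while queue:
        fn = queue.popleft()
        depth = seen[fn]
        if depth == max_depth:
            continue
        neighbours = graph.get(fn, set()) | callers.get(fn, set())
        for nb in sorted(neighbours):
            # Skip builtins and anything not defined in this file.
            if nb in graph and nb not in seen:
                seen[nb] = depth + 1
                queue.append(nb)
    return seen


# Hypothetical source file used only for the demonstration.
SAMPLE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def run(path):
    return parse(load(path))

def unrelated():
    return 42
'''

if __name__ == "__main__":
    graph = build_call_graph(SAMPLE)
    print(related_functions(graph, "load", max_depth=2))
```

Starting from `load` (say, the function under the cursor), the walk reaches `run` as a caller and then `parse` as a sibling callee, while `unrelated` is never pulled into the context. A real assistant would need cross-file symbol resolution, which this sketch does not attempt.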

serjester 4 months ago

I don’t understand why we have so many companies working on vector databases when the real value is in fine-tuned embeddings. Having recently evaluated vector DBs to handle 100M+ nodes, everything will run you $1-8k per month. Yet using embeddings fine-tuned on a specific use case will lower that 10x, since you can get away with way coarser vectors.

This makes so much intuitive sense. Voyage, please release an insurance-focused model.
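
To make the "coarser vectors" point concrete, here is a back-of-the-envelope sketch of raw vector storage for 100M vectors. The dimensions and precisions chosen (1024-dimension float32 versus 256-dimension int8) are assumptions for illustration, not figures from the comment, and the helper `index_size_gb` is hypothetical. Actual hosting cost also depends on index overhead, replicas, and query load, so this covers only the storage footprint.

```python
def index_size_gb(num_vectors: int, dims: int, bytes_per_value: int) -> float:
    """Raw storage for the vectors alone, in decimal gigabytes (no index overhead)."""
    return num_vectors * dims * bytes_per_value / 1e9


NUM_VECTORS = 100_000_000  # the "100M+ nodes" scale mentioned above

# Assumed baseline: 1024 dimensions stored as 4-byte float32 values.
full = index_size_gb(NUM_VECTORS, dims=1024, bytes_per_value=4)

# Assumed coarser setup: 256 dimensions stored as 1-byte int8 values.
coarse = index_size_gb(NUM_VECTORS, dims=256, bytes_per_value=1)

if __name__ == "__main__":
    print(f"full:   {full:.1f} GB")
    print(f"coarse: {coarse:.1f} GB")
    print(f"ratio:  {full / coarse:.0f}x")
```

Under these assumed settings the footprint shrinks by a factor of 16, which is the same order of magnitude as the 10x claimed in the comment.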

uncomplexity_ 4 months ago

Currently using Voyage for my project for the following reasons:

1. They are recommended by Anthropic in their docs.
2. They're focused on embeddings as a service. I somehow prefer this over spread-thin large orgs like OpenAI, GoogleAI, etc.
3. They have a good standing on the Hugging Face MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard

qeternity 4 months ago

Their 1024 dimension outperforms their 2048 dimension? What am I missing here.

maCDzP 4 months ago

I used the embeddings from Voyage for a project in Swedish this summer. The neat thing about Voyage was the speed of the service. I think I had 250 million tokens and Voyage was the fastest. It took a couple of days on and off. I believe my napkin calculation showed that OpenAI would have taken months.

potatoman22 4 months ago

What's their ranking on SWEBench?

zelcon 4 months ago

Release the weights or buy an ad. This doesn’t deserve front page.

doctorpangloss 4 months ago

OpenAI is a big famous company. Why should I trust you guys with code, when I’m too lazy to clean out sensitive stuff?

Voyage-code-3