
Searching a Codebase in English

56 points by dakshgupta 10 months ago

5 comments

tonyoconnell 10 months ago
Summary: "Semantic search on codebases works better if you first translate the code to natural language, before generating embedding vectors. It also works better if you chunk more “tightly” - on a per-function level rather than a per-file level. This is because noise negatively impacts retrieval quality in a huge way."

This makes a lot of sense. You should also embed information about how the code relates to other functions/code and where it is located in the codebase. One approach is to add really wonderful comments to the code so that when humans and machines read it they are brought on a step-by-step journey of how the code fulfills a goal. I tell the LLM to explain step by step to junior developers and to inspire senior engineers with a glimpse of the profound beauty of the code and its architecture.
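A minimal sketch of the per-function chunking and relationship metadata described above, assuming the codebase is Python and using the standard `ast` module. The names `chunk_functions` and `to_embedding_text` are hypothetical, and `description` stands in for a natural-language summary that would come from an LLM; no embedding model is called here.

```python
import ast


def chunk_functions(source: str, path: str) -> list[dict]:
    """Split Python source into one chunk per function, with location and call metadata."""
    tree = ast.parse(source)
    chunks = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Collect the names of functions called directly by name in this body.
            # Method calls such as obj.method() are skipped in this sketch.
            calls = sorted({
                n.func.id
                for n in ast.walk(node)
                if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
            })
            chunks.append({
                "name": node.name,
                "path": path,          # where the function lives in the codebase
                "line": node.lineno,   # 1-based line of the def statement
                "calls": calls,        # how it relates to other functions
                "code": ast.get_source_segment(source, node),
            })
    return chunks


def to_embedding_text(chunk: dict, description: str) -> str:
    """Build the text to embed: location, relationships, then the English description."""
    calls = ", ".join(chunk["calls"]) or "nothing"
    return (
        f"{chunk['name']} in {chunk['path']}:{chunk['line']}\n"
        f"Calls: {calls}\n"
        f"{description}"
    )
```

The text returned by `to_embedding_text` is what would be passed to an embedding model instead of the raw source, so each vector covers one function plus its location and direct callees.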
byearthithatius 10 months ago
I think I found a mistake. In the article you write: "We then compare that against our database of vectors and find the one(s) that match the closest, i.e., have the lowest dot product and highest similarity."

We want to maximize the normalized dot product (cosine similarity), not minimize it, to find semantically similar text chunks.
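To make the correction concrete, here is a small sketch in plain Python of ranking stored vectors by cosine similarity, highest first. The function names `cosine_similarity` and `top_k` are illustrative, and the code assumes non-zero vectors of equal length; it is not taken from the article.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)  # assumes neither vector is all zeros


def top_k(query: list[float], vectors: list[list[float]], k: int = 1) -> list[int]:
    """Return the indices of the k stored vectors most similar to the query."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine_similarity(query, vectors[i]),
        reverse=True,  # highest similarity first: we maximize, not minimize
    )
    return ranked[:k]
```

Normalizing matters: for the query `[1, 0]`, the vector `[10, 10]` has a larger raw dot product than `[1, 0.1]`, but `[1, 0.1]` points in nearly the same direction as the query and so has the higher cosine similarity.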
oshams 10 months ago
Interesting direction. We also have a codebase chat (example here: https://wiki.mutable.ai/ollama/ollama) that HN might find appealing. It uses a wiki as a living artifact owned by your team to power the chat, which gives us increased context length and reasoning capabilities. We didn't really like the results we got with embeddings. We have been pretty thrilled with the results on Q&A, search, and even codegen (more on that soon).
deisteve 10 months ago
Is there a free version of Greptile?
Zambyte 10 months ago
The page is unreadable on Firefox Focus