TechEcho
Static analysis at GitHub

178 points by trptcolin about 3 years ago

8 comments

evmar about 3 years ago
I contributed to a similar system used within Google (partially open source at kythe.io) that took the very different approach of integrating with the language-native toolchain for each language.

As this article describes, doing this requires per-language integrations and also effectively being able to "run the build" for any given code (because, e.g., the C++ header search path can vary on a per-source-file basis), which is untenable for a codebase as large and varied as GitHub's. However, if you can make it work, you get the benefit of having the compiler's understanding of the semantics of the code, which is especially finicky in complex languages like C++ or, say, Rust.

For example, if you look at this method call [1], it refers to a symbol generated by a chain of macros, but the browser is still able to point you at its definition.

It's an interesting tradeoff to make: the GitHub approach likely doesn't handle corner cases like the above, but it makes up for it in broad applicability and performance. I recall an IDE developer once telling me they made a similar tradeoff in code completion, in that it's better DX to pop up completions quickly even if they're "only" 99% correct.

(To be clear, I absolutely think the approach taken in the article was the right one for the domain they're working in; I was just contrasting it against my experience in a similar problem where we took a very different approach.)

[1] https://source.chromium.org/chromium/chromium/src/+/main:v8/src/api/api.cc;l=1316;drc=ede0a4abeeb7c8c3262e97c02c62c1d93f732f09
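A rough illustration of why a per-file header search path forces an indexer to know the build: the sketch below resolves the same include name to different files depending on which source file asks. The repository layout, the `SEARCH_PATHS` table, and both function names are hypothetical, made up for this example; nothing here is Kythe's or GitHub's actual code.

```python
from pathlib import PurePosixPath

# Hypothetical repository contents: two different headers share one name.
REPO_FILES = {
    "linux/config.h",
    "windows/config.h",
    "src/net.cc",
    "src/ui.cc",
}

# Hypothetical per-file include search paths, as a build system might supply them.
SEARCH_PATHS = {
    "src/net.cc": ["linux"],
    "src/ui.cc": ["windows"],
}


def resolve_include(source_file, header):
    """Return the first repo file matching `header` on the search path of `source_file`."""
    for directory in SEARCH_PATHS.get(source_file, []):
        candidate = str(PurePosixPath(directory) / header)
        if candidate in REPO_FILES:
            return candidate
    return None


def resolve_without_build(header):
    """Without build information, every file with a matching name is a candidate."""
    return sorted(f for f in REPO_FILES if PurePosixPath(f).name == header)


if __name__ == "__main__":
    print(resolve_include("src/net.cc", "config.h"))
    print(resolve_include("src/ui.cc", "config.h"))
    print(resolve_without_build("config.h"))
```

With the build information, each source file gets exactly one answer; without it, the best an indexer can do is return every file that shares the name.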
beyang about 3 years ago
Sourcegraph CTO here. It's interesting to read about GitHub's approach and how it contrasts with the approach we've taken at Sourcegraph. One of the key tradeoffs the article highlights is GitHub's decision to take the "shallow-but-wide" approach to code navigation, which has enabled them to provide some level of code navigation for most open-source repositories on GitHub, but at the expense of precision/accuracy (i.e., the system can't necessarily differentiate between different symbols with the same name).

Sourcegraph decided early on to take the opposite approach, favoring precision and accuracy over supporting every public codebase. Part of the reason is that we aren't a code host that hosts millions of open-source repositories, so we didn't feel the need to support all of those at once. Another big reason is that we heard from our users and customers that code navigation accuracy was critical for exploring their private code and enabling them to stay in flow (inaccurate results would break the train of thought because you'd have to actively think about how to navigate to the referenced symbol). We actually built out language-agnostic, search-based code navigation, but user feedback has increasingly driven us to adopt a more precise model, based at first on our own protocol (https://srclib.org) and also on the LSIF protocol open-sourced by Microsoft that now enables code navigation for many popular editor extensions.

This is not to say that GitHub's approach is wrong, but more to say that it's interesting how different goals and constraints have led to systems that are quite different despite tackling the same general problem. GitHub aims to provide some level of navigation for every repository on GitHub, while Sourcegraph aims to provide best-in-class navigation for private codebases and dependencies.

(Btw, hats off to the GitHub team for open-sourcing tree-sitter, a great library which we've incorporated into parts of our stack. We actually hosted the creator of tree-sitter, Max Brunsfeld, on our podcast a while back, and it was a really fun and insightful conversation if people are interested in hearing some of the backstory of tree-sitter: https://about.sourcegraph.com/podcast/max-brunsfeld.)
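To make the "same name, different symbols" limitation concrete, here is a minimal sketch of name-only definition lookup. It uses Python's standard `ast` module as a stand-in parser (not tree-sitter, and not either company's implementation); the `SOURCE` snippet and the function names `index_definitions` and `lookup` are assumptions invented for the example.

```python
import ast
from collections import defaultdict

# Hypothetical input: two unrelated classes that both define `get`.
SOURCE = '''
class Cache:
    def get(self, key):
        return key

class HttpClient:
    def get(self, url):
        return url

def fetch(client):
    return client.get("/")
'''


def index_definitions(source):
    """Map each defined function or class name to the lines where it is defined."""
    index = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            index[node.name].append(node.lineno)
    return {name: sorted(lines) for name, lines in index.items()}


def lookup(index, name):
    """Name-only lookup: returns every definition that shares the name."""
    return index.get(name, [])


if __name__ == "__main__":
    index = index_definitions(SOURCE)
    # `client.get` in `fetch` could mean either method; syntax alone cannot tell.
    print(lookup(index, "get"))
```

The lookup for `get` returns both method definitions. Choosing between them requires knowing the type of `client`, which is the kind of information a compiler-backed or LSIF-style index carries and a purely syntactic one does not.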
TYMorningCoffee about 3 years ago
GitHub released a great Java parser that I think is related to this work: https://github.com/javaparser/javaparser

I'm also using that parser for a side project where developers can cross-link their source code and host it statically: https://github.com/josephmate/OdinCodeBrowser#readme
gandalfgeek about 3 years ago
Short (~5 min) video summary, if that's your thing: https://youtu.be/YoOFJApmPKc
marceloabsousa about 3 years ago
This article is more about parsing at scale than static analysis at scale.
ushakov about 3 years ago
tree-sitter is a phenomenal project

i'm doing a git-related project myself and use it to generate symbols for source code

if you're into it too, i recommend also checking out LSIF: https://github.com/Microsoft/language-server-protocol/blob/main/indexFormat/specification.md
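For readers who haven't seen LSIF, the sketch below emits a tiny graph of vertices and edges as JSON lines for one symbol in one document. It is loosely modeled on the element shapes in the linked specification (`document`, `range`, `resultSet`, `definitionResult`, and the `contains`, `next`, `textDocument/definition`, and `item` edges), but it is simplified and has not been validated against the spec. The function names, the URI, and the positions are all hypothetical.

```python
import json
from itertools import count


def emit_index(uri, definition, references):
    """Build LSIF-style vertices and edges for one symbol in one document.

    `definition` and each entry of `references` are (line, start_char, end_char).
    """
    ids = count(1)
    elements = []

    def add(kind, label, **fields):
        # Every element gets a unique id, a type (vertex or edge), and a label.
        element = {"id": next(ids), "type": kind, "label": label, **fields}
        elements.append(element)
        return element["id"]

    def range_vertex(span):
        line, start, end = span
        return add("vertex", "range",
                   start={"line": line, "character": start},
                   end={"line": line, "character": end})

    document = add("vertex", "document", uri=uri, languageId="python")
    result_set = add("vertex", "resultSet")
    def_range = range_vertex(definition)
    ref_ranges = [range_vertex(span) for span in references]
    all_ranges = [def_range, *ref_ranges]

    # The document contains every range; each range shares one result set.
    add("edge", "contains", outV=document, inVs=all_ranges)
    for range_id in all_ranges:
        add("edge", "next", outV=range_id, inV=result_set)

    # "Go to definition" on any of the ranges leads to the definition range.
    definition_result = add("vertex", "definitionResult")
    add("edge", "textDocument/definition", outV=result_set, inV=definition_result)
    add("edge", "item", outV=definition_result, inVs=[def_range], document=document)
    return elements


def to_json_lines(elements):
    """Serialize one element per line."""
    return "\n".join(json.dumps(element) for element in elements)


if __name__ == "__main__":
    print(to_json_lines(emit_index("file:///repo/app.py", (0, 4, 9), [(5, 11, 16)])))
```

The point of the format is that the answers to navigation queries are precomputed by a language-aware tool and stored as a graph, so a viewer can serve "go to definition" without running the compiler itself.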
miohtama about 3 years ago
This is Big Code
mistrial9 about 3 years ago
figure 2 is repeated for some reason?