Glean – System for collecting, deriving and querying facts about source code

213 points by dons over 3 years ago

24 comments

dons over 3 years ago
We use this to power things like find-references or jump-to-def, "symbol search" and autocomplete, or more complicated code queries and analysis (even across languages). Imagine rich LSPs without a local checkout, web-based code queries, or seeding fuzzers and static analyzers with entry points in code.

Our focus has been on very large scale, multi-language code indexing, and then low latency (e.g. hundreds of microseconds) query times, to drive highly interactive developer workflows.
simonw over 3 years ago
Feature request: a live demo! I would love to try out the web interface described at https://glean.software/docs/trying without pulling down a 7GB Docker image first.
conductor over 3 years ago
To prevent any confusion, this is a different product than Mozilla's Glean [0][1].

[0] https://docs.telemetry.mozilla.org/concepts/glean/glean.html

[1] https://github.com/mozilla/glean/
soonnow over 3 years ago
I had a look at the site and it seems to be parsing source code in multiple languages and storing the parsed "syntax trees" in a database for querying.

I would love to know what the use case for this tool is, aside from maybe being a source for presentations ("We have 5 million if statements").

How can this be used to improve code quality or any other aspect of the code lifecycle?

Or is it solving problems in a completely different problem area?
coderdd over 3 years ago
Great to see this space moving! Any pointers on how it differs from Kythe? Kythe has a mostly fixed schema, for one.

One of the pain points of using Kythe is wiring up the indexer to the build system. Would Glean indexers be easier to wire up for the common cases?

Another is the index post-processing, which is not very scalable in the open source version (due to go-beam having rough Flink support, for example).

Third, how does it link up references across compilation units? Is it heuristic, or does it rely on unique keys from indexers matching? And across languages?
balddenimhero over 3 years ago
Datalog-ish query languages sure are a fun area to be working in. Such DSLs exist for various domains and, like Semmle's CodeQL or the more academic Soufflé, Glean focuses on the domain of programming languages.

Glean seems to still be a work in progress, e.g. no support for recursive queries yet, but I wonder where they're heading. I'll certainly keep an eye on the project, but I wonder how exactly Glean aims to (or maybe it already does) improve upon the alternatives. From the talk linked in another comment, I guess the distinctive feature may be the planned integration with IDEs. Correct me if I'm wrong. Other contenders provide great querying technology, but there is indeed no strong focus on making such tech really convenient and integrated yet.
doddsiedodds over 3 years ago
An excellent talk by Simon Marlow on Glean here: https://youtu.be/-OPN7QPsYKE
booleandilemma over 3 years ago
The very first page of the site should have examples of what you can do with it.
aabaker99 over 3 years ago
Cool! I would love to play around with this.

How do I write a schema and indexer for my favorite programming language that isn't currently (and won't be) supported with official releases?

For Schemas, [1] says to modify (or base new ones off) these: https://github.com/facebookincubator/Glean/tree/main/glean/schema/source

For Indexers, it's a little less clear but it looks like I need to write my own type checker?

[1] https://glean.software/docs/schema/workflow
z3t4 over 3 years ago
This seems very interesting. I would love to see more alternatives to Tree-sitter and Microsoft's LSP. What makes those hard to use is the lack of examples and tutorials, so I hope there will be examples and tutorials. For example: how do you find all variables in scope when the text cursor is on line x and col y in /file/path/file.js?
Grimm1 over 3 years ago
Very cool! How does this differ algorithmically from the trigram-based search that everything has used since Google Code Search, some 20 years ago?

Continuing on that theme, in practical terms, how does it stand up against zoekt?

I'm curious because zoekt is kind of slow when it comes to ingesting large amounts of code, like all of the publicly available code on GitHub.

The few people using it commercially have basically had to spend a lot of time rewriting parts of it to make their goal of public code search for all attainable.

I and a few people I know are pretty convinced that there are better and easier ways / technologies to make that happen.
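
For anyone unfamiliar with the trigram approach mentioned here, this is a minimal sketch of the general idea: index every 3-character substring, intersect the posting sets for the query's trigrams, then confirm candidates with a real substring check. It is a toy for illustration only, not how Google Code Search, zoekt, or Glean are actually implemented, and the names `trigrams`, `build_index`, and `search` are made up for this example.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs):
    """Map each trigram to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for gram in trigrams(text):
            index[gram].add(name)
    return index


def search(index, docs, query):
    """Find documents containing query as a substring (query needs >= 3 chars)."""
    grams = trigrams(query)
    if not grams:
        raise ValueError("query must be at least 3 characters long")
    # A matching document must contain every trigram of the query.
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Trigram hits are necessary but not sufficient, so verify each candidate.
    return sorted(name for name in candidates if query in docs[name])


# Hypothetical in-memory corpus, standing in for files on disk.
docs = {
    "a.py": "def parse(x): return x",
    "b.py": "class Parser: pass",
    "c.py": "print('hello')",
}
index = build_index(docs)
matches = search(index, docs, "arse")
```

The index only narrows the candidate set; the final `query in docs[name]` check is what rules out files that happen to contain all the trigrams without containing the query itself.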
ExtraE over 3 years ago
What, uh, is this? This is a space that I’m not familiar with and the linked site doesn’t make it super clear.
ctvo over 3 years ago
Great job with this. What's your roadmap for releasing some of the tooling for editor integration? Really, the question is should I build something or wait a few weeks?
metalliqaz over 3 years ago
We have used SciTools Understand to do this on local source code. What is the use of putting this in the cloud? The website doesn't really explain that.
tclancy over 3 years ago
Getting a 401 when trying `docker pull ghcr.io/facebookincubator/glean/demo:latest` -- is that true for anyone else?
log101 over 3 years ago
I didn't understand what it does.
avinassh over 3 years ago
How does this actually work? Where can I learn more about the indexing and searching?
justinmchase over 3 years ago
But what's an example of a fact? Looks cool, but I have no idea what it's for.
marcodiego over 3 years ago
Is it a modern cscope?
rognjen over 3 years ago
Meta: should the ?open tracking part of the URL be removed?
_jezell_ over 3 years ago
Is this basically Facebook's version of Sourcegraph?
ing33k over 3 years ago
7GB Docker image!
erlich over 3 years ago
I can't believe Facebook hasn't canned Flowtype yet and moved to TypeScript. They will have to do it eventually.
da39a3ee over 3 years ago
I was recently looking for a library that takes a few lines of source code as input and predicts the programming language as output.

That seems like a very tractable machine learning problem, yet all I could find was a single Python library which looks nice, but doesn't have much adoption, and requires installing the entirety of TensorFlow despite the fact that users just want a trained model and a predict() function.

Why doesn't a popular library like this exist?