Glean – System for collecting, deriving and querying facts about source code

213 points by dons over 3 years ago

24 comments

dons over 3 years ago

We use this to power things like find-references or jump-to-def, "symbol search" and autocomplete, or more complicated code queries and analysis (even across languages). Imagine rich LSPs without a local checkout, web-based code queries, or seeding fuzzers and static analyzers with entry points in code.

Our focus has been on very large scale, multi-language code indexing, and then low latency (e.g. hundreds of microseconds) query times, to drive highly interactive developer workflows.
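
A minimal sketch of the idea behind the find-references and jump-to-def use cases, assuming a toy model: an indexer records declaration and reference facts keyed by source position, and both features become lookups in precomputed indexes rather than scans of the code. The `Decl`/`Ref` record shapes and the `FactStore` class are hypothetical, written in Python purely for illustration; they are not Glean's schema, query language or API, and the exact (file, line, col) keys stand in for the source ranges a real indexer would record.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical fact shapes for illustration; not Glean's real schema.
@dataclass(frozen=True)
class Decl:
    name: str  # symbol name, e.g. "lib.parse"
    file: str
    line: int
    col: int


@dataclass(frozen=True)
class Ref:
    name: str  # symbol that this source location refers to
    file: str
    line: int
    col: int


class FactStore:
    """Toy in-memory store: facts plus the indexes the two queries need."""

    def __init__(self):
        self.decl_by_name = {}
        self.refs_by_name = defaultdict(list)
        self.symbol_at = {}  # (file, line, col) -> symbol name

    def add_decl(self, d):
        self.decl_by_name[d.name] = d
        self.symbol_at[(d.file, d.line, d.col)] = d.name

    def add_ref(self, r):
        self.refs_by_name[r.name].append(r)
        self.symbol_at[(r.file, r.line, r.col)] = r.name

    def find_references(self, name):
        """All recorded uses of a symbol (empty list if none)."""
        return list(self.refs_by_name.get(name, []))

    def jump_to_def(self, file, line, col):
        """Declaration of whatever symbol sits at this exact position, if any."""
        name = self.symbol_at.get((file, line, col))
        return self.decl_by_name.get(name) if name is not None else None


store = FactStore()
store.add_decl(Decl("lib.parse", "lib.py", 10, 4))
store.add_ref(Ref("lib.parse", "app.py", 3, 8))
store.add_ref(Ref("lib.parse", "cli.py", 22, 11))
print([(r.file, r.line) for r in store.find_references("lib.parse")])  # [('app.py', 3), ('cli.py', 22)]
print(store.jump_to_def("app.py", 3, 8))  # Decl(name='lib.parse', file='lib.py', line=10, col=4)
```

The point of the design is that the expensive work (parsing, name resolution) happens once at indexing time, so each interactive query is only a dictionary lookup.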

simonw over 3 years ago

Feature request: a live demo! I would love to try out the web interface described at https://glean.software/docs/trying without pulling down a 7GB Docker image first.

conductor over 3 years ago

To prevent any confusion, this is a different product than Mozilla's Glean [0][1].

[0] https://docs.telemetry.mozilla.org/concepts/glean/glean.html
[1] https://github.com/mozilla/glean/

soonnow over 3 years ago

I had a look at the site, and it seems to parse source code in multiple languages and store the parsed "syntax trees" in a database for querying.

I would love to know what the use case for this tool is, aside from maybe being a source for presentations ("We have 5 million if statements").

How can this be used to improve code quality or any other aspect of the code lifecycle? Or is it solving problems in a completely different area?

coderdd over 3 years ago

Great to see this space moving! Any pointers on how it differs from Kythe? Kythe has a mostly fixed schema, for one.

One of the pain points of using Kythe is wiring up the indexer to the build system. Would Glean indexers be easier to wire up for the common cases?

Another is the index post-processing, which is not very scalable in the open-source version (due to go-beam having rough Flink support, for example).

Third, how does it link up references across compilation units? Is it heuristic, or does it rely on unique keys from indexers matching? And what about across languages?

balddenimhero over 3 years ago

Datalog-ish query languages sure are a fun area to be working in. Such DSLs exist for various domains, and, like Semmle's CodeQL or the more academic Soufflé, Glean focuses on the domain of programming languages.

Glean seems to still be a work in progress, e.g. no support for recursive queries yet, but I wonder where they're heading. I'll certainly keep an eye on the project, but I wonder how exactly Glean aims to (or maybe already does) improve upon the alternatives. From the talk linked in another comment, I guess the distinctive feature may be the planned integration with IDEs. Correct me if I'm wrong. Other contenders provide great querying technology, but there is indeed no strong focus yet on making such tech really convenient and integrated.
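
Recursive rules are the classic Datalog feature, e.g. computing everything transitively reachable in a call graph. A minimal Python sketch of textbook naive bottom-up evaluation follows, assuming a made-up `calls(caller, callee)` relation. It is not Glean's Angle language or its evaluation strategy, and engines such as Soufflé use semi-naive evaluation and indexes rather than this repeated full join.

```python
def transitive_closure(calls):
    """Naive fixpoint evaluation of the (hypothetical) rules:
        reachable(X, Y) :- calls(X, Y).
        reachable(X, Z) :- reachable(X, Y), calls(Y, Z).
    """
    reachable = set(calls)
    while True:
        # Join reachable(X, Y) with calls(Y, Z) to derive reachable(X, Z).
        derived = {(x, z) for (x, y) in reachable for (y2, z) in calls if y == y2}
        new = derived - reachable
        if not new:  # fixpoint: no rule produces a new fact
            return reachable
        reachable |= new


calls = {("main", "parse"), ("parse", "lex"), ("lex", "read"), ("main", "log")}
closure = transitive_closure(calls)
print(sorted(callee for (caller, callee) in closure if caller == "main"))
# ['lex', 'log', 'parse', 'read']
```

Because the set of possible facts is finite, the loop always terminates, even on cyclic call graphs.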

doddsiedodds over 3 years ago

An excellent talk by Simon Marlow on Glean here: https://youtu.be/-OPN7QPsYKE

booleandilemma over 3 years ago

The very first page of the site should have examples of what you can do with it.

aabaker99 over 3 years ago

Cool! I would love to play around with this.

How do I write a schema and indexer for my favorite programming language that isn't currently (and won't be) supported with official releases?

For schemas, [1] says to modify (or base new ones off) these: https://github.com/facebookincubator/Glean/tree/main/glean/schema/source

For indexers, it's a little less clear, but it looks like I need to write my own type checker?

[1] https://glean.software/docs/schema/workflow
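
As a rough illustration of what the simplest kind of indexer does: for declaration-level facts, a parser is often enough, and precise cross-references are where name resolution (type-checker-level information) usually becomes necessary. The sketch below walks Python source with the standard `ast` module and emits one record per function or class definition as JSON. The `decl` predicate name and the record shape are hypothetical placeholders, not Glean's actual schema or its fact-ingestion format.

```python
import ast
import json


def index_declarations(path, source):
    """Emit one hypothetical 'decl' fact per function/class definition."""
    tree = ast.parse(source, filename=path)
    facts = []
    for node in ast.walk(tree):  # visits every node, including nested defs
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            facts.append({
                "predicate": "decl",  # hypothetical predicate name
                "key": {"name": node.name, "kind": kind,
                        "file": path, "line": node.lineno},
            })
    return facts


source = """
class Parser:
    def parse(self, text):
        return text.split()

def main():
    Parser().parse("a b")
"""
print(json.dumps(index_declarations("example.py", source), indent=2))
```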

z3t4 over 3 years ago

This seems very interesting. I would love to see more alternatives to Tree-sitter and Microsoft's LSP; what makes those hard to use is the lack of examples and tutorials, so I hope there will be examples and tutorials. For example: how do you find all variables in scope when the text cursor is on line x, column y in /file/path/file.js?
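
The example question is a scope-resolution problem. The sketch below shows the general idea for Python source (swapped in for JavaScript because Python's standard `ast` module parses it with no dependencies): collect module-level bindings, then add the parameters and local bindings of every function whose line range contains the cursor line. It is a simplification under stated assumptions: it works on lines rather than columns and ignores class-body scopes, `global`/`nonlocal`, and whether a binding appears before the cursor. It is not how Glean, Tree-sitter or any LSP server implements this.

```python
import ast

# Constructs that open their own scope; bindings inside them are skipped.
# Simplification: this also skips walrus (:=) targets inside comprehensions.
NESTED_SCOPES = (ast.Lambda, ast.ListComp, ast.SetComp, ast.DictComp, ast.GeneratorExp)


def names_bound_in(body):
    """Names bound directly in one scope: assignments, loop targets, imports, defs."""
    names = set()
    stack = list(body)
    while stack:
        node = stack.pop()
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)  # the def binds its name here; its body is a new scope
            continue
        if isinstance(node, NESTED_SCOPES):
            continue
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name != "*":
                    # "import a.b" binds "a"; "import a.b as c" binds "c"
                    names.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ExceptHandler) and node.name:
            names.add(node.name)  # "except E as e" binds "e"
        stack.extend(ast.iter_child_nodes(node))
    return names


def enclosing_functions(tree, line):
    """Function definitions whose line range contains `line`, outermost first."""
    funcs = [
        node for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.lineno <= line <= node.end_lineno
    ]
    return sorted(funcs, key=lambda node: node.lineno)


def variables_in_scope(source, line):
    """Approximate set of names visible at `line` (1-based) of Python `source`."""
    tree = ast.parse(source)
    visible = names_bound_in(tree.body)  # module (global) scope
    for func in enclosing_functions(tree, line):
        args = func.args
        visible |= {a.arg for a in args.posonlyargs + args.args + args.kwonlyargs}
        if args.vararg:
            visible.add(args.vararg.arg)  # *args
        if args.kwarg:
            visible.add(args.kwarg.arg)  # **kwargs
        visible |= names_bound_in(func.body)
    return visible


source = """
import os
LIMIT = 10

def outer(path, *rest):
    total = 0
    def inner(item, **opts):
        seen = item
        return seen
    for entry in rest:
        total += 1
    return total

def other():
    hidden = 1
"""
print(sorted(variables_in_scope(source, 8)))
# ['LIMIT', 'entry', 'inner', 'item', 'opts', 'os', 'other', 'outer', 'path', 'rest', 'seen', 'total']
```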

Grimm1 over 3 years ago

Very cool! How does this differ algorithmically from the trigram-based search, from Google Code Search like 20 years ago, that everything uses?

And continuing on that theme, in practical terms, how does it stand up against zoekt? I’m curious because zoekt is kind of slow when it comes to ingesting large amounts of code, like all of the publicly available code on GitHub.

The few people using it commercially have basically had to spend a lot of time rewriting parts of it to make their goal of public code search for all attainable.

A few people I know and I are pretty convinced that there are better and easier ways / technologies to make that happen.
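
For context on the trigram approach: Google Code Search (and later zoekt) index the three-character substrings of each file, so a literal query only needs to be checked against files that contain all of the query's trigrams. A minimal Python sketch of that idea follows; the class and method names are illustrative, and real implementations also handle regular expressions, compressed posting lists and ranking.

```python
from collections import defaultdict


def trigrams(text):
    """All distinct 3-character substrings of `text`."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> ids of docs containing it
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for t in trigrams(text):
            self.postings[t].add(doc_id)

    def search(self, query):
        """Sorted ids of docs whose text contains `query` as a literal substring."""
        grams = trigrams(query)
        if grams:
            # Candidates: docs that contain every trigram of the query.
            candidates = set.intersection(*(self.postings.get(t, set()) for t in grams))
        else:
            # Queries shorter than 3 chars yield no trigrams: fall back to a full scan.
            candidates = set(self.docs)
        # Having all the trigrams does not guarantee a match, so verify each candidate.
        return sorted(d for d in candidates if query in self.docs[d])


index = TrigramIndex()
index.add("a.go", "func parseConfig(path string) error")
index.add("b.go", "func parseArgs() []string")
index.add("c.go", "// configure the parser")
print(index.search("parseConfig"))  # ['a.go']
print(index.search("parse"))        # ['a.go', 'b.go', 'c.go']
```

The verification step matters: a document such as "abcd bcda" contains every trigram of "abcda" without containing the string itself.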

ExtraE over 3 years ago

What, uh, is this? This is a space that I’m not familiar with and the linked site doesn’t make it super clear.

ctvo over 3 years ago

Great job with this. What's your roadmap for releasing some of the tooling for editor integration? Really, the question is should I build something or wait a few weeks?

metalliqaz over 3 years ago

We have used SciTools Understand to do this on local source code. What is the use of putting this in the cloud? The website doesn't really explain that.

tclancy over 3 years ago

Getting a 401 when trying `docker pull ghcr.io/facebookincubator/glean/demo:latest` -- is that true for anyone else?

log101 over 3 years ago

I didn't understand what it does.

avinassh over 3 years ago

How does this actually work? Where can I learn more about the indexing and searching?

justinmchase over 3 years ago

But what's an example of a fact? Looks cool, but I have no idea what it's for.

marcodiego over 3 years ago

Is it a modern cscope?

rognjen over 3 years ago

Meta: should the ?open tracking part of the URL be removed?

_jezell_ over 3 years ago

Is this basically Facebook's version of SourceGraph?

ing33k over 3 years ago

7GB Docker image!

erlich over 3 years ago

I can't believe Facebook hasn't canned Flowtype yet and moved to TypeScript. They will have to do it eventually.

da39a3ee over 3 years ago

I was recently looking for a library that takes a few lines of source code as input and predicts the programming language as output.

That seems like a very tractable machine learning problem, yet all I could find was a single Python library, which looks nice but doesn't have much adoption and requires installing the entirety of TensorFlow, even though users just want a trained model and a predict() function.

Why doesn't a popular library like this exist?
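
As a point of comparison for the machine learning approach, a crude keyword-scoring baseline needs no model or framework at all. The sketch below is exactly that: a hypothetical, hand-picked table of distinctive tokens per language, scored by counting matches. It is not the library the comment refers to and will be far less accurate than a trained classifier; it only illustrates the shape of a dependency-free `predict()` function.

```python
import re
from collections import Counter

# Hand-picked, illustrative token lists; a trained classifier would learn weights instead.
SIGNATURES = {
    "python": {"def", "elif", "self", "import", "None", "print"},
    "javascript": {"function", "const", "let", "=>", "console", "undefined"},
    "go": {"func", "package", ":=", "defer", "chan", "fmt"},
    "rust": {"fn", "let", "mut", "impl", "::", "println!"},
}

# Identifiers (optionally ending in "!", for Rust macros) or a few operators.
TOKEN = re.compile(r"[A-Za-z_]\w*!?|=>|:=|::")


def predict(snippet):
    """Return the language whose signature tokens occur most often, or None."""
    tokens = Counter(TOKEN.findall(snippet))
    scores = {lang: sum(tokens[t] for t in sig) for lang, sig in SIGNATURES.items()}
    best = max(scores, key=scores.get)  # ties go to the first language listed
    return best if scores[best] > 0 else None


print(predict("def greet(self):\n    print('hi')"))  # python
print(predict("const f = (x) => console.log(x)"))    # javascript
```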
