Tree-sitter: an incremental parsing system for programming tools

476 points by sbt567 over 4 years ago

25 comments

srcreigh over 4 years ago

Tree Sitter is amazing. The parsing is fast enough to run on every keystroke. The parse tree is extremely concise and readable. It resembles an AST more than a parse tree (i.e. no 11 levels of binary op precedence rules in the tree). The parse tree emits specific ERROR nodes, so you can get a semi-functional tree even with broken syntax.

I can't wait for the tools to get built with this. Paredit for TypeScript. Syntax-tree based highlighting (vs regex highlighting). A command to "add an arg to current function" which works across languages. A command to add a CSS class to the nearest JSX node, or to walk up the tree at the className="| ..." position, adding a new className if it doesn't exist.

There's a nicely documented Emacs package for this [1]. The documentation is at [2]. The parse trees work great. There's syntax highlighting support and tree-walking APIs. There's a bit of confusion about TSX vs typescript langs but it's fixable with some config change [3].

[1]: https://github.com/ubolonton/emacs-tree-sitter
[2]: https://ubolonton.github.io/emacs-tree-sitter/
[3]: https://github.com/ubolonton/emacs-tree-sitter/issues/66#issuecomment-778692779
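To make the "no 11 levels of binary op precedence" remark concrete, here is a toy sketch in Python. It does not use tree-sitter at all: the `Node` class, the three layer names, and the `collapse` helper are all made up for illustration, and nothing here is meant to describe how tree-sitter works internally. It only contrasts the shape of a tree built from a layered textbook grammar with the flatter, AST-like shape the comment describes.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical precedence layers from a textbook expression grammar.
LAYERS = {"expr", "term", "factor"}


@dataclass
class Node:
    kind: str
    children: List["Node"] = field(default_factory=list)
    text: str = ""

    def sexp(self) -> str:
        """Render the subtree as an S-expression."""
        parts = [self.kind]
        if self.text:
            parts.append(self.text)
        parts.extend(child.sexp() for child in self.children)
        return "(" + " ".join(parts) + ")"


def collapse(node: Node) -> Node:
    """Drop precedence-layer nodes that only wrap a single child."""
    if node.kind in LAYERS and len(node.children) == 1:
        return collapse(node.children[0])
    return Node(node.kind, [collapse(c) for c in node.children], node.text)


def number(text: str) -> Node:
    """Wrap a literal in a factor node, as a layered grammar would."""
    return Node("factor", [Node("number", text=text)])


# "1 + 2" under: expr -> expr '+' term, term -> factor, factor -> number.
concrete = Node("expr", [
    Node("expr", [Node("term", [number("1")])]),
    Node("+"),
    Node("term", [number("2")]),
])

print(concrete.sexp())
print(collapse(concrete).sexp())
```

The first line printed is the deeply nested tree with one wrapper per precedence layer; the second keeps only the nodes that carry information. With eleven layers instead of three, every literal would sit under a much longer chain of wrappers.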

efritz over 4 years ago

I'm an engineer on the code intelligence team at Sourcegraph.

We've been busy building out true precise code intelligence/navigation support, but we also have a mode for zero-configuration code navigation based on text search, universal-ctags, and hand-rolled regular expressions (which works surprisingly well!). Tree-sitter would definitely give better results than our current ctags-based approach. It's been catching our attention more and more lately, and we have plans to use it to upgrade our out-of-the-box, instant code navigation experience.

It's not the exact right fit for our primary goals though, since it's designed around being extremely fast while editing and robust against errors. Sourcegraph is only used for navigating committed code, so we're leveraging formats like LSIF to generate complete semantic graphs of codebases and their entire dependency tree. That'll enable a lot of features that are out of reach for tree-sitter, but is a lot harder to get working out of the box and it's a much bigger technical investment.

It's very interesting to see the topological space that houses these solutions fill out. Every tool has its own set of unique trade-offs and falls somewhere on these spectrums:

- fast vs slow
- precise vs imprecise
- zero-configuration vs configuration required

We've visited a few islands in this space but are still very curious to see what other islands can be discovered. We're especially excited about tools and formats like tree-sitter and LSIF around which a large and supportive community can grow so that all the products we love and rely on as developers can all make forward progress.

ritter2a over 4 years ago

I tried to use this to ease the front end work load of students in a compiler project (building a C compiler) for a University course, so that the project could be focused on the more interesting middle and back end parts of the compiler. However, reported bugs in the C grammar that saw no activity at all [1] made this impossible. From this small sample of experiences, I was left with the impression that Tree Sitter is great for things like syntax highlighting, where wrong results are annoying but not dramatic, but not so suitable for tools that need a really correct syntax tree.

[1] https://github.com/tree-sitter/tree-sitter-c/issues/51

maxbrunsfeld over 4 years ago

Hey, Tree-sitter author here. Thanks for posting! Let me know if you have questions about the project.

pcr910303 over 4 years ago

To me, the most impressive use of tree-sitter was an iOS text editor that uses it to parse huge JSON files / mixed language files and highlight them in a very robust way. [0][1] I’m hoping tree-sitter becomes more common like LSP and Emacs can get exact highlighting and other tools with it…

[0]: https://twitter.com/simonbs/status/1352697855845273600
[1]: https://twitter.com/simonbs/status/1362492842141171720?s=21

alissasobo over 4 years ago

You can watch a good Strangeloop presentation on Tree Sitter: https://www.youtube.com/watch?v=Jes3bD6P0To

chrisseaton over 4 years ago

Tree-sitter is unfathomable to me. This is the grammar for Ruby:

https://github.com/tree-sitter/tree-sitter-ruby/blob/master/grammar.js

I find it absolutely amazing that a grammar for something as complicated as Ruby can be so concise. Less than a thousand lines. The corresponding Bison grammar is 13k lines. And I think the tree-sitter one is scannerless so also includes the lexer?! How do they do it?

dang over 4 years ago

If curious, past threads:

Tree-sitter: new incremental parsing system for programming tools (2018) [video] - https://news.ycombinator.com/item?id=21675113 - Dec 2019 (28 comments)

Tree-sitter – a new parsing system for programming tools [video] - https://news.ycombinator.com/item?id=18213022 - Oct 2018 (25 comments)

Others?

Grimm1 over 4 years ago

I recently used this to put together a unified PL classification model. It's nice because any language treesitter grows to support, we'll support pretty effortlessly, and treesitter captures more than enough nuance per language to derive high quality classifications.

It's fair to say we can classify a snippet of code based on either single or multiple AST paths produced by treesitter. Right now we're only classifying the programming language, but extending it to function classification or description etc. isn't out of the question; we just don't need it right now.
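For readers wondering what "AST paths" might look like as features, here is a minimal sketch. It is an assumption-laden stand-in, not the commenter's model: it uses Python's built-in `ast` module instead of tree-sitter (so it only parses Python), and the `ast_paths` helper is a hypothetical name. It simply counts root-to-leaf paths of node type names, which is one plausible way to turn a syntax tree into something a classifier could consume.

```python
import ast
from collections import Counter


def ast_paths(source: str) -> Counter:
    """Count root-to-leaf paths of node type names in a Python snippet."""
    paths = Counter()

    def walk(node: ast.AST, prefix: tuple) -> None:
        path = prefix + (type(node).__name__,)
        children = list(ast.iter_child_nodes(node))
        if not children:
            # A node with no child nodes ends a path.
            paths[path] += 1
        for child in children:
            walk(child, path)

    walk(ast.parse(source), ())
    return paths


for path, count in ast_paths("x = 1").items():
    print(" > ".join(path), count)
```

With a tree-sitter tree the walk would be the same idea applied to that library's node objects, and the node type names would come from each language's grammar, which is presumably what makes the approach carry over to new languages.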

Annili over 4 years ago

I'm curious to see if Tree-sitter can be used to provide fast and rich code navigation. I was able to implement simple goto definition/references [1], but I'm not sure if it can be used for more advanced navigation features in a language-agnostic way.

If you're interested, GitHub is already using it [2] for that purpose and Sourcegraph is experimenting with it [3].

[1] https://github.com/alidn/lsif-os
[2] https://github.com/github/semantic
[3] https://github.com/sourcegraph/sourcegraph/issues/17378

ducktective over 4 years ago

Is this the same thing neovim uses for syntax highlighting? Is there a chance of it getting integrated into vim? Last I checked, vim used a regex method which was slow and faulty.

drewdennison over 4 years ago

We've been using tree-sitter for Semgrep and it's nothing short of incredible. Amazing work by Max and team.

xvilka over 4 years ago

So far it's an amazing tool and we are happy to use it in our projects. The only two complaints I have are the dependency on JavaScript [1] and the missing Rust runtime option [2].

[1] https://github.com/tree-sitter/tree-sitter/issues/465
[2] https://github.com/tree-sitter/tree-sitter/issues/465

brundolf over 4 years ago

Here's what it looks like to call it from Rust: https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_rust

Seems like this would make it much easier to bootstrap a performant language-server. Very cool; maybe that will be my next project.

himujjal over 4 years ago

Wrote tree-sitter-svelte. Was a good experience. I am also writing a programming language of my own, similar to TypeScript, and I am using tree-sitter for that as well. It's a delight to work with. Removes a lot of the worries.

guerrilla over 4 years ago

Is the use case for this mainly IDEs or is it intended to replace traditional lexer and parser generators too?

wiradikusuma over 4 years ago

While we're in this discussion: Say I want to implement "SQL" for my app (if you've used Jira, I want to make my own JQL). Is this the tool for that? I'm looking for something much simpler than ANTLR.

nojvek over 4 years ago

This is really cool, I 100% agree that as programmers we’re editing and thinking in terms of ASTs. It just happens that text is a high density way to represent those ASTs.

I’m going to play with this and see if I can make a generic language server for vscode that works across languages. Unless someone has already done that.

What would be really cool is if tree-sitter (or a sister package) provided incremental formatting primitives across languages.

The closest language agnostic formatter that comes to mind is prettier.js with its extensions.

incremental parser -> language server -> formatter across languages would be super rad.

ahelwer over 4 years ago

I half-wrote a tree-sitter grammar for a niche DSL (the PRISM probabilistic model checking language). It was a very nice experience. It's part of another half-written side project to create a language server for PRISM; I still haven't gotten around to making the whole end-to-end pipeline work.

With its syntax tree query frontend, I wonder whether tree-sitter would make a good interpreter frontend for some niche languages, or whether you need something more powerful.

sidntrivedi over 4 years ago

Does GitHub currently use tree-sitter for syntax highlighting? If yes, are the libraries open-source? Thanks :)

The_rationalist over 4 years ago

Here is the "tracking issue" for JetBrains IDEs: https://youtrack.jetbrains.com/issue/KT-45087

Upvote the issue if you wanna bump the priority.

erezsh about 4 years ago

I tried looking through the docs, and couldn't find any mention of which algorithm you are using. It seems like some LR grammar, but which kind? LALR? GLR? It seems like a very important bit of information that's suspiciously missing.

stelf over 4 years ago

Where is the SQL parser? Any specific reason why it is missing (not even started)?

The_rationalist over 4 years ago

Are there any benchmarks available versus TextMate regexes?

amelius over 4 years ago

Next steps: incrementally resolve symbols and type-check?
