Tree-sitter is amazing. The parsing is fast enough to run on every keystroke. The parse tree is extremely concise and readable. It resembles an AST more than a parse tree (i.e., no 11 levels of binary op precedence rules in the tree). The parse tree includes specific ERROR nodes, so you can get a semi-functional tree even with broken syntax.<p>I can't wait for the tools to get built with this. Paredit for TypeScript. Syntax-tree based highlighting (vs regex highlighting). A command to "add an arg to current function" which works across languages. A command to add a CSS class to the nearest JSX node, or to walk up the tree at the className="| ..." position, adding a new className if it doesn't exist.<p>There's a nicely documented Emacs package for this [1]. The documentation is at [2]. The parse trees work great. There's syntax highlighting support and tree-walking APIs. There's a bit of confusion between the TSX and TypeScript languages, but it's fixable with a config change [3].<p>[1]: <a href="https://github.com/ubolonton/emacs-tree-sitter" rel="nofollow">https://github.com/ubolonton/emacs-tree-sitter</a>
[2]: <a href="https://ubolonton.github.io/emacs-tree-sitter/" rel="nofollow">https://ubolonton.github.io/emacs-tree-sitter/</a>
[3]: <a href="https://github.com/ubolonton/emacs-tree-sitter/issues/66#issuecomment-778692779" rel="nofollow">https://github.com/ubolonton/emacs-tree-sitter/issues/66#iss...</a>
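The "walk up the tree from the cursor" idea in the comment above can be sketched in a few lines. This is only a minimal illustration, not Tree-sitter's actual API: the `Node` class, its fields, and the node type names (`jsx_element`, `jsx_attribute`, `string`) are assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical syntax-tree node (not the real Tree-sitter binding)."""
    type: str
    start: int  # start offset, inclusive
    end: int    # end offset, exclusive
    children: List["Node"] = field(default_factory=list, repr=False)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach a child and set its parent link."""
        child.parent = self
        self.children.append(child)
        return child


def deepest_node_at(root: Node, offset: int) -> Node:
    """Descend to the smallest node whose range contains the offset."""
    node = root
    while True:
        for child in node.children:
            if child.start <= offset < child.end:
                node = child
                break
        else:
            # No child contains the offset, so this node is the deepest match.
            return node


def nearest_ancestor(node: Node, wanted_type: str) -> Optional[Node]:
    """Walk up parent links until a node of the wanted type is found."""
    current: Optional[Node] = node
    while current is not None:
        if current.type == wanted_type:
            return current
        current = current.parent
    return None


# Toy tree for: <div><span className="a">x</span></div>
root = Node("program", 0, 39)
div = root.add(Node("jsx_element", 0, 39))
span = div.add(Node("jsx_element", 5, 33))
attr = span.add(Node("jsx_attribute", 11, 24))
value = attr.add(Node("string", 21, 24))

cursor = 22  # cursor inside the className string
hit = nearest_ancestor(deepest_node_at(root, cursor), "jsx_element")
print(hit.type, hit.start, hit.end)
```

With a real parser the tree would come from the parse result rather than being built by hand; the lookup logic would stay roughly the same.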
I'm an engineer on the code intelligence team at Sourcegraph.<p>We've been busy building out true precise code intelligence/navigation support, but we also have a mode for zero-configuration code navigation based on text search, universal-ctags, and hand-rolled regular expressions (which works surprisingly well!). Tree-sitter would definitely give better results than our current ctags-based approach. It's been catching our attention more and more lately, and we have plans to use it to upgrade our out-of-the-box, instant code navigation experience.<p>It's not the exact right fit for our primary goals though, since it's designed around being extremely fast while editing and robust against errors. Sourcegraph is only used for navigating committed code, so we're leveraging formats like LSIF to generate complete semantic graphs of codebases and their entire dependency tree. That'll enable a lot of features that are out of reach for tree-sitter, but it's a lot harder to get working out of the box and it's a <i>much</i> bigger technical investment.<p>It's very interesting to see the topological space that houses these solutions fill out. Every tool has its own set of unique trade-offs and falls somewhere on these spectrums:<p>- fast vs slow<p>- precise vs imprecise<p>- zero-configuration vs configuration required<p>We've visited a few islands in this space but are still very curious to see what other islands can be discovered. We're especially excited about tools and formats like tree-sitter and LSIF, around which a large and supportive community can grow so that the products we love and rely on as developers can all make forward progress.
I tried to use this to ease the front-end workload of students in a compiler project (building a C compiler) for a university course, so that the project could focus on the more interesting middle and back-end parts of the compiler.
However, reported bugs in the C grammar that saw no activity at all [1] made this impossible. From this small sample of experiences, I was left with the impression that Tree Sitter is great for things like syntax highlighting, where wrong results are annoying but not dramatic, but not so suitable for tools that need a really correct syntax tree.<p>---
[1] <a href="https://github.com/tree-sitter/tree-sitter-c/issues/51" rel="nofollow">https://github.com/tree-sitter/tree-sitter-c/issues/51</a>
To me, the most impressive use of tree-sitter was an iOS text editor that uses it to parse huge JSON files / mixed language files and highlight them in a very robust way. [0][1] I’m hoping tree-sitter becomes more common like LSP and Emacs can get exact highlighting and other tools with it…<p>[0]: <a href="https://twitter.com/simonbs/status/1352697855845273600" rel="nofollow">https://twitter.com/simonbs/status/1352697855845273600</a><p>[1]: <a href="https://twitter.com/simonbs/status/1362492842141171720?s=21" rel="nofollow">https://twitter.com/simonbs/status/1362492842141171720?s=21</a>
There's a good Strange Loop presentation on Tree-sitter: <a href="https://www.youtube.com/watch?v=Jes3bD6P0To" rel="nofollow">https://www.youtube.com/watch?v=Jes3bD6P0To</a>
Tree-sitter is unfathomable to me. This is the grammar for Ruby:<p><a href="https://github.com/tree-sitter/tree-sitter-ruby/blob/master/grammar.js" rel="nofollow">https://github.com/tree-sitter/tree-sitter-ruby/blob/master/...</a><p>I find it absolutely amazing that a grammar for something as complicated as Ruby can be so concise. Less than a thousand lines. The corresponding Bison grammar is 13k lines. And I think the tree-sitter one is scannerless so also includes the lexer?! How do they do it?
If curious, past threads:<p><i>Tree-sitter: new incremental parsing system for programming tools (2018) [video]</i> - <a href="https://news.ycombinator.com/item?id=21675113" rel="nofollow">https://news.ycombinator.com/item?id=21675113</a> - Dec 2019 (28 comments)<p><i>Tree-sitter – a new parsing system for programming tools [video]</i> - <a href="https://news.ycombinator.com/item?id=18213022" rel="nofollow">https://news.ycombinator.com/item?id=18213022</a> - Oct 2018 (25 comments)<p>Others?
I recently used this to put together a unified PL classification model. It's nice because any language Tree-sitter grows to support, we'll support pretty effortlessly, and Tree-sitter captures more than enough nuance per language to derive high-quality classifications.<p>It's fair to say we can classify a snippet of code based on either single or multiple AST paths produced by Tree-sitter. Right now we're only classifying the programming language, but extending it to function classification, description, etc. isn't out of the question; we just don't need it right now.
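As a rough illustration of what "AST paths" could mean here, the sketch below enumerates root-to-leaf paths of node types in a small tree. The nested-tuple tree shape and the node type names are assumptions for illustration only; this is neither the commenter's model nor Tree-sitter's API.

```python
from typing import Iterator, Tuple

# Hypothetical tree shape: (node_type, [children]). A real parser would supply this.
Tree = Tuple[str, list]


def ast_paths(tree: Tree, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Yield every root-to-leaf path as a tuple of node type names."""
    node_type, children = tree
    path = prefix + (node_type,)
    if not children:
        # Leaf node: the accumulated path is complete.
        yield path
        return
    for child in children:
        yield from ast_paths(child, path)


# Toy tree loosely shaped like a one-function module.
toy = ("module", [
    ("function_definition", [
        ("identifier", []),
        ("parameters", [("identifier", [])]),
        ("block", [("return_statement", [("integer", [])])]),
    ]),
])

paths = list(ast_paths(toy))
for p in paths:
    print("/".join(p))
```

Paths like these could then be counted or hashed into features for a classifier; how the original model actually uses them is not stated in the comment.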
I'm curious to see if Tree-sitter can be used to provide fast and rich code navigation. I was able to implement simple goto definition/references [1], but I'm not sure if it can be used for more advanced navigation features in a language-agnostic way.<p>If you're interested, GitHub is already using it [2] for that purpose and Sourcegraph is experimenting with it [3].<p>[1] <a href="https://github.com/alidn/lsif-os" rel="nofollow">https://github.com/alidn/lsif-os</a>
[2] <a href="https://github.com/github/semantic" rel="nofollow">https://github.com/github/semantic</a>
[3] <a href="https://github.com/sourcegraph/sourcegraph/issues/17378" rel="nofollow">https://github.com/sourcegraph/sourcegraph/issues/17378</a>
Is this the same thing Neovim uses for syntax highlighting?<p>Is there a chance of it getting integrated into Vim? Last I checked, Vim used a regex method which was slow and faulty.
So far it's an amazing tool and we are happy to use it in our projects. The only two complaints I have are the dependency on JavaScript[1] and the missing Rust runtime option[2].<p>[1] <a href="https://github.com/tree-sitter/tree-sitter/issues/465" rel="nofollow">https://github.com/tree-sitter/tree-sitter/issues/465</a><p>[2] <a href="https://github.com/tree-sitter/tree-sitter/issues/465" rel="nofollow">https://github.com/tree-sitter/tree-sitter/issues/465</a>
Here's what it looks like to call it from Rust: <a href="https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_rust" rel="nofollow">https://github.com/tree-sitter/tree-sitter/tree/master/lib/b...</a><p>Seems like this would make it much easier to bootstrap a performant language-server. Very cool; maybe that will be my next project.
Wrote tree-sitter-svelte. Was a good experience. I am also writing a programming language of my own, similar to TypeScript, and I am using tree-sitter for that too. It's a delight to work with. Removes a lot of the worries.
While we're in this discussion: Say I want to implement "SQL" for my app (if you've used Jira, I want to make my own JQL). Is this the tool for that? I'm looking for something much simpler than ANTLR.
This is really cool. I 100% agree that as programmers we’re editing and thinking in terms of ASTs. It just happens that text is a high-density way to represent those ASTs.<p>I’m going to play with this and see if I can make a generic language server for VS Code that works across languages. Unless someone has already done that.<p>What would be really cool is tree-sitter (or a sister package) providing incremental formatting primitives across languages.<p>The closest language-agnostic formatter that comes to mind is prettier.js with its extensions.<p>incremental parser -> language server -> formatter across languages would be super rad.
I half-wrote a tree-sitter grammar for a niche DSL (the PRISM probabilistic model checking language). It was a very nice experience. It's part of another half-written side project to create a language server for PRISM; I still haven't gotten around to making the whole end-to-end pipeline work.<p>With its syntax tree query frontend, I wonder whether tree-sitter would make a good interpreter frontend for some niche languages, or whether you'd need something more powerful.
Here is the "tracking issue" for JetBrains IDEs
<a href="https://youtrack.jetbrains.com/issue/KT-45087" rel="nofollow">https://youtrack.jetbrains.com/issue/KT-45087</a>
Upvote the issue if you wanna bump the priority
I tried looking through the docs and couldn't find any mention of which algorithm you are using. It seems like some kind of LR parser, but which kind? LALR? GLR? It seems like a very important bit of information that's suspiciously missing.