Tree-sitter: an incremental parsing system for programming tools

476 points by sbt567, about 4 years ago

25 comments

srcreigh, about 4 years ago
Tree Sitter is amazing. The parsing is fast enough to run on every keystroke. The parse tree is extremely concise and readable. It resembles an AST more than a parse tree (i.e., no 11 levels of binary op precedence rules in the tree). The parse tree contains specific ERROR nodes, so you can get a semi-functional tree even with broken syntax.

I can't wait for the tools to get built with this. Paredit for TypeScript. Syntax-tree based highlighting (vs regex highlighting). A command to "add an arg to current function" which works across languages. A command to add a CSS class to the nearest JSX node, or to walk up the tree at the className="| ..." position, adding a new className if it doesn't exist.

There's a nicely documented Emacs package for this [1]. The documentation is at [2]. The parse trees work great. There's syntax highlighting support and tree-walking APIs. There's a bit of confusion about TSX vs TypeScript langs, but it's fixable with a config change [3].

[1]: https://github.com/ubolonton/emacs-tree-sitter
[2]: https://ubolonton.github.io/emacs-tree-sitter/
[3]: https://github.com/ubolonton/emacs-tree-sitter/issues/66#issuecomment-778692779

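To make the ERROR-node idea in the comment above concrete, here is a minimal sketch in Python. It is a toy recursive-descent parser for parenthesized lists, not Tree-sitter's algorithm or API; the names `Node`, `tokenize`, `parse`, and `to_sexp` are hypothetical and exist only for this illustration. The point it tries to show is that the parser records an ERROR node and keeps going instead of aborting on broken input.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # "program", "list", "atom", or "ERROR"
    text: str = ""                 # atom text, or a short error description
    children: list = field(default_factory=list)


def tokenize(src):
    # Pad parentheses with spaces so a plain split() separates them.
    return src.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse every token into a 'program' node; never raises on bad input."""
    pos = 0

    def parse_expr():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(parse_expr())
            if pos == len(tokens):
                # Input ended before ")": keep what was parsed, mark it ERROR.
                return Node("ERROR", "unclosed (", items)
            pos += 1  # consume ")"
            return Node("list", children=items)
        if tok == ")":
            # A ")" with nothing to close: record it and move on.
            return Node("ERROR", "stray )")
        return Node("atom", tok)

    program = Node("program")
    while pos < len(tokens):
        program.children.append(parse_expr())
    return program


def to_sexp(node):
    """Render a tree as an S-expression of node kinds and atom texts."""
    if node.kind == "atom":
        return node.text
    parts = [node.kind] + [to_sexp(child) for child in node.children]
    return "(" + " ".join(parts) + ")"


tree = parse(tokenize("(a b) ) (c (d"))
print(to_sexp(tree))  # (program (list a b) (ERROR) (ERROR c (ERROR d)))
```

Even though the input has a stray `)` and two unclosed lists, the well-formed `(a b)` still comes out as an ordinary `list` node, and the atoms inside the broken parts stay reachable under the ERROR nodes. That is the "semi-functional tree" property in miniature; a real incremental parser does considerably more work to recover.
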
efritz, about 4 years ago
I'm an engineer on the code intelligence team at Sourcegraph.

We've been busy building out true precise code intelligence/navigation support, but we also have a mode for zero-configuration code navigation based on text search, universal-ctags, and hand-rolled regular expressions (which works surprisingly well!). Tree-sitter would definitely give better results than our current ctags-based approach. It's been catching our attention more and more lately, and we have plans to use it to upgrade our out-of-the-box, instant code navigation experience.

It's not the exact right fit for our primary goals though, since it's designed around being extremely fast while editing and robust against errors. Sourcegraph is only used for navigating committed code, so we're leveraging formats like LSIF to generate complete semantic graphs of codebases and their entire dependency tree. That'll enable a lot of features that are out of reach for tree-sitter, but it is a lot harder to get working out of the box and it's a much bigger technical investment.

It's very interesting to see the topological space that houses these solutions fill out. Every tool has its own set of unique trade-offs and falls somewhere on these spectrums:

- fast vs slow
- precise vs imprecise
- zero-configuration vs configuration required

We've visited a few islands in this space but are still very curious to see what other islands can be discovered. We're especially excited about tools and formats like tree-sitter and LSIF around which a large and supportive community can grow, so that all the products we love and rely on as developers can make forward progress.

ritter2a, about 4 years ago
I tried to use this to ease the front end work load of students in a compiler project (building a C compiler) for a university course, so that the project could be focused on the more interesting middle and back end parts of the compiler. However, reported bugs in the C grammar that saw no activity at all [1] made this impossible. From this small sample of experiences, I was left with the impression that Tree Sitter is great for things like syntax highlighting, where wrong results are annoying but not dramatic, but not so suitable for tools that need a really correct syntax tree.

[1] https://github.com/tree-sitter/tree-sitter-c/issues/51

maxbrunsfeld, about 4 years ago
Hey, Tree-sitter author here. Thanks for posting! Let me know if you have questions about the project.

pcr910303, about 4 years ago
To me, the most impressive use of tree-sitter was an iOS text editor that uses it to parse huge JSON files / mixed language files and highlight them in a very robust way. [0][1] I’m hoping tree-sitter becomes more common like LSP and Emacs can get exact highlighting and other tools with it…

[0]: https://twitter.com/simonbs/status/1352697855845273600
[1]: https://twitter.com/simonbs/status/1362492842141171720?s=21

alissasobo, about 4 years ago
You can watch a good Strangeloop presentation on Tree Sitter: https://www.youtube.com/watch?v=Jes3bD6P0To

chrisseaton, about 4 years ago
Tree-sitter is unfathomable to me. This is the grammar for Ruby:

https://github.com/tree-sitter/tree-sitter-ruby/blob/master/grammar.js

I find it absolutely amazing that a grammar for something as complicated as Ruby can be so concise. Less than a thousand lines. The corresponding Bison grammar is 13k lines. And I think the tree-sitter one is scannerless, so it also includes the lexer?! How do they do it?

dang, about 4 years ago
If curious, past threads:

"Tree-sitter: new incremental parsing system for programming tools (2018) [video]" - https://news.ycombinator.com/item?id=21675113 - Dec 2019 (28 comments)

"Tree-sitter – a new parsing system for programming tools [video]" - https://news.ycombinator.com/item?id=18213022 - Oct 2018 (25 comments)

Others?

Grimm1, about 4 years ago
I recently used this to put together a unified PL classification model. It's nice because any language Tree-sitter grows to support, we'll support pretty effortlessly, and Tree-sitter captures more than enough nuance per language to derive high quality classifications.

It's fair to say we can classify a snippet of code based on either single or multiple AST paths produced by Tree-sitter. Right now we're only classifying the programming language, but extending it to function classification or description etc. isn't out of the question; we just don't need it right now.

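As a rough illustration of what "AST paths" could mean as classification features, the sketch below enumerates root-to-leaf paths of node type names. It uses Python's built-in `ast` module as a stand-in for a Tree-sitter tree, so it only handles Python source; the helper names `ast_paths` and `path_features` are hypothetical, and nothing here is claimed to match the commenter's actual model or features.

```python
import ast
from collections import Counter


def ast_paths(source):
    """Yield root-to-leaf paths through the syntax tree as tuples of node type names."""
    def walk(node, prefix):
        path = prefix + (type(node).__name__,)
        children = list(ast.iter_child_nodes(node))
        if not children:
            yield path  # leaf node: emit the full path from the root
        for child in children:
            yield from walk(child, path)

    yield from walk(ast.parse(source), ())


def path_features(source):
    """Count each distinct path; a classifier could consume these counts."""
    return Counter("/".join(path) for path in ast_paths(source))


print(list(ast_paths("x = 1")))
# [('Module', 'Assign', 'Name', 'Store'), ('Module', 'Assign', 'Constant')]
```

With a parser that exposes the same kind of tree for many languages, the same traversal would yield comparable features per language, which is presumably what makes the approach attractive.
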
Annili, about 4 years ago
I'm curious to see if Tree-sitter can be used to provide fast and rich code navigation. I was able to implement simple goto definition/references [1], but I'm not sure if it can be used for more advanced navigation features in a language-agnostic way.

If you're interested, GitHub is already using it [2] for that purpose and Sourcegraph is experimenting with it [3].

[1] https://github.com/alidn/lsif-os
[2] https://github.com/github/semantic
[3] https://github.com/sourcegraph/sourcegraph/issues/17378

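For a sense of what "simple goto definition/references" from a syntax tree involves, here is a minimal sketch. It is an assumption-laden stand-in: it walks Python's built-in `ast` tree rather than a Tree-sitter tree, ignores scoping entirely, and matches purely by name. The functions `index_symbols`, `goto_definition`, and `find_references` are hypothetical names and are not taken from any of the projects linked above.

```python
import ast


def index_symbols(source):
    """Return (definitions, references), each mapping name -> [(line, column), ...]."""
    definitions, references = {}, {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            # Position of the `def` / `class` keyword, not of the name itself.
            definitions.setdefault(node.name, []).append((node.lineno, node.col_offset))
        elif isinstance(node, ast.Name):
            # An assignment target counts as a definition; any other use is a reference.
            table = definitions if isinstance(node.ctx, ast.Store) else references
            table.setdefault(node.id, []).append((node.lineno, node.col_offset))
    return definitions, references


def goto_definition(source, name):
    """Earliest position where `name` is defined, or None if it never is."""
    definitions, _ = index_symbols(source)
    return min(definitions.get(name, []), default=None)


def find_references(source, name):
    """All positions where `name` is used, in source order."""
    _, references = index_symbols(source)
    return sorted(references.get(name, []))


SAMPLE = """\
def area(w, h):
    return w * h

total = area(2, 3)
print(total)
"""

print(goto_definition(SAMPLE, "area"))   # (1, 0)
print(find_references(SAMPLE, "area"))   # [(4, 8)]
```

Name matching like this is fast and needs no configuration, but it cannot tell two unrelated `area` symbols apart, which is the gap between syntax-only navigation and the precise, semantic approaches discussed elsewhere in the thread.
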
ducktective, about 4 years ago
Is this the same thing neovim uses for syntax highlighting?

Is there a chance of it getting integrated into vim? Last I checked, vim used a regex method which was slow and faulty.

drewdennison, about 4 years ago
We've been using tree-sitter for Semgrep and it's nothing short of incredible. Amazing work by Max and team.

xvilka, about 4 years ago
So far it's an amazing tool and we are happy to use it in our projects. The only two complaints I have are the dependency on JavaScript [1] and the missing Rust runtime option [2].

[1] https://github.com/tree-sitter/tree-sitter/issues/465
[2] https://github.com/tree-sitter/tree-sitter/issues/465

brundolf, about 4 years ago
Here's what it looks like to call it from Rust: https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_rust

Seems like this would make it much easier to bootstrap a performant language server. Very cool; maybe that will be my next project.

himujjal, about 4 years ago
Wrote tree-sitter-svelte. Was a good experience. I am also writing a programming language of my own similar to TypeScript and I am using tree-sitter for the same. It's a delight to work with. Removes a lot of the worries.

guerrilla, about 4 years ago
Is the use case for this mainly IDEs or is it intended to replace traditional lexer and parser generators too?

wiradikusuma, about 4 years ago
While we're in this discussion: Say I want to implement "SQL" for my app (if you've used Jira, I want to make my own JQL). Is this the tool for that? I'm looking for something much simpler than ANTLR.

nojvek, about 4 years ago
This is really cool. I 100% agree that as programmers we’re editing and thinking in terms of ASTs. It just happens that text is a high density way to represent those ASTs.

I’m going to play with this and see if I can make a generic language server for vscode that works across languages. Unless someone has already done that.

What would be really cool is if tree-sitter (or a sister package) provided incremental formatting primitives across languages.

The closest language-agnostic formatter that comes to mind is prettier.js with its extensions.

incremental parser -> language server -> formatter across languages would be super rad.

ahelwer, about 4 years ago
I half-wrote a tree-sitter grammar for a niche DSL (the PRISM probabilistic model checking language). It was a very nice experience. It's part of another half-written side project to create a language server for PRISM; I still haven't gotten around to making the whole end-to-end pipeline work.

With its syntax tree query frontend, I wonder whether tree-sitter would make a good interpreter frontend for some niche languages, or whether you need something more powerful.

sidntrivedi, about 4 years ago
Does GitHub currently use tree-sitter for syntax highlighting? If yes, are the libraries open-source? Thanks :)

The_rationalist, about 4 years ago
Here is the "tracking issue" for JetBrains IDEs: https://youtrack.jetbrains.com/issue/KT-45087 Upvote the issue if you wanna bump the priority.

erezsh, about 4 years ago
I tried looking through the docs, and couldn't find any mention of which algorithm you are using. It seems like some LR grammar, but which kind? LALR? GLR? It seems like a very important bit of information that's suspiciously missing.

stelf, about 4 years ago
Where is the SQL parser? Any specific reason why it is missing (not even started)?

The_rationalist, about 4 years ago
Are there any benchmarks available versus TextMate regexes?

amelius, about 4 years ago
Next steps: incrementally resolve symbols and type-check?