Tree Sitter and the Complications of Parsing Languages

225 points by podiki, over 3 years ago

10 comments

matklad, over 3 years ago

> Well, because it’s gosh-darn hard to do it the right way.

I think this overstates the difficulty. This of course depends a lot on the language, but for a reasonable one (not C++) you can just go and write the parser by hand. I’d ballpark this as three weeks, if you know how the thing is supposed to work.

> it doesn’t have to redo the whole thing on every keypress.

This is probably what makes the task seem harder than it is. Incremental parsing is nice, but not mandatory. rust-analyzer and most IntelliJ parsers re-parse the whole file on every change (IJ does incremental lexing, which is simple).

> The reason (most) LSP servers don’t offer syntax highlighting is because of the drag on performance.

I am surprised to hear that. We never had performance problems with highlighting on the server in rust-analyzer. I remember that for Emacs specifically there were client-side problems with parsing LSP JSON.

> Every keystroke you type must be sent to the server, processed, a partial tree returned, and your syntax highlighting updated.

That’s not the bottleneck for syntax highlighting; typechecking is (and it’s typechecking that makes highlighting especially interesting).

In general, my perception of what’s going on with proper parsing in the industry is a bit different. I’d say the status quo from five years back boils down to people just getting accustomed to the way things were done. Compiler authors generally didn’t think about syntax highlighting or completions, and editors generally didn’t want to do the parsing stuff. JetBrains were the exception, as they just did the thing. In this sense, LSP was a much-needed stimulus to just start doing things properly. People were building rich IDE experiences before LSP just fine (see dart analyzer); it’s just that relatively few languages saw it as an important problem to solve at all.
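
To make the "re-parse the whole file on every change" approach concrete, here is a minimal sketch of a hand-written recursive descent parser. It assumes a toy grammar (integers, `+`, `*`, parentheses) that is not from the article or from rust-analyzer; the names `Num`, `BinOp`, `Parser`, and `on_change` are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass

# Toy lexer: runs of digits become numbers, any other non-space char is a symbol.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\S))")


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: object
    right: object


def tokenize(text):
    tokens = []
    for number, symbol in TOKEN_RE.findall(text):
        tokens.append(("num", number) if number else ("sym", symbol))
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("eof", "")

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse_expr(self):
        # expr := term ('+' term)*
        node = self.parse_term()
        while self.peek() == ("sym", "+"):
            self.advance()
            node = BinOp("+", node, self.parse_term())
        return node

    def parse_term(self):
        # term := atom ('*' atom)*
        node = self.parse_atom()
        while self.peek() == ("sym", "*"):
            self.advance()
            node = BinOp("*", node, self.parse_atom())
        return node

    def parse_atom(self):
        # atom := NUMBER | '(' expr ')'
        kind, text = self.advance()
        if kind == "num":
            return Num(int(text))
        if (kind, text) == ("sym", "("):
            node = self.parse_expr()
            if self.advance() != ("sym", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.parse_expr()
    if parser.peek()[0] != "eof":
        raise SyntaxError("trailing input")
    return tree


def on_change(new_text):
    # No incremental state: discard the old tree and re-parse the whole buffer.
    return parse(new_text)
```

An editor integration built this way would call `on_change` with the full buffer contents after each edit. A real language needs error recovery and a much larger grammar, but the overall shape (one function per grammar rule, no incremental bookkeeping) stays the same.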

aidos, over 3 years ago

Before this conversation is railroaded by talk about language servers: as the article points out, tree sitter tends to need to be a bit closer to the environment to be effective.

There’s still work to do, but having tree sitter in neovim feels like a great step forward.

smcameron, over 3 years ago

> Semantic Bovinator

Heh. A long time ago I wrote a video game[1] somewhat similar to Williams Defender, and casting about for some sort of "theme" for the game, I hit upon the "editor wars", the ancient storied battle between vi and emacs. You are ostensibly "vi" (a little spaceship vaguely reminiscent of the Vipers from Battlestar Galactica), cruising through system memory, evading system processes, GDB instances, etc., trying to recover your ".swp" files. How to represent Emacs? Obviously, via a giant blimp! And I could display all sorts of messages on the side of the blimp, singing the praises of Emacs and disparaging fans of vi. And the Emacs blimp had a "memory leak", which meant that pieces of the xemacs source code would literally leak out of the back end of the blimp, with the letters floating lazily away, like smoke. So that meant I had to take a look at the xemacs source, dig through it, and try to find some funny bits to put in. Of course, "semantic bovinate" jumped out at me.[2]

[1] https://github.com/smcameron/wordwarvi

[2] https://github.com/smcameron/wordwarvi/blob/master/wordwarvi.c#L1719

dgellow, over 3 years ago

Check out the project page here: https://tree-sitter.github.io/tree-sitter/

Quite a lot of languages are already supported; it's really nice to see. I might have a use for such a library for a personal project :)

You can play around with the playground here: https://tree-sitter.github.io/tree-sitter/playground

kieckerjan, over 3 years ago

I suppose that these days I am one of the few professional programmers who have an active dislike of syntax highlighting. I find it immensely distracting. The only things I allow the highlighter to touch are my comments (I turn them bold), and I consider this a somewhat frivolous indulgence.

(I appreciate the complexity of the problem, btw.)

grenoire, over 3 years ago

I am in love with language servers, the quality of life improvement is just unreal.

jicea, over 3 years ago

I'm a maintainer of a CLI HTTP client with a plain-text file format, Hurl [1]. I would like to start adding support for various IDEs (VSCode, IntelliJ), beginning with syntax highlighting, but I'm having a hard time getting started.

I struggle with many "little" details. For instance, syntax errors should be exactly the same in the terminal and in the IDE. Should I reimplement exactly the same parsing, or should I reuse the CLI tool's parser? If I reuse it, how do I implement things given that, for instance, IntelliJ plugins are written in Java/Kotlin, while VSCode plugins are JavaScript/TypeScript, and Hurl is written in Rust...

It's very hard to figure it all out when it's not your core domain.

[1] https://hurl.dev

rcshubhadeep, over 3 years ago

tree-sitter is a great framework. I have used it quite a bit in the past. I even created a small library on top of it, called tree-hugger (https://github.com/autosoft-dev/tree-hugger). I really enjoyed their playground as well.

IshKebab, over 3 years ago

> The reason (most) LSP servers don’t offer syntax highlighting is because of the drag on performance. Every keystroke you type must be sent to the server, processed, a partial tree returned, and your syntax highlighting updated. Repeat that up to 100 words per minute (or whatever your typing speed is) and you’re looking at a lot of cross-chatter that is just better suited for in-process communication.

While I agree... he might be surprised to know that that is what all language servers do anyway, even if they don't provide syntax highlighting. Every keystroke gets sent over the LSP. As JSON. It's amazing it works as well as it does.
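
For a sense of what "every keystroke gets sent over the LSP, as JSON" looks like on the wire, here is a minimal sketch that builds one `textDocument/didChange` notification and frames it with the `Content-Length` header used by the LSP base protocol. The helper names, the file URI, and the version number are hypothetical. The sketch also assumes the server has advertised incremental text sync; with full sync, a client sends the whole document text instead of a range.

```python
import json


def frame_lsp_message(payload: dict) -> bytes:
    # LSP base protocol: an ASCII header with the body length in bytes,
    # a blank line, then the JSON-RPC body encoded as UTF-8.
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def did_change_for_keystroke(uri, version, line, character, typed):
    # An insertion is an empty range (start == end) plus the inserted text.
    position = {"line": line, "character": character}
    return {
        "jsonrpc": "2.0",
        "method": "textDocument/didChange",
        "params": {
            "textDocument": {"uri": uri, "version": version},
            "contentChanges": [
                {"range": {"start": position, "end": position}, "text": typed}
            ],
        },
    }


# Hypothetical example: the user types "x" at line 0, column 5.
message = frame_lsp_message(
    did_change_for_keystroke("file:///tmp/example.rs", 2, 0, 5, "x")
)
print(message.decode("utf-8"))
```

The payload for a single character is a couple of hundred bytes of JSON, which gives some idea of the overhead being discussed.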

0x008, over 3 years ago

Not coming from the vim/Emacs world, I fail to understand what treesitter is compared to a language server. Why would I need both?
