Since this is built on top of tree-sitter, it can be extended[0] to work with other languages as well, no matter how obscure, as long as a tree-sitter grammar exists. IMO this really highlights the power of having an ecosystem built around tools like tree-sitter, because it democratizes powerful developer tooling. Excellent syntax highlighting, error recovery, linting, and now tree-sensitive diffing can be provided for languages big and small.<p>[0] <a href="https://difftastic.wilfred.me.uk/adding_a_parser.html" rel="nofollow">https://difftastic.wilfred.me.uk/adding_a_parser.html</a>
This is a really cool example of tree diffing via path finding. I noticed that this was the approach I used when I did tree diffing, and sure enough, it looks like this was inspired by autochrome, which was inspired by my post (<a href="https://thume.ca/2017/06/17/tree-diffing/" rel="nofollow">https://thume.ca/2017/06/17/tree-diffing/</a>).<p>I'm curious exactly why A* failed here. It worked great for me, as long as you design a good heuristic. I imagine it might have been complicated to design a good heuristic with an expanded move set. I see autochrome had to abandon A* and explains why, but I don't think that explanation applies to difftastic.
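A minimal sketch of diffing as path finding with A*, assuming flattened token lists instead of syntax trees and made-up costs (0 for an unchanged token, 1 for a deleted or inserted one). `astar_diff` and the `remaining_gap` heuristic are hypothetical names for illustration, not autochrome's or difftastic's code. The heuristic never overestimates: unchanged-token moves don't change the gap between the remaining lengths, and each unit-cost move changes it by at most one, so A* still returns a cheapest diff.

```python
import heapq
from typing import List, Tuple

Vertex = Tuple[int, int]  # (tokens consumed on the left, tokens consumed on the right)


def astar_diff(lhs: List[str], rhs: List[str]) -> Tuple[int, List[Tuple[str, str]]]:
    """Return (cost, edits); each edit is ('=', tok), ('-', tok) or ('+', tok)."""
    start: Vertex = (0, 0)
    end: Vertex = (len(lhs), len(rhs))

    def remaining_gap(v: Vertex) -> int:
        # Admissible and consistent: any route to `end` needs at least this
        # many unit-cost insertions or deletions.
        return abs((len(lhs) - v[0]) - (len(rhs) - v[1]))

    def neighbors(v: Vertex):
        i, j = v
        if i < len(lhs) and j < len(rhs) and lhs[i] == rhs[j]:
            yield (i + 1, j + 1), 0, ("=", lhs[i])  # unchanged on both sides
        if i < len(lhs):
            yield (i + 1, j), 1, ("-", lhs[i])  # novel on the left (deleted)
        if j < len(rhs):
            yield (i, j + 1), 1, ("+", rhs[j])  # novel on the right (inserted)

    best = {start: 0}
    came_from = {}
    frontier = [(remaining_gap(start), 0, start)]  # (g + h, g, vertex)
    while frontier:
        _, g, v = heapq.heappop(frontier)
        if v == end:
            break
        if g > best[v]:
            continue  # stale entry: a cheaper route to v was already found
        for nxt, cost, edit in neighbors(v):
            new_g = g + cost
            if new_g < best.get(nxt, float("inf")):
                best[nxt] = new_g
                came_from[nxt] = (v, edit)
                heapq.heappush(frontier, (new_g + remaining_gap(nxt), new_g, nxt))

    # Walk back from the end vertex to recover the edit script.
    edits, v = [], end
    while v != start:
        v, edit = came_from[v]
        edits.append(edit)
    return best[end], edits[::-1]


cost, edits = astar_diff(["foo", "bar"], ["foo", "novel", "bar"])
print(cost, edits)  # prints: 1 [('=', 'foo'), ('+', 'novel'), ('=', 'bar')]
```

Making `remaining_gap` always return 0 turns this into plain Dijkstra; a better heuristic just lets the search skip more vertices.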
I don't have much to add, but using `difftastic`[0] and `delta`[1] together is a very cool combo for making <i>git</i> a little more approachable for newbies like me.<p>I use delta as my daily driver, but sometimes, when I want the contextual info, switching to `env GIT_EXTERNAL_DIFF=difft git log -p --ext-diff` gives a better picture.<p>[0]: <a href="https://github.com/Wilfred/difftastic" rel="nofollow">https://github.com/Wilfred/difftastic</a><p>[1]: <a href="https://github.com/dandavison/delta" rel="nofollow">https://github.com/dandavison/delta</a>
<i>Fun fact: I thought of diffing programs as working out what has changed. The goal of diffing is actually to work out what hasn’t changed!</i><p>+1 insightful
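The same framing shows up in Python's standard `difflib`: `SequenceMatcher.get_matching_blocks()` returns the runs that did <i>not</i> change, and `get_opcodes()` describes the changes as whatever lies between those runs. A small sketch on a made-up token list (a plain sequence diff, not difftastic's algorithm):

```python
from difflib import SequenceMatcher

old = ["def", "f", "(", "x", ")", ":", "return", "x"]
new = ["def", "f", "(", "x", ",", "y", ")", ":", "return", "x"]

matcher = SequenceMatcher(a=old, b=new, autojunk=False)

# Each block (a, b, size) says old[a:a+size] == new[b:b+size], i.e. it did NOT change.
# The final block always has size 0 and is skipped.
unchanged = [old[m.a:m.a + m.size] for m in matcher.get_matching_blocks() if m.size]

# Everything outside those blocks is what changed.
changes = [(tag, old[i1:i2], new[j1:j2])
           for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]

print(unchanged)  # [['def', 'f', '(', 'x'], [')', ':', 'return', 'x']]
print(changes)    # [('insert', [], [',', 'y'])]
```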
My dream would be to have a three-way merge tool that worked like this at a semantic level. It feels like merges almost always have the information needed to resolve automatically, but our line-based tools are too simple to see it.
My favorite <i>diff</i> tool is <i>diff2html</i> - see the diff in your browser as HTML!<p><a href="https://diff2html.xyz/" rel="nofollow">https://diff2html.xyz/</a><p>Install the CLI and run the command (alias diff='diff2html -s side'). I run this at least once before every commit to quickly see everything I've done.
One more tree-sitter-based diffing tool: diffsitter<p><a href="https://github.com/afnanenayet/diffsitter" rel="nofollow">https://github.com/afnanenayet/diffsitter</a>
SemanticMerge is an existing commercial product that works in a similar fashion. I've found it much nicer to use than text-based diff tools.<p><a href="https://www.plasticscm.com/semanticmerge/documentation/intro-guide/semanticmerge-intro-guide" rel="nofollow">https://www.plasticscm.com/semanticmerge/documentation/intro...</a>
This post reminded me of the Trail Of Bits post about Graphtage: <a href="https://blog.trailofbits.com/2020/08/28/graphtage/" rel="nofollow">https://blog.trailofbits.com/2020/08/28/graphtage/</a> Graphtage is written in Python and can be used as a library, whereas Difftastic seems to be usable directly as a git diff tool.
A friend (Hi Jeff!) wrote DiffMerge: <a href="https://sourcegear.com/diffmerge" rel="nofollow">https://sourcegear.com/diffmerge</a> - another alternative diff & merge.
Is this something you can turn on by default for Git when working with others who don’t use Difftastic, or could that lead to some weird behaviors?<p>(I don’t know enough about the internals of Git to answer this myself.)
I like that this seems to have ways to address two common problems with tree diffs:<p>1. The nesting/unnesting/merging/splitting problem that many tree diff algorithms struggle with is handled here by allowing the diff to see insertions and deletions of delimiters. I guess one way to see it is that the tool is doing a text-level diff, augmented with the tree structure to calculate the cost of each diff and choose the cheapest one it can find.<p>2. The problem of syntax errors. I think this depends a lot on how well tree-sitter copes with syntax errors (or weird syntax that’s hard to parse, or e.g. a committed merge conflict), but my understanding is that it is designed to cope OK with them.<p>I basically felt that tree diffing was not very viable because of these issues, but seeing this project I think I’ve changed my mind. It remains to be seen how good the performance is, though (it may not be good enough yet if it can sometimes be very slow).
This article loses me when it gets to the "calculating the diff" section.<p>>Autochrome and difftastic represent diffing as a shortest path problem on a directed acyclic graph. A vertex represents a pair of positions: the position in the left-hand side s-expression (before), and the position in the right-hand side s-expression (after).<p>>The goal is to find the shortest route from the start vertex (where both positions are before the first item in the programs) to the end vertex (where both positions are after the last item in the program).<p>I don't understand what this means at all. What is a "position" here? Position of what? If it's a graph, what do the edges represent? The diagrams afterwards aren't very helpful either; I can't make head nor tail of them. They talk about a "start" vertex and an "end" vertex, but before that it said that a vertex is a pair of start-end positions... I'm totally lost.
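To make the quoted definition concrete, here is a simplified sketch that flattens each s-expression into a list of tokens (atoms and parentheses), so a "position" is just an index into that list. The flattening, the costs, and the names `tokenize` and `edges_from` are assumptions for illustration; difftastic's real graph is built over nested syntax nodes. A vertex pairs a left position with a right position, and each outgoing edge is one diffing decision:

```python
from typing import Iterator, List, Tuple

Vertex = Tuple[int, int]  # (position in LHS tokens, position in RHS tokens)


def tokenize(src: str) -> List[str]:
    # Flatten an s-expression such as "(foo (bar))" into atoms and delimiters.
    return src.replace("(", " ( ").replace(")", " ) ").split()


def edges_from(lhs: List[str], rhs: List[str], v: Vertex) -> Iterator[Tuple[Vertex, int, str]]:
    """Yield (next_vertex, cost, meaning) for every edge leaving vertex v."""
    i, j = v
    if i < len(lhs) and j < len(rhs) and lhs[i] == rhs[j]:
        yield (i + 1, j + 1), 0, f"{lhs[i]!r} is unchanged"  # advance both sides
    if i < len(lhs):
        yield (i + 1, j), 1, f"{lhs[i]!r} is novel on the left"  # advance left only
    if j < len(rhs):
        yield (i, j + 1), 1, f"{rhs[j]!r} is novel on the right"  # advance right only


lhs, rhs = tokenize("(foo (bar))"), tokenize("(foo (novel) (bar))")
start: Vertex = (0, 0)              # before the first token on both sides
end: Vertex = (len(lhs), len(rhs))  # after the last token on both sides
for nxt, cost, meaning in edges_from(lhs, rhs, start):
    print(start, "->", nxt, "cost", cost, ":", meaning)
print("goal:", end)  # goal: (6, 9)
```

The diff is then the cheapest route from `start` to `end`: an unchanged edge moves both positions forward together, while a novel edge moves only one side.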
I got lost on the last example: `(foo (bar))` -> `(foo (novel) (bar))`. The diff that adds `novel` and a pair of parentheses seemed impossible -- wouldn't you also have to delete and re-add `bar`?<p>While writing this comment, it occurred to me that the structural diff doesn't translate to plain text very well, and so isn't accessible to folks with red-green colorblindness.
It is interesting to see the A* path-finding algorithm used to find optimal matchings of nodes.<p>The approach from Chawathe et al. splits the nodes of the before and after trees into chains by their label in the syntax grammar, and then runs Myers’ longest common subsequence algorithm on each pair of chains. Two parameters, t and f, provide an approximate ‘equals’ method for subtrees.<p>This iteratively builds a set of matchings between equivalent nodes in the old and new trees. Here’s the paper: <a href="https://dl.acm.org/doi/10.1145/235968.233366" rel="nofollow">https://dl.acm.org/doi/10.1145/235968.233366</a><p>I’d be curious to see whether this approach handles reordering of nodes better. The ‘fastMatch’ algorithm described above will typically miss matches for nodes whose order doesn't matter (e.g. a function in a namespace that is moved somewhere else in that namespace).
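A heavily simplified sketch of that label-chain matching step, assuming flat leaf nodes of the form `(label, text)`. It uses a plain dynamic-programming LCS instead of Myers' algorithm, a single hypothetical similarity threshold `f` (text similarity via `difflib`'s `ratio()`, not the paper's exact comparison function), and ignores the internal-node parameter `t`. The names `lcs_pairs` and `fast_match` are illustrative only:

```python
from collections import defaultdict
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple

Node = Tuple[str, str]  # hypothetical leaf node: (grammar label, source text)


def lcs_pairs(xs: List[Node], ys: List[Node],
              equal: Callable[[Node, Node], bool]) -> List[Tuple[int, int]]:
    """Classic O(n*m) dynamic-programming LCS (not Myers' O(ND) algorithm)."""
    n, m = len(xs), len(ys)
    table = [[0] * (m + 1) for _ in range(n + 1)]  # table[i][j] = LCS of xs[i:], ys[j:]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if equal(xs[i], ys[j]):
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if equal(xs[i], ys[j]):
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def fast_match(old: List[Node], new: List[Node], f: float = 0.6) -> List[Tuple[Node, Node]]:
    """Chain nodes by label, then run LCS on each pair of same-label chains."""
    def similar(a: Node, b: Node) -> bool:
        # Hypothetical approximate equality; labels already agree within a chain.
        return SequenceMatcher(None, a[1], b[1]).ratio() >= f

    chains_old: Dict[str, List[Node]] = defaultdict(list)
    chains_new: Dict[str, List[Node]] = defaultdict(list)
    for node in old:
        chains_old[node[0]].append(node)
    for node in new:
        chains_new[node[0]].append(node)

    matches = []
    for label, chain in chains_old.items():
        other = chains_new.get(label, [])
        for i, j in lcs_pairs(chain, other, similar):
            matches.append((chain[i], other[j]))
    return matches


# Two functions swap places: only one of them gets matched.
print(fast_match([("fn", "alpha"), ("fn", "beta")], [("fn", "beta"), ("fn", "alpha")]))
# [(('fn', 'beta'), ('fn', 'beta'))]
```

The swapped pair yields only one match because an LCS can't pair elements in crossing order, which is the reordering weakness mentioned above. A lightly edited leaf such as `count` -> `counter` still matches through the approximate equality.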
Smarter diffs are great, but what is especially nice is smarter merges. I would love to see this extended to be able to do the same kinds of auto-merges that SemanticMerge could. I would pay for that functionality (just like I paid for SemanticMerge for years before they took it away to instead entice people to pay for their stupid VCS).
This makes sense to me on a high level, but I would love to see more practical examples so I can quickly wrap my head around use cases without having to download and use the tool first.
Speaking of cool diff tools, diffoscope is my favourite:<p><a href="https://diffoscope.org/" rel="nofollow">https://diffoscope.org/</a>
I see some Clojure examples in the article. We use Clojure almost exclusively, and this would be a great replacement for line-based Git diffs, providing much more insight into the actual changes.<p>Is there any chance such an alternative differ could be used in Git (and adjacent tools like GitLab), or are we stuck with line-based diffs forever?
Nice! Will def check it out. What I don’t understand is why it sees “value”->target as a diff.
Also, inside the if statement, target was semantically unchanged.
I would love to use this, but I can't be bothered to save the text to a file every time. I always paste into<p><a href="https://www.diffnow.com/compare-clips" rel="nofollow">https://www.diffnow.com/compare-clips</a> or
<a href="http://incaseofstairs.com/jsdiff/" rel="nofollow">http://incaseofstairs.com/jsdiff/</a><p>Does anybody know of better alternatives that work with pasting?