Difftastic, the fantastic diff

698 points by goranmoomin over 2 years ago

30 comments

siraben over 2 years ago
Since this is built on top of tree-sitter, it can be extended[0] to work with other languages as well, no matter how obscure, as long as a tree-sitter grammar exists. This IMO really highlights the power of having an ecosystem built around tools like tree-sitter, because they allow powerful dev UX tools to be more democratized. Excellent syntax highlighting, error recovery, linting, and now tree-sensitive diffing can be provided to languages big and small.

[0] https://difftastic.wilfred.me.uk/adding_a_parser.html
trishume over 2 years ago
This is a really cool example of tree diffing via path finding. I noticed that this was the approach I used when I did tree diffing, and sure enough it looks like this was inspired by autochrome, which was inspired by my post (https://thume.ca/2017/06/17/tree-diffing/).

I'm curious exactly why A* failed here. It worked great for me, as long as you design a good heuristic. I imagine it might have been complicated to design a good heuristic with an expanded move set. I see autochrome had to abandon A* and has an explanation of why, but that explanation shouldn't apply to difftastic, I think.
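To make "A* with a good heuristic" concrete, here is a minimal sketch on a deliberately simplified problem: diffing two flat token sequences rather than trees. It is not the code from difftastic, autochrome, or the linked post; the function name `astar_diff_cost`, the unit costs, and the heuristic are all assumptions chosen for illustration. The heuristic is a lower bound because a difference in remaining lengths can only be closed by that many inserts or deletes.

```python
import heapq


def astar_diff_cost(left, right):
    """Return (cost, expanded) for the cheapest edit path between two sequences.

    A vertex (i, j) means "i items of left and j items of right consumed".
    Matching equal items costs 0; deleting or inserting one item costs 1.
    `expanded` counts how many vertices the search had to visit.
    """
    goal = (len(left), len(right))

    def heuristic(i, j):
        # Lower bound on remaining cost: any surplus of remaining items on
        # one side needs at least that many inserts or deletes.
        return abs((len(left) - i) - (len(right) - j))

    best = {(0, 0): 0}
    frontier = [(heuristic(0, 0), 0, (0, 0))]
    expanded = 0
    while frontier:
        _, cost, (i, j) = heapq.heappop(frontier)
        if cost > best.get((i, j), float("inf")):
            continue  # stale queue entry, a cheaper route was found already
        expanded += 1
        if (i, j) == goal:
            return cost, expanded
        steps = []
        if i < len(left) and j < len(right) and left[i] == right[j]:
            steps.append(((i + 1, j + 1), 0))  # unchanged item
        if i < len(left):
            steps.append(((i + 1, j), 1))      # delete from left
        if j < len(right):
            steps.append(((i, j + 1), 1))      # insert from right
        for nxt, step_cost in steps:
            new_cost = cost + step_cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                priority = new_cost + heuristic(*nxt)
                heapq.heappush(frontier, (priority, new_cost, nxt))
    raise ValueError("goal unreachable")


cost, expanded = astar_diff_cost("abcde", "abXde")
print(cost, expanded)
```

With richer move sets (entering and leaving nested lists, for example) a tight lower bound is harder to state, which is presumably the difficulty the comment alludes to.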
mfrw over 2 years ago
Although I do not have much to add, using `difftastic`[0] & `delta`[1] is a very cool combo to make _git_ a little more approachable for newbies like me.

I use delta as my daily driver, but sometimes when I want the contextual info, switching to `env GIT_EXTERNAL_DIFF=difft git log -p --ext-diff` gives a better picture.

[0]: https://github.com/Wilfred/difftastic

[1]: https://github.com/dandavison/delta
robin_reala over 2 years ago
*Fun fact: I thought of diffing programs as working out what has changed. The goal of diffing is actually to work out what hasn't changed!*

+1 insightful
grogers over 2 years ago
My dream would be to have a three-way merge tool that worked like this at a semantic level. It feels like merges almost always have the information needed to automatically resolve, but our line-based tools are too simple to see it.
yboris over 2 years ago
My favorite *diff* tool is *diff2html* - see the diff in your browser as HTML!

https://diff2html.xyz/

Install the CLI, run the command (alias diff='diff2html -s side') - I run this at least every time before committing to quickly see all I've done.
yewenjie over 2 years ago
One more tree-sitter based diffing tool - diffsitter

https://github.com/afnanenayet/diffsitter
X-Cubed over 2 years ago
SemanticMerge is an existing commercial product that works in a similar fashion. I've found it much nicer to use than text-based diff tools.

https://www.plasticscm.com/semanticmerge/documentation/intro-guide/semanticmerge-intro-guide
Gehinnn over 2 years ago
It would be so cool if there was JSON output, so that other tools (e.g. VS Code) could use this diffing algorithm! Thanks for explaining the algorithm!
luke-stanley over 2 years ago
This post reminded me of the Trail of Bits post about Graphtage: https://blog.trailofbits.com/2020/08/28/graphtage/ Graphtage is written in Python and can be used as a library. But it seems Difftastic can be used as a git diff tool directly.
prirun over 2 years ago
A friend (Hi Jeff!) wrote DiffMerge: https://sourcegear.com/diffmerge - another alternative diff & merge.
janaagaard over 2 years ago
Is this something you can turn on by default for Git when working with others who don't use Difftastic, or could that lead to some weird behaviors?

(I don't know enough about the internals of Git to answer this myself.)
dan-robertson over 2 years ago
I like that this seems to have ways to address two common problems with tree diffs:

1. The nesting/unnesting/merging/splitting problem that many tree diff algorithms have a hard time with is handled here by allowing the diff to see insertion/deletion of delimiters. I guess one way to see it is that the tool is doing a text-level diff augmented with the tree structure to calculate the cost of diffs and choose the cheapest one it can find.

2. The problem of syntax errors. I think this just depends a lot on how well tree-sitter copes with syntax errors (or weird syntax that's hard to parse, or e.g. a committed merge conflict), but my understanding is that it is designed to cope OK with syntax errors.

I basically felt that tree diffing was not very viable because of these issues, but seeing this project I think I've changed my mind. I guess it remains to be seen how good performance is, though (maybe this isn't good enough yet if it can sometimes be very slow).
_dain_ over 2 years ago
This article loses me when it gets to the "calculating the diff" section.

> Autochrome and difftastic represent diffing as a shortest path problem on a directed acyclic graph. A vertex represents a pair of positions: the position in the left-hand side s-expression (before), and the position in the right-hand side s-expression (after).

> The goal is to find the shortest route from the start vertex (where both positions are before the first item in the programs) to the end vertex (where both positions are after the last item in the program).

I don't understand what this means at all? What is a "position" here? Position of what? If it's a graph, what do the edges represent? The diagrams afterwards aren't very helpful either, I can't make head nor tail of them. They talk about a "start" vertex and an "end" vertex, but before that it said that a vertex is a pair of start-end positions ... I'm totally lost.
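One way to read the quoted passage, sketched under simplifying assumptions: flatten each program to a list of items, let a vertex `(i, j)` mean "the first `i` items of the before-list and the first `j` items of the after-list have been consumed", and let each edge be one decision (item unchanged, item removed, item added). The start vertex is `(0, 0)` and the end vertex is `(len(before), len(after))`. The function name, the unit costs, and the flat-list model are assumptions for illustration; the article describes nested s-expressions, which this sketch does not model.

```python
import heapq


def shortest_edit_path(before, after):
    """Return (cost, steps) for the cheapest route through the diff graph."""
    start, end = (0, 0), (len(before), len(after))
    dist = {start: 0}
    came_from = {}
    queue = [(0, start)]
    while queue:
        cost, (i, j) = heapq.heappop(queue)
        if (i, j) == end:
            break  # Dijkstra: the first time we pop the end, its cost is final
        if cost > dist[(i, j)]:
            continue  # stale queue entry
        edges = []
        if i < len(before) and j < len(after) and before[i] == after[j]:
            edges.append(((i + 1, j + 1), 0, ("unchanged", before[i])))
        if i < len(before):
            edges.append(((i + 1, j), 1, ("removed", before[i])))
        if j < len(after):
            edges.append(((i, j + 1), 1, ("added", after[j])))
        for vertex, weight, label in edges:
            if cost + weight < dist.get(vertex, float("inf")):
                dist[vertex] = cost + weight
                came_from[vertex] = ((i, j), label)
                heapq.heappush(queue, (cost + weight, vertex))
    # Walk back from the end vertex to recover the sequence of edge labels.
    steps, vertex = [], end
    while vertex != start:
        vertex, label = came_from[vertex]
        steps.append(label)
    steps.reverse()
    return dist[end], steps


cost, steps = shortest_edit_path(["x", "=", "1"], ["x", "=", "2"])
for step in steps:
    print(step)
```

The labels along the shortest route are the diff: every "unchanged" edge is an item both sides share, and the cost counts only the removed and added items.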
bqmjjx0kac over 2 years ago
I got lost on the last example: `(foo (bar))` -> `(foo (novel) (bar))`. The diff that adds `novel` and a pair of parentheses seemed impossible -- wouldn't you also have to delete and re-add `bar`?

Writing this comment it occurs to me that the structural diff doesn't translate to plaintext very well, and thus is not accessible to folks with red-green colorblindness.
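A small check of the example at the token level, as a sketch only (the helper names `tokenize` and `lcs_length` are made up here and this is not difftastic's code): if parentheses are treated as tokens in their own right, the old program is a subsequence of the new one, so nothing has to be deleted and `bar` can be matched as unchanged.

```python
import re


def tokenize(source):
    """Split an s-expression into parentheses and atoms."""
    return re.findall(r"[()]|[^\s()]+", source)


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    return table[len(a)][len(b)]


before = tokenize("(foo (bar))")
after = tokenize("(foo (novel) (bar))")
kept = lcs_length(before, after)
deleted = len(before) - kept
inserted = len(after) - kept
print(deleted, inserted)  # prints "0 3"
```

The three inserted tokens are `novel` plus one opening and one closing parenthesis. A purely token-level diff cannot tell whether those are `( novel )` or `novel ) (`, since both are three insertions, which is why the tree structure has to inform the choice.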
buffaloPizzaBoy over 2 years ago
It is interesting to see the use of the A* path finding algorithm for finding optimal matchings of nodes.

The approach from Chawathe et al. splits nodes from the before/after trees into chains by their label in the syntax grammar, and then runs Myers' longest common subsequence on each pair of chains. Some parameters t, f are used to have an approximate 'equals' method for subtrees.

This iteratively builds a set of matchings between equivalent nodes from the old and new trees. Here's the paper: https://dl.acm.org/doi/10.1145/235968.233366

I'd be curious to see if this approach handles re-ordering of nodes better. The 'fastMatch' algorithm described above will typically miss matches when a node that is not order-sensitive has been moved (e.g. a function in a namespace that is moved somewhere else in that namespace).
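A rough sketch of the chain-and-LCS idea described above, restricted to leaf nodes. Everything here is an illustrative assumption rather than the paper's algorithm: leaves are `(label, value)` tuples, the threshold `f` is applied to a `difflib` similarity ratio, and the LCS is a plain dynamic-programming version rather than Myers' algorithm. The parameter `t` for internal nodes is not modelled.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def lcs_pairs(xs, ys, equal):
    """Longest common subsequence of xs and ys under a custom equality test."""
    table = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(len(xs) - 1, -1, -1):
        for j in range(len(ys) - 1, -1, -1):
            if equal(xs[i], ys[j]):
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        if equal(xs[i], ys[j]):
            pairs.append((xs[i], ys[j]))
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def match_leaves(old_leaves, new_leaves, f=0.5):
    """Match leaves label by label; values may differ by at most f."""

    def close_enough(x, y):
        # Assumed distance: 1 minus difflib's similarity ratio of the values.
        return 1.0 - SequenceMatcher(None, x[1], y[1]).ratio() <= f

    # Split each tree's leaves into chains, one chain per label.
    old_chains, new_chains = defaultdict(list), defaultdict(list)
    for leaf in old_leaves:
        old_chains[leaf[0]].append(leaf)
    for leaf in new_leaves:
        new_chains[leaf[0]].append(leaf)

    matches = []
    for label, chain in old_chains.items():
        matches.extend(lcs_pairs(chain, new_chains.get(label, []), close_enough))
    return matches


old = [("fn", "foo"), ("fn", "bar")]
new = [("fn", "bar"), ("fn", "foo")]
print(match_leaves(old, new))
```

Because an LCS preserves order, swapping `foo` and `bar` leaves only one of the two functions matched in this pass, which is the re-ordering weakness the comment points out.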
ziml77 over 2 years ago
Smarter diffs are great, but what is especially nice is smarter merges. I would love to see this extended to be able to do the same kinds of auto-merges that SemanticMerge could. I would pay for that functionality (just like I paid for SemanticMerge for years before they took it away to instead entice people to pay for their stupid VCS).
yosito over 2 years ago
This makes sense to me on a high level, but I would love to see more practical examples so I can quickly wrap my head around use cases without having to download and use the tool first.
pabs3 over 2 years ago
Reminds me of the token-based git authorship tool, cregit:

https://github.com/cregit/cregit
https://lwn.net/Articles/698425/
https://www.youtube.com/watch?v=iXZV5uAYMJI
pabs3 over 2 years ago
On cool diff tools, diffoscope is my favourite:

https://diffoscope.org/
vincentdm over 2 years ago
I see some Clojure examples in the article. We use Clojure almost exclusively and this would be a great replacement for line-based Git diffs, providing much more insight into actual changes.

Is there any chance such an alternative differ could be used in Git (and adjacent tools like GitLab), or are we stuck with line-based forever?
jbverschoor over 2 years ago
Nice! Will def check it out. What I don't understand is why it sees "value" -> target as a diff. Also, inside the if statement, target was semantically unchanged.
jesse__ over 2 years ago
That's a diffing fantastic blog post, thanks for writing it! Not to mention the actual tool ;) I'll definitely be trying it out.
idiocratic over 2 years ago
Great article. It’s inspiring to see what applications some simple algorithms can have. Kudos for building the tool and sharing your approach.
AndrewDucker over 2 years ago
For the semantic side of things, could an LSP server be used? Building diffs on top of that would be very handy.
rossmohax over 2 years ago
Next step is to make an editor where we type/edit AST, not text.
billconan over 2 years ago
I hope this can be an API too, to support other utility tools.
bifftastic over 2 years ago
I like the name
whoomp12342 over 2 years ago
No Windows installation :-(
mfbx9da4 over 2 years ago
I would love to use this but I can't be bothered to save the text to a file every time. I always paste into

https://www.diffnow.com/compare-clips or http://incaseofstairs.com/jsdiff/

Does anybody know any better alternatives which work with pasting?