Ask HN: Are there any good Diff tools for Jupyter Notebooks?

51 points by darosati almost 3 years ago
This is one of the biggest pain points of my work: reviewing diffs of Jupyter Notebooks.

Does anyone have any good tools for this that preserve the visuals of the notebooks?

My approach has always been to render the files as .py without the cell outputs and compare those, which is a big PITA.

Anyone have any advice?

14 comments

stiff almost 3 years ago
You can use jupytext to maintain a dual .py/.ipynb representation of notebooks and keep both versions in sync:

https://github.com/mwouts/jupytext/blob/main/docs/paired-notebooks.md

It works both ways: it can update the .py file each time you save the notebook, or you can edit the .py file and have the jupytext command line tool update the .ipynb.
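
To make the paired .py representation concrete, here is a minimal standard-library sketch that turns a notebook's JSON into a script with `# %%` cell markers. It only illustrates the idea and is not jupytext's implementation: the function name `notebook_to_percent_script` is hypothetical, it assumes the nbformat 4 layout (a top-level `cells` list), and real jupytext output typically also carries a metadata header that this sketch omits.

```python
def notebook_to_percent_script(nb: dict) -> str:
    """Render code and markdown cells as a '# %%' style script, ignoring outputs."""
    chunks = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # nbformat allows `source` to be a string or a list of line strings.
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append("# %%\n" + source)
        elif cell.get("cell_type") == "markdown":
            # Markdown is kept as comments so the file stays valid Python.
            commented = "\n".join(
                ("# " + line) if line else "#" for line in source.split("\n")
            )
            chunks.append("# %% [markdown]\n" + commented)
    return "\n\n".join(chunks) + "\n"
```

Because outputs never reach the script, two versions of it can be compared with any ordinary text diff.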

nolroz almost 3 years ago
Visual Studio Code has a diffing view for notebooks that looks very promising: https://code.visualstudio.com/docs/datascience/jupyter-notebooks#_custom-notebook-diffing

dahart almost 3 years ago
Can you talk more about why you’re working in Jupyter Notebooks at a level that needs diff reviews? Are you reviewing your own work, or the work of others?

One option would be to start a policy to always “restart and clear output” before saving. This cleans the output cells and makes the .ipynb files diffable. It also happens to make them nice for storing in version control.

Another option would be to work in pure Python files in the first place, and only use Jupyter after the fact. A close relative of Jupyter is the Spyder IDE, which gives you most of the benefits of quick visual outputs but also has a nice Python debugger built in.
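
A "clear output before saving" policy can be checked mechanically rather than by eye. The sketch below assumes the nbformat 4 layout, where code cells carry `outputs` and `execution_count` fields; `cells_with_outputs` is a hypothetical helper name, not part of Jupyter.

```python
def cells_with_outputs(nb: dict) -> list:
    """Return indices of code cells that still carry outputs or an execution count."""
    dirty = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells have no outputs
        if cell.get("outputs") or cell.get("execution_count") is not None:
            dirty.append(index)
    return dirty
```

An empty result means the notebook was saved in the cleared state the policy asks for; a non-empty one tells a reviewer which cells to clear.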

aulin almost 3 years ago
In the past I used a pre-commit hook that removed plots and other rendered content and kept only text and code in the git index. I'm almost sure it was https://github.com/kynan/nbstripout, but it's been a while and I could be wrong.

Once the hook was in place, git diff worked well enough that I didn't need any other diffing tool.
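
One reading of "removed plots and other rendered content and kept only text" can be sketched as follows. This is an assumption about the behaviour described, not nbstripout's code: the helper name `strip_rich_outputs` is hypothetical, and it relies on the nbformat 4 convention that rich outputs store their representations in a `data` dictionary keyed by MIME type.

```python
def strip_rich_outputs(nb: dict) -> dict:
    """Drop images, HTML and other rich representations; keep plain-text output."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        kept = []
        for output in cell.get("outputs", []):
            if "data" in output:
                # display_data / execute_result: keep only the text/plain entry.
                text_only = {
                    mime: value
                    for mime, value in output["data"].items()
                    if mime == "text/plain"
                }
                if not text_only:
                    continue  # nothing textual left, so drop the whole output
                output = {**output, "data": text_only}
            # stream and error outputs have no `data` key and pass through as-is
            kept.append(output)
        cell["outputs"] = kept
    return nb
```

Run from a hook on the staged copy, something like this keeps base64 image blobs out of the diff while leaving printed results readable.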

cschmidt almost 3 years ago
There is https://nbdime.readthedocs.io/en/latest/, although I haven't used it personally, so I don't know how good it is.
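
As a rough illustration of what a cell-aware diff offers over a raw JSON diff, the sketch below compares two notebooks cell by cell using only the standard library. It is not nbdime's algorithm or API; `cell_sources` and `diff_cells` are hypothetical names, and the nbformat 4 `cells` list is assumed.

```python
import difflib


def cell_sources(nb: dict) -> list:
    """Return each cell's source as a single string."""
    sources = []
    for cell in nb.get("cells", []):
        src = cell.get("source", "")
        sources.append("".join(src) if isinstance(src, list) else src)
    return sources


def diff_cells(old_nb: dict, new_nb: dict) -> list:
    """Return (tag, old_cells, new_cells) tuples for every changed region."""
    old, new = cell_sources(old_nb), cell_sources(new_nb)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Treating whole cells as the unit of comparison means an inserted cell shows up as one insertion instead of a shifted block of JSON lines.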

amirathi almost 3 years ago
Here are tools people commonly use for notebook version control with git:

[1] nbdime to view local diffs & merge changes
[2] jupytext for 2-way sync between notebook & markdown/scripts
[3] JupyterLab git extension for git clone / pull / push & see visual diffs
[4] JupyterLab GitPlus to create GitHub PRs from JupyterLab
[5] ReviewNB for reviewing & diff'ing notebook PRs / commits on GitHub

Disclaimer: While I’m the author of the last two (GitPlus & ReviewNB), I’ve represented the overall landscape in an unbiased way. I've been working on this specific problem for 3+ years & regularly talk to teams who use GitHub with notebooks.

[1] https://nbdime.readthedocs.io
[2] https://jupytext.readthedocs.io
[3] https://github.com/jupyterlab/jupyterlab-git
[4] https://github.com/ReviewNB/jupyterlab-gitplus
[5] https://www.reviewnb.com/

iqkznnft almost 3 years ago
The solution is: don't use ipynb. Instead, use an IDE that can run code segments in files, and version those files.

You end up with files which are syntactically correct code, versionable, and can be run in segments just like ipynb. Win, win, win.

exevp almost 3 years ago
You can use clean and smudge filters in git. Since notebook files are JSON, it's pretty straightforward to strip outputs from them using `jq`:

http://timstaley.co.uk/posts/making-git-and-jupyter-notebooks-play-nice/
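
For anyone who would rather not depend on `jq`, the clean step can also be sketched in Python. This is an assumed equivalent rather than the filter from the linked post: `clean` is a hypothetical function name, and it relies on the nbformat 4 layout. A git clean filter receives the working-tree file on stdin and writes the version to be staged on stdout, which is why the function takes two streams.

```python
import io
import json


def clean(stdin, stdout):
    """Copy a notebook from stdin to stdout with code-cell outputs removed."""
    nb = json.load(stdin)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    json.dump(nb, stdout, indent=1, ensure_ascii=False)
    stdout.write("\n")


# In-memory demonstration; a real filter script would call
# clean(sys.stdin, sys.stdout) instead.
raw = json.dumps({
    "cells": [{
        "cell_type": "code",
        "source": "print(1)",
        "outputs": [{"output_type": "stream", "name": "stdout", "text": "1\n"}],
        "execution_count": 7,
    }],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
})
cleaned = io.StringIO()
clean(io.StringIO(raw), cleaned)
```

Wiring it in would use git's `filter.<name>.clean` setting together with a `*.ipynb filter=<name>` line in `.gitattributes`; the filter name and script location are up to you.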

yanbianhobo almost 3 years ago
We use ReviewNB at work. It integrates very nicely with GitHub, providing the same PR review workflow. It’s a paid tool, though.

rgavuliak almost 3 years ago
We’re using ReviewNB. It works, though we don’t do too many iterations of a notebook.

barrrrald almost 3 years ago
Hex just launched a diff view feature, along with git sync and a clean file format: https://hex.tech/blog/github-sync

dkeathley almost 3 years ago
In addition to this, you can keep a dual markdown version that uses a much more human-readable syntax and preserves both the code and markdown sections of the Jupyter notebook. This is also done via jupytext. In both JupyterLab and Jupyter you can pair the two versions (something like what is discussed here: https://www.wrighters.io/jupytext-notebooks-as-markdown-or-python/) and they will stay in sync automatically.

freedomben almost 3 years ago
For the Elixir equivalent of Jupyter (called Livebook) I've been keeping the markdown files in a `livebooks` directory, so diffing them is as easy as `git diff` or any other existing text-based diff tool. It's been pretty successful.

TekMol almost 3 years ago
In Google Colab, when you "Download ipynb" you get a file that looks like JSON.

You can prettify it via "python3 -m json.tool", for example. Then you have a structure that you can diff via your favorite diff tool.

What is a PITA about it?
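
The prettify-then-diff approach can be done in one step with the standard library. In this sketch `pretty_lines` and `diff_notebook_json` are hypothetical names, the `old.ipynb` / `new.ipynb` labels are placeholders, and `sort_keys=True` is an addition (plain `json.tool` keeps key order unless asked to sort) so that key reordering does not show up as a change.

```python
import difflib
import json


def pretty_lines(nb_text: str) -> list:
    """Re-serialize notebook JSON with stable indentation and key order."""
    return json.dumps(json.loads(nb_text), indent=4, sort_keys=True).splitlines()


def diff_notebook_json(old_text: str, new_text: str) -> list:
    """Return unified-diff lines between two notebooks' prettified JSON."""
    return list(
        difflib.unified_diff(
            pretty_lines(old_text),
            pretty_lines(new_text),
            fromfile="old.ipynb",
            tofile="new.ipynb",
            lineterm="",
        )
    )
```

This diffs every field, including outputs and execution counts, so a notebook with embedded plots will still produce long base64 hunks unless outputs are stripped first.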