Wow, no mention of DVC (<a href="http://www.dvc.org" rel="nofollow noreferrer">http://www.dvc.org</a>)? That has been invaluable for data scientist workflows.<p>I definitely do like to strip notebooks and make them run-idempotent to the best of my ability, but sometimes you just need stateful notebooks. And since .ipynb are technically json but in reality act more like a binary file format (with respect to diffing), DVC is the ideal tool to store them. Don't get me started on git annex or LFS, both of those took years off my life due to stress of using them and them bugging out.<p>Also I am hardly a fan of XML, but does anyone feel like notebook files would have been a near-ideal use-case of it? It's literally a collection of markup. The fact that json was chosen over xml I think is somewhat damning of xml as an application data storage format. I think xml is perfectly cromulent as a write-once-read-many <i>presentation</i> format or rendering target (html, svg, GeniCam api info), but it seems to flounder in virtually every other domain it's been shoehorned into, with the exception of office application formats.<p>Actually, downthread there is a link to a jupyer enhancement proposal for a .nb.md markdown based format. I think this is great. One theme I keep coming across in my computer science journey is that formats which have mandatory closing endcaps are kind of a PITA. It seems the stream-of-containers (with state machines as needed) is all-around better. JSON-LD is better than JSON, streaming video formats are better than ones that stick metadata at the end, zip is... an eldritch horror, etc.