Tech Echo (科技回声)

10 comments

enriquto, almost 2 years ago

Curious that they discuss several options but ignore the totally obvious one: just use jupytext [0]. Jupytext is a (tiny) Jupyter extension that reads and writes notebooks as Python files, with text cells represented as comments. With jupytext, you do away with the stupid .ipynb format. As long as you don't need to save the cell outputs, which is the case for version control, jupytext is the way to go.

People: pip install jupytext. All your Python files will become notebooks, and your notebooks will become Python files.

[0] <a href="https://jupytext.readthedocs.io/en/latest/" rel="nofollow noreferrer">https://jupytext.readthedocs.io/en/latest/</a>
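To make the "notebooks as Python files" idea concrete, here is a minimal standard-library sketch of the kind of conversion described above. It is not jupytext's implementation: the function name `notebook_to_percent_script` and the sample notebook are made up for illustration, and the `# %%` cell markers are assumed as the target layout.

```python
def notebook_to_percent_script(nb: dict) -> str:
    """Render a notebook dict as a Python script with '# %%' cell markers."""
    chunks = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # The notebook JSON stores source as one string or as a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "markdown":
            # Text cells become comments, so the result stays valid Python.
            body = "\n".join(("# " + line).rstrip() for line in text.split("\n"))
            chunks.append("# %% [markdown]\n" + body)
        elif cell.get("cell_type") == "code":
            # Code cells are copied as-is; outputs are deliberately dropped.
            chunks.append("# %%\n" + text)
    return "\n\n".join(chunks) + "\n"


# Hypothetical two-cell notebook used only for this illustration.
example_nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Title\n", "Some notes."]},
        {"cell_type": "code", "metadata": {}, "execution_count": 3,
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["2\n"]}],
         "source": ["x = 1 + 1\n", "print(x)"]},
    ],
}

script = notebook_to_percent_script(example_nb)
print(script)
```

Because the outputs and execution counts never reach the text file, a version-control diff of the result only shows changes to code and prose.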

Comment #36630307 not loaded

Comment #36637482 not loaded

Comment #36630798 not loaded

kortex, almost 2 years ago

Wow, no mention of DVC (<a href="http://www.dvc.org" rel="nofollow noreferrer">http://www.dvc.org</a>)? That has been invaluable for data science workflows.

I definitely do like to strip notebooks and make them run-idempotent to the best of my ability, but sometimes you just need stateful notebooks. And since .ipynb files are technically JSON but in reality act more like a binary file format (with respect to diffing), DVC is the ideal tool to store them. Don't get me started on git annex or LFS; both of those took years off my life due to the stress of using them and them bugging out.

Also, I am hardly a fan of XML, but does anyone feel like notebook files would have been a near-ideal use case for it? It's literally a collection of markup. The fact that JSON was chosen over XML is, I think, somewhat damning of XML as an application data storage format. I think XML is perfectly cromulent as a write-once-read-many presentation format or rendering target (HTML, SVG, GenICam API info), but it seems to flounder in virtually every other domain it's been shoehorned into, with the exception of office application formats.

Actually, downthread there is a link to a Jupyter enhancement proposal for a .nb.md Markdown-based format. I think this is great. One theme I keep coming across in my computer science journey is that formats which have mandatory closing endcaps are kind of a PITA. It seems the stream-of-containers (with state machines as needed) is all-around better. JSON-LD is better than JSON, streaming video formats are better than ones that stick metadata at the end, zip is... an eldritch horror, etc.
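The remark about .ipynb files diffing poorly can be illustrated with a small sketch. It builds two hypothetical versions of a notebook whose code is identical but which was re-run, then diffs the serialized JSON. The helper names and sample values are assumptions made for this example.

```python
import difflib
import json


def make_notebook(execution_count, output_text):
    """Build a one-cell notebook dict; only the run state varies between calls."""
    return {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {},
        "cells": [{
            "cell_type": "code",
            "metadata": {},
            "execution_count": execution_count,
            "source": ["import time\n", "print(time.time())"],
            "outputs": [{"output_type": "stream", "name": "stdout",
                         "text": [output_text]}],
        }],
    }


def serialize(nb):
    """Serialize to indented JSON lines, roughly as a notebook is stored on disk."""
    return json.dumps(nb, indent=1, sort_keys=True).splitlines()


first_run = make_notebook(1, "1688700000.0\n")
second_run = make_notebook(2, "1688700042.5\n")

# Keep only added/removed lines, skipping the '+++' and '---' file headers.
diff = [
    line
    for line in difflib.unified_diff(serialize(first_run), serialize(second_run),
                                     lineterm="")
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print("\n".join(diff))
```

Every changed line here comes from the execution count or the captured output; none comes from the source. With images or large tables in the outputs, that noise grows quickly.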

wdroz, almost 2 years ago

If you don't need to "commit" the output, you can just use nbconvert [0]:<pre><code> jupyter nbconvert --clear-output --inplace my_notebook.ipynb </code></pre> So you can use git as usual, like for code.

[0] <a href="https://nbconvert.readthedocs.io/en/latest/" rel="nofollow noreferrer">https://nbconvert.readthedocs.io/en/latest/</a>

Comment #36634304 not loaded

milliams, almost 2 years ago

There is a draft JEP (Jupyter Enhancement Proposal) for Markdown-based notebooks (<a href="https://github.com/jupyter/enhancement-proposals/pull/103">https://github.com/jupyter/enhancement-proposals/pull/103</a>) which will make it a little more RMarkdown-like.

nvy, almost 2 years ago

Seems to me that this article does a great job explaining why Jupyter notebooks are a poor collaboration tool.

I wish that non-Emacs implementations of org were more commonplace, as it's a pretty sane markup language that supports embedded code and graphics, diffs nicely, and doesn't introduce the insanity of JSON.

joelschw, almost 2 years ago

The native GitHub feature in preview will make this a lot better for those able to use it <a href="https://github.blog/changelog/2023-03-01-feature-preview-rich-jupyter-notebook-diffs/" rel="nofollow noreferrer">https://github.blog/changelog/2023-03-01-feature-preview-ric...</a>

Comment #36632645 not loaded

sashk, almost 2 years ago

You don't need to commit output to git. I used a pre-commit filter in git that stripped all output from the notebook before it was committed to the repository. This allowed us to review the code changes in notebooks.
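One possible shape for such a filter is sketched below, assuming it runs as a git "clean" filter that receives the notebook on stdin and writes the stripped version to stdout. The script name `strip_output.py`, the filter name `stripoutput`, and the `--filter` flag are hypothetical and not taken from the comment above.

```python
import json
import sys


def strip_outputs(nb: dict) -> dict:
    """Clear outputs and execution counts from every code cell, in place."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


def run_filter(stdin, stdout) -> None:
    """Read a notebook from stdin and write the stripped notebook to stdout."""
    nb = json.load(stdin)
    json.dump(strip_outputs(nb), stdout, indent=1, ensure_ascii=False)
    stdout.write("\n")


# Hypothetical wiring as a git clean filter (all names are illustrative):
#   git config filter.stripoutput.clean "python strip_output.py --filter"
#   echo "*.ipynb filter=stripoutput" >> .gitattributes
if __name__ == "__main__" and "--filter" in sys.argv[1:]:
    run_filter(sys.stdin, sys.stdout)
```

With this kind of setup the working copy keeps its outputs, while the version git stages contains only code and markdown, so reviews show just the source changes.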

TeeWEE, almost 2 years ago

My quick solution is to not commit the result cells, only the commands. So it's just code.

DryLabRebel, almost 2 years ago

You forgot another issue: notebooks can contain potentially sensitive data.

sdfghswe, almost 2 years ago

I haven't read the link and I'm not going to.

I realized that Jupyter notebooks are a flawed idea when I tried VS Code. VS Code uses jupyter-the-protocol (as opposed to jupyter-the-notebooks) to give you a notebook-like experience that doesn't involve the Jupyter notebook file format. VS Code's interactive files are valid Python code.

To me that killed Jupyter notebooks. Why use something that is strictly worse in every respect?
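For readers who have not seen such a file, here is a small sketch of what an interactive Python file can look like, assuming the `# %%` cell-marker convention. The variable names and sample data are made up for illustration.

```python
# %% [markdown]
# # Quick look at some numbers
# Markdown cells are plain comments, so this file is still valid Python.

# %%
import statistics

samples = [2.0, 4.0, 4.0, 5.0, 7.0]

# %%
# Each '# %%' marker starts a new cell that can be run on its own.
mean = statistics.mean(samples)
spread = statistics.pstdev(samples)
print(f"mean={mean:.2f} spread={spread:.2f}")
```

The same file runs unchanged with a plain `python` invocation, and it diffs like any other source file because no outputs are stored in it.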

Comment #36633290 not loaded

Git and Jupyter Notebooks Guide
