Git and Jupyter Notebooks Guide

176 points by sixhobbits, almost 2 years ago

10 comments

enriquto, almost 2 years ago
Curious that they discuss several options but ignore the totally obvious one: just use jupytext [0]. Jupytext is a (tiny) Jupyter extension that reads/writes notebooks as Python files, with text cells being represented as comments. With jupytext, you do away with the stupid .ipynb format. As long as you don't need to save the cell outputs, which is the case for version control, jupytext is the way to go.

People: pip install jupytext. All your Python files will become notebooks, and your notebooks will become Python files.

[0] https://jupytext.readthedocs.io/en/latest/
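For readers who have not seen the kind of text file jupytext produces, here is a stdlib-only sketch of the idea. It is not jupytext's own code: it renders notebook JSON in the "percent" layout, where a line "# %%" opens a code cell and "# %% [markdown]" opens a text cell whose lines are commented out, so the result is still valid Python. The function name notebook_to_percent_script is hypothetical, raw cells are simply skipped, and jupytext itself handles far more (metadata headers, other languages, round-tripping back to .ipynb).

```python
import json


def notebook_to_percent_script(nb_json: str) -> str:
    """Render .ipynb JSON as a py:percent-style script (simplified sketch)."""
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append("# %%\n" + source)
        elif cell.get("cell_type") == "markdown":
            # comment out every text line so the result stays valid Python
            commented = "\n".join(
                "# " + line if line else "#" for line in source.splitlines()
            )
            chunks.append("# %% [markdown]\n" + commented)
        # raw cells are ignored; outputs and execution counts are never read
    return "\n\n".join(chunks) + "\n"


example = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title\n", "Some text"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["1\n"]}],
            "source": ["x = 1\n", "print(x)"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

print(notebook_to_percent_script(json.dumps(example)))
```

The printed script contains the markdown cell as comments and the code cell as plain code; the stored stdout output never appears, which is exactly why such files diff like ordinary source.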
kortex, almost 2 years ago
Wow, no mention of DVC (http://www.dvc.org)? That has been invaluable for data scientist workflows.

I definitely do like to strip notebooks and make them run-idempotent to the best of my ability, but sometimes you just need stateful notebooks. And since .ipynb files are technically JSON but in reality act more like a binary file format (with respect to diffing), DVC is the ideal tool to store them. Don't get me started on git annex or LFS; both of those took years off my life due to the stress of using them and them bugging out.

Also, I am hardly a fan of XML, but does anyone feel like notebook files would have been a near-ideal use case for it? It's literally a collection of markup. The fact that JSON was chosen over XML is, I think, somewhat damning of XML as an application data storage format. I think XML is perfectly cromulent as a write-once-read-many *presentation* format or rendering target (HTML, SVG, GenICam API info), but it seems to flounder in virtually every other domain it's been shoehorned into, with the exception of office application formats.

Actually, downthread there is a link to a Jupyter enhancement proposal for a .nb.md Markdown-based format. I think this is great. One theme I keep coming across in my computer science journey is that formats which have mandatory closing endcaps are kind of a PITA. It seems the stream-of-containers approach (with state machines as needed) is all-around better. JSON-LD is better than JSON, streaming video formats are better than ones that stick metadata at the end, zip is... an eldritch horror, etc.
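To make the "mandatory closing endcap" point concrete, here is a small sketch that uses JSON Lines (one JSON record per line) as a stand-in for a stream-of-containers format; that choice is an assumption of this example, not something the comment specifies. A single JSON document cut off before its closing bracket cannot be parsed at all, while a line-delimited stream still yields every record that was written completely. Both function names are hypothetical.

```python
import json


def parse_document(text: str):
    """Parse a single JSON document; return None if it is incomplete."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


def parse_json_lines(text: str) -> list:
    """Parse newline-delimited JSON, keeping every complete record before a truncated one."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            break  # a partially written record: stop, but keep what came before
    return records


# The same three cells in both layouts, each cut off in the middle of the last cell.
whole = '{"cells": [{"id": 1}, {"id": 2}, {"id": 3'
stream = '{"id": 1}\n{"id": 2}\n{"id": 3'

print(parse_document(whole))     # the truncated document yields nothing
print(parse_json_lines(stream))  # the first two records are still recovered
```

The design difference is only where the "end" lives: the document form needs its final bracket before anything is valid, whereas each line of the stream is self-terminating.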
wdroz, almost 2 years ago
If you don't need to "commit" the output, you can just use nbconvert [0]:

  jupyter nbconvert --clear-output --inplace my_notebook.ipynb

So you can use git as usual, like for code.

[0] https://nbconvert.readthedocs.io/en/latest/
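If a repository holds many notebooks, the same command can be run over all of them. A minimal sketch, assuming jupyter and nbconvert are installed and on the PATH; the helper names are hypothetical, and skipping the .ipynb_checkpoints folders that Jupyter creates for autosaves is a choice of this sketch rather than something nbconvert requires.

```python
import subprocess
from pathlib import Path


def clear_output_commands(root) -> list:
    """Build one 'jupyter nbconvert --clear-output --inplace' command per notebook under root."""
    commands = []
    for path in sorted(Path(root).rglob("*.ipynb")):
        # Jupyter's autosave copies live in .ipynb_checkpoints; leave them alone
        if ".ipynb_checkpoints" in path.parts:
            continue
        commands.append(
            ["jupyter", "nbconvert", "--clear-output", "--inplace", str(path)]
        )
    return commands


def clear_all_outputs(root) -> None:
    """Run each command in turn, stopping at the first failure."""
    for cmd in clear_output_commands(root):
        subprocess.run(cmd, check=True)
```

Calling clear_all_outputs(".") from the repository root would rewrite every notebook in place; building the command lists separately makes it easy to print them first as a dry run.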
milliams, almost 2 years ago
There is a draft JEP (Jupyter Enhancement Proposal) for Markdown-based notebooks (https://github.com/jupyter/enhancement-proposals/pull/103) which will make it a little more RMarkdown-like.
nvy, almost 2 years ago
Seems to me that this article does a great job explaining why Jupyter notebooks are a poor collaboration tool.

I wish that non-Emacs implementations of Org were more commonplace, as it's a pretty sane markup language and supports embedded code and graphics, diffs nicely, and doesn't introduce the insanity of JSON.
joelschw, almost 2 years ago
The native GitHub feature in preview will make this a lot better for those able to use it: https://github.blog/changelog/2023-03-01-feature-preview-rich-jupyter-notebook-diffs/
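Rich notebook diffs compare cells rather than raw JSON. A rough stdlib sketch of that idea, not how GitHub implements it: flatten each notebook into the source lines of its cells and diff those with difflib, so a re-run that only changes outputs and execution counts produces no diff at all. The helper names, including the make_nb builder used for the example, are hypothetical.

```python
import difflib
import json


def cell_source_lines(nb_json: str) -> list:
    """Flatten a notebook into labelled source lines, ignoring outputs and metadata."""
    nb = json.loads(nb_json)
    lines = []
    for index, cell in enumerate(nb.get("cells", [])):
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        lines.append(f"## cell {index} [{cell.get('cell_type')}]")
        lines.extend(source.splitlines())
    return lines


def notebook_source_diff(old_json: str, new_json: str) -> list:
    """Unified diff of cell sources only."""
    return list(
        difflib.unified_diff(
            cell_source_lines(old_json),
            cell_source_lines(new_json),
            fromfile="old.ipynb",
            tofile="new.ipynb",
            lineterm="",
        )
    )


def make_nb(source: str, count: int, output: str) -> str:
    """Build a one-cell notebook for the example below."""
    cell = {
        "cell_type": "code",
        "execution_count": count,
        "metadata": {},
        "outputs": [{"output_type": "stream", "name": "stdout", "text": [output]}],
        "source": [source],
    }
    return json.dumps({"cells": [cell], "metadata": {}, "nbformat": 4, "nbformat_minor": 5})


before = make_nb("print(1 + 1)", 1, "2\n")
rerun = make_nb("print(1 + 1)", 7, "2\n")    # only the execution count changed
edited = make_nb("print(1 + 2)", 8, "3\n")   # the code itself changed

print(notebook_source_diff(before, rerun))   # empty: nothing a reviewer cares about changed
print("\n".join(notebook_source_diff(before, edited)))
```

A raw git diff of the same files would flag the execution count change in every re-run; diffing at the cell level hides that noise while still showing real code edits.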
sashk, almost 2 years ago
You don't need to commit output into git. I used a pre-commit filter in git that stripped all output from a notebook before it was committed to the repository. This allowed us to review the code changes in notebooks.
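One way to wire up what this comment describes is a git clean filter, which git applies to a file's contents when they are staged: the working copy keeps its outputs and only the committed version is stripped. A minimal sketch, assuming the script is saved as strip_notebook.py and registered under the hypothetical filter name strip-notebook with git config filter.strip-notebook.clean "python3 strip_notebook.py --clean" plus the line "*.ipynb filter=strip-notebook" in .gitattributes. The --clean flag is this sketch's own convention, so running the file without it does nothing. A pre-commit hook that rejects notebooks containing outputs would be another way to get a similar guarantee.

```python
#!/usr/bin/env python3
"""Hypothetical git clean filter: notebook JSON on stdin, output-free notebook on stdout."""
import json
import sys


def strip_notebook(text: str) -> str:
    """Drop outputs and execution counts from every code cell."""
    nb = json.loads(text)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    # indent=1 keeps one field per line, which diffs reasonably well
    return json.dumps(nb, indent=1, ensure_ascii=False) + "\n"


def main(argv) -> None:
    # Only act as a filter when explicitly asked, so running the file bare is harmless.
    if argv[1:] == ["--clean"]:
        sys.stdout.write(strip_notebook(sys.stdin.read()))


if __name__ == "__main__":
    main(sys.argv)
```

Git can run a clean filter more than once on the same content, so the function is written to be idempotent: stripping an already stripped notebook returns the same text.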
TeeWEE, almost 2 years ago
My quick solution is to not commit the result cells, only the commands. So it's just code.
DryLabRebel, almost 2 years ago
You forgot another issue:

- containing potentially sensitive data in your notebook
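Because outputs are stored inside the .ipynb file, anything a cell printed (a query result, a token echoed by mistake) is committed along with the code. A rough sketch of a check that could run before committing; the two patterns and the function names are illustrative assumptions only, and a dedicated secret scanner would cover far more cases. Note that it looks only at outputs, not at cell sources.

```python
import json
import re

# Illustrative patterns only (assumptions of this sketch, not a complete rule set).
PATTERNS = {
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def output_text(output: dict) -> str:
    """Collect the text one output carries: stream text, rich data, tracebacks."""
    parts = []
    text = output.get("text", "")
    parts.append("".join(text) if isinstance(text, list) else text)
    for value in output.get("data", {}).values():
        parts.append("".join(value) if isinstance(value, list) else str(value))
    parts.extend(output.get("traceback", []))
    return "\n".join(parts)


def scan_outputs(nb_json: str) -> list:
    """Return (cell index, pattern name) pairs for outputs that look sensitive."""
    nb = json.loads(nb_json)
    findings = []
    for index, cell in enumerate(nb.get("cells", [])):
        for output in cell.get("outputs", []):
            text = output_text(output)
            for name, pattern in PATTERNS.items():
                if pattern.search(text) and (index, name) not in findings:
                    findings.append((index, name))
    return findings
```

Run over each staged notebook, a non-empty result is a prompt to clear outputs (or look closer) before committing.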
sdfghswe, almost 2 years ago
I haven't read the link and I'm not going to.

I realized that Jupyter notebooks are a flawed idea when I tried VS Code. VS Code uses jupyter-the-protocol (as opposed to jupyter-the-notebooks) to give you a notebook-like experience that doesn't involve the Jupyter notebook file format. VS Code's interactive files are valid Python code.

To me that killed Jupyter notebooks. Why use something that is strictly worse in every respect?
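For context, the interactive files mentioned here are ordinary .py files in which a comment line starting with "# %%" marks the start of a cell (other markers may also be recognised), so the file still runs top to bottom as plain Python. A small sketch of how such a file splits into cells; it illustrates the convention, it is not VS Code's implementation, and split_cells is a hypothetical name.

```python
def split_cells(script: str) -> list:
    """Split a '# %%'-delimited script into cell bodies, dropping empty cells."""
    cells, current = [], []
    # a sentinel marker at the end flushes the last cell
    for line in script.splitlines() + ["# %%"]:
        if line.startswith("# %%"):
            body = "\n".join(current).strip("\n")
            if body.strip():
                cells.append(body)
            current = []
        else:
            current.append(line)
    return cells


script = (
    "import math\n"
    "# %% load\n"
    "x = math.sqrt(16)\n"
    "\n"
    "# %% report\n"
    "y = x + 1\n"
)

for i, body in enumerate(split_cells(script)):
    print(f"cell {i}: {body!r}")

namespace = {}
exec(script, namespace)  # the same file still runs as ordinary Python
```

Since the cell markers are just comments, the file diffs, lints, and imports like any other module, which is the property the comment is pointing at.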