Estimating Number of Jupyter Notebooks on Github

275 points by eoinmurray92 about 6 years ago

23 comments

In the same spirit as “Effective Java” and “Effective C++”, we need a book entitled “Effective Jupyter Notebooks”. Here are some of my items. Maybe this sub-thread can come up with an outline for the book.

Item #1: Writing a notebook is foremost an exercise in expository writing. Making sure the writing is high quality is the first objective when writing a notebook. This is Knuth’s literate programming idea, where prose takes precedence over code, which is the reverse of the way we usually program: code first, comments second.

Item #2: Don't use notebooks for general-purpose programming. Notebooks are supposed to have an audience and clearly explain something.

Item #3: Keep code cells simple and clear. If a code cell needs a comment, consider putting that text in a markdown cell instead and elaborating on the idea the notebook is trying to convey.

Item #4: Don't make notebooks a long series of extended code cells or, even worse, just one long cell. Explain what is going on, or see Item #2.

kbd about 6 years ago

If you ever put notebooks in source control, you owe it to yourself to try the text-based notebooks supported in Visual Studio Code [1]. They're round-trippable with real (i.e. browser-based) notebooks, yet are much better for collaboration, diffing, and editing.

[1] https://code.visualstudio.com/docs/python/jupyter-support
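
For readers who haven't seen the format: a text-based notebook in VS Code is an ordinary `.py` file whose cells are delimited by `# %%` comment lines, with `# %% [markdown]` marking prose cells. Below is a minimal sketch of such a file. The sample contents and the helper `split_cells` are hypothetical (neither comes from VS Code or Jupyter); the helper is only there to show that the format is plain lines of text, which is why it diffs cleanly.

```python
# A text-based notebook is a plain Python file; "# %%" starts a new cell.
# NOTEBOOK_SOURCE is an illustrative example, not taken from the linked docs.
NOTEBOOK_SOURCE = '''\
# %% [markdown]
# # Notebook count
# Prose lives in markdown cells, written as comments.

# %%
counts = [10, 20, 40]

# %%
print(sum(counts))
'''


def split_cells(source: str) -> list[dict]:
    """Split percent-format source into cells (hypothetical helper).

    Lines that appear before the first "# %%" marker are ignored.
    """
    cells = []
    for line in source.splitlines():
        if line.startswith("# %%"):
            kind = "markdown" if "[markdown]" in line else "code"
            cells.append({"kind": kind, "lines": []})
        elif cells:
            cells[-1]["lines"].append(line)
    return cells


cells = split_cells(NOTEBOOK_SOURCE)
for cell in cells:
    print(cell["kind"], len(cell["lines"]))
```

Because every cell boundary is a single comment line, a change to one cell shows up in `git diff` as a change to a few lines of Python rather than to a JSON blob with embedded outputs.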

alpb about 6 years ago

Why doesn't this just use the GitHub public dataset available on Google BigQuery to get much more accurate data, rather than "scraping GitHub web search results"? https://cloud.google.com/bigquery/public-data/

There are a lot of examples of people analyzing public code on GitHub efficiently for patterns and usages with BigQuery and getting pretty accurate data out of it. https://medium.com/google-cloud/analyzing-go-code-with-bigquery-485c70c3b451

If you use GitHub on a daily basis, you are unlucky enough to know that web search sadly can't even find words that exist in your repository.
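
A rough sketch of what that could look like, assuming the public table `bigquery-public-data.github_repos.files` and its `path` column (check the current schema before relying on either). The function `count_notebooks` is a hypothetical helper; it needs the `google-cloud-bigquery` package and Google Cloud credentials, so it is defined here but not called.

```python
# Assumption: the public table `bigquery-public-data.github_repos.files`
# exposes a `path` column. Verify the schema before relying on this.
IPYNB_COUNT_SQL = """
SELECT COUNT(*) AS notebook_files
FROM `bigquery-public-data.github_repos.files`
WHERE path LIKE '%.ipynb'
"""


def count_notebooks() -> int:
    """Run the count query (hypothetical helper, not from the post).

    Requires google-cloud-bigquery and valid GCP credentials.
    """
    # Imported lazily so this sketch loads even without the package installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    rows = client.query(IPYNB_COUNT_SQL).result()
    return next(iter(rows)).notebook_files
```

Note that this counts file entries rather than distinct notebooks, so the same notebook present in several repositories or refs would be counted more than once, and the dataset is a snapshot of a subset of GitHub rather than the whole site. Treat the number as an estimate too.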

manaskarekar about 6 years ago

Off topic: Possibly something in my config, but I've recently been getting a lot of "Sorry, something went wrong. Reload?" errors when trying to view Jupyter notebooks on GitHub itself. It seems to be working right now.

I have used this as an alternative: https://nbviewer.jupyter.org/

tincholio about 6 years ago

If only more people would use org-babel...

If you're on Emacs and like Jupyter, there's https://github.com/dzop/emacs-jupyter, which is pretty nice. I've been using it for a few days with Julia, and it works really well. It also allows you to use different kernels from the same org-mode file, though I haven't tried to pass data between them yet (it should be possible, though; at least it works in plain org-mode).

lmeyerov about 6 years ago

We use notebooks heavily for onboarding devs & data scientists to Graphistry, and I only see that increasing.

Interestingly, for initial use, we increasingly start teams on their existing internal NB servers, and for new ones, they either start on Jupyter included in their Graphistry AMI or use Google Colab. So, very little outside of our quick start notebook skeletons hits GitHub.

So... How many notebooks are actually out there? Probably an even more interesting growth curve...!

eoinmurray92 about 6 years ago

OP and Founder of Kyso here - we built Kyso to make it easier to blog your notebooks to the public and also to make them easier to share in teams.

The linked post is actually a Jupyter notebook itself, analysing the number of notebooks on GitHub.

A key element of Kyso is that the code is hidden by default to make it readable to non-technical people, but you can click on the "code hidden" button on the top right to see the code in full.

If you want to give Kyso a go, sign up and import from GitHub directly on this page: https://kyso.io/github, or upload using this page: https://kyso.io/create/study

sytse about 6 years ago

We're seeing an explosion of Jupyter use on GitLab as well. GitLab already makes Jupyter easier to install on a Kubernetes cluster: https://docs.gitlab.com/ee/user/project/clusters/#installing-applications

In response to the growing demand we're doing two things:

1. Adding better Jupyter support to GitLab 12.0 https://gitlab.com/gitlab-org/gitlab-ce/issues/47138 as suggested by my co-founder.

2. Making it easier to do the entire data lifecycle with Meltano https://meltano.com/ which plans to include JupyterHub.

xchaotic about 6 years ago

Python notebooks will be the Perl of the 2010s: write once, pretty much impossible to maintain long term.

hodder about 6 years ago

Very cool work here. This is a pretty epic post, so please do not take this the wrong way.

I was under the impression that FB Prophet was optimal for significantly seasonal time series data.

Honestly, given the fickle nature of these kinds of growth patterns beyond the very near term, in my experience as a quant an ARIMA with a flat vol or a simple eyeball extrapolation would likely generate just as reasonable and reliable results.

While I understand this is likely intended as a standalone project, it would be interesting to run a comparison of ARIMA vs FB Prophet on out-of-sample trending GitHub tools and file types, as well as the general performance of these predictions beyond a one-year time frame (especially vs the reported confidence intervals in Prophet).

I am not that familiar with how Prophet works, so I am absolutely open to being humbled and corrected. I have a project myself that has a varying seasonal component, and I am looking forward to diving into Prophet for a deeper understanding. I am attempting to model an Asian two-asset spread option with a volume-weighted average index price-setting mechanism, where the underlying exhibits seasonality in the volume traded over the trading time window. I am currently running a Monte Carlo on the valuation with a simple average settlement assumption, as opposed to a volume-weighted average assumption, and I was thinking Prophet could help.

Does anyone have experience in financial time series analysis and option valuation who would care to chime in?

Also, what are everyone's thoughts on using Prophet on non-seasonal time series with vol clustering?

martinzugnoni about 6 years ago

My two cents: We've recently been working on a FREE hosted version of JupyterLab, mainly intended for education. Feel free to check it out: https://notebooks.ai/

Would love to hear some feedback.

KyleOS about 6 years ago

I think it would be cool to run the same analysis on the number of R Notebooks on Github and compare the two.

syntaxing about 6 years ago

On the topic of Jupyter notebooks, is there something like a paid version of Google's Colab? Colab is so awesome for creating prototypes, and even better since it's free. However, I have not seen a paid alternative. I do not want to have to deal with setting up my own VM or server. The way Colab works is perfect for what I need.

nsxwolf about 6 years ago

So I just learned they're not laptops.

randomfool about 6 years ago

There's also the GitHub extracts table available in BigQuery, which allows analysis of the contents of the notebooks themselves: https://bigquery.cloud.google.com/table/fh-bigquery:github_extracts.contents_ipynb?pli=1

airocker about 6 years ago

We built a runnable Jupyter notebook website. Would someone be able to take a look and give us some feedback? https://datacabinet.systems

We are VM-based for now but are moving to Kubernetes to make sharing better. Our initial market is classrooms.

hyperbovine about 6 years ago

Especially odd because they are so unsuitable for use with git. Someone needs to find a way to fix this.

gus_massa about 6 years ago

Isn't the prediction too low? My (unsupported) prediction, from fitting a smooth curve to the graph, is https://imgur.com/a/ykeIxPm

JBorrow about 6 years ago

This seems like an incredibly complicated way to fit an exponential to data...
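
For comparison, here is what the simple alternative might look like: a minimal sketch of fitting y = a * exp(b * t) by ordinary least squares on log(y). The data is synthetic, generated from a known exponential, and is not the post's notebook counts; `fit_exponential` is a hypothetical helper, not anything from the linked notebook.

```python
import math


def fit_exponential(ts, ys):
    """Fit y = a * exp(b * t) by least squares on log(y). Requires all y > 0."""
    logs = [math.log(y) for y in ys]
    n = len(ts)
    t_mean = sum(ts) / n
    l_mean = sum(logs) / n
    # Slope and intercept of the straight line log(y) = log(a) + b * t.
    cov = sum((t - t_mean) * (l - l_mean) for t, l in zip(ts, logs))
    var = sum((t - t_mean) ** 2 for t in ts)
    b = cov / var
    a = math.exp(l_mean - b * t_mean)
    return a, b


# Synthetic data from y = 100 * exp(0.5 * t); NOT real GitHub counts.
ts = [0, 1, 2, 3, 4, 5]
ys = [100 * math.exp(0.5 * t) for t in ts]

a, b = fit_exponential(ts, ys)
print(f"a={a:.3f}, b={b:.3f}, doubling time={math.log(2) / b:.3f}")
```

Fitting in log space weights relative errors rather than absolute ones, which is usually what you want for growth data but gives no seasonality or uncertainty intervals, the two things a tool like Prophet adds.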

formalsystem about 6 years ago

Maybe it's time to be able to run them implicitly on Azure cloud?

trpc about 6 years ago

nice marketing, kyso.io team

funkythingsss about 6 years ago

I hate Jupyter notebooks. Joel Grus puts it perfectly: https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUhkKGvjtV-dkAIsUXP-AL4ffI/edit#slide=id.g362da58057_0_1

Past HN discussion: https://news.ycombinator.com/item?id=17856700

jjtheblunt about 6 years ago

exponentially?
