
Distill: a modern machine learning journal

930 points by jasikpark, about 8 years ago

30 comments

j2kun, about 8 years ago

I sure hope this catches on, but we should all be aware of the hurdles:

- Little incentive for researchers to do this beyond their own good will.

- Most ML researchers are bad writers, and it's unlikely that the editing team will do the work needed (which is often a larger reorganization of a paper and its ideas) to improve clarity.

- Producing great writing and clear, interactive figures, and maintaining an ongoing GitHub repo require nontrivial amounts of extra time, and researchers already have strained time budgets.

- It requires you to learn git, front-end web design, and random JavaScript libraries (I for one think d3 is a nuisance), exacerbating the time sunk on tangents to research.

Maybe you could convince researchers to contribute with prizes aligned with their university's goals. Just spitballing here, but maybe for each "top paper" award, get a team together to further clarify the ideas for a public audience, collaborate with the university, their department, and some pop-science writers, and get some serious publicity beyond academic circles. If that doesn't convince a university administration that the work is worth the lower publication count, what will?

In the worst case it'll be the miserable graduate students' job to implement all these publication efforts, and they won't be able to spend time learning how to do research.
colah3, about 8 years ago

Various announcements:

Google Research: https://research.googleblog.com/2017/03/distill-supporting-clarity-in-machine.html

DeepMind: https://deepmind.com/blog/distill-communicating-science-machine-learning/

OpenAI: https://openai.com/blog/Distill/

YC Research: http://blog.ycombinator.com/distill-an-interactive-visual-journal-for-machine-learning-research/

Chris Olah: http://colah.github.io/posts/2017-03-Distill/
choxi, about 8 years ago

I've been trying to read more primary-source information, sort of as my own way of combating "fake news" from before that term was coined. There's a learning curve to it, but I've found that reading S-1 filings and quarterly earnings reports can be more enlightening than reading a news article on any given company. Likewise, reading research papers on biology and deep learning is *significantly* more valuable than reading articles or educational content on those topics.

As you'd imagine, though, it's really hard. Reading a two-page research paper is a very different experience from reading a NYTimes or WSJ article. The information density is enormous, the vocabulary is very domain-specific, and it can take days or weeks of re-reading and looking up terms to finally understand a paper.

I'm really excited about Distill; there's a lot of value in making research papers more accessible and interesting. I've noticed that the ML/AI field has been very pioneering about the research publication process: some papers are now published with source code on GitHub and the authors answering questions on r/machinelearning. This seems like a really great next step, and I hope other fields of science will break away from traditional journals and do the same.
TuringNYC, about 8 years ago

I don't want to undermine visualizations, they are awesome, but one of the big problems I see with ML research is the lack of reproducibility. I know that Google, Facebook, and some others already share associated source repos, but it should almost be mandatory when working with public benchmark datasets. Source + Docker images would be even better.

I worked in clinical research in a past life, and studies would be heavily discounted if they couldn't be reproduced. A highly detailed methods section was key. Many ML papers I see tend to have incredibly formalized, LaTeX-and-Greek-obsessed methods sections, but fall far short of anything that would allow reproduction. Some ML papers, *I swear*, must have run their parameter searches a thousand times to overfit and magically achieve 99% AUC.

Worse, I actually have tons of spare GPU farm capacity I'd love to devote to reproducing research, tweaking it, trying it on adjacent datasets, etc. But the effort to reproduce is too high for most papers.

It is also disappointing to see various input datasets strewn about individuals' personal homepages, where links sometimes end up broken. Sometimes the "original" dataset is in a pickled form after having already gone through multiple upstream transformations. I hope Distill can *instill* some good best practices in the community.
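The seeding discipline the comment asks for can be sketched in a few lines. This is a generic illustration, not taken from any paper in the thread: pinning every random number generator an experiment touches is the minimum needed before anyone else can rerun it bit-for-bit.

```python
import random

def reproducible_run(seed):
    # Pin every RNG the experiment touches; the same discipline applies to
    # numpy (np.random.seed) and any ML framework (e.g. torch.manual_seed).
    random.seed(seed)
    # Stand-in for "the experiment": draw some random initial weights.
    weights = [random.gauss(0.0, 1.0) for _ in range(3)]
    return weights
```

Two runs with the same seed now produce identical results, which is the property a reproduction attempt needs before tweaking anything or trying adjacent datasets.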
minimaxir, about 8 years ago

The announcements and About page indicate an emphasis on visuals and presentation, which I appreciate. But when I think of "modern machine learning," I think of open source and reproducibility (e.g. Jupyter notebooks).

Will the papers published on Distill maintain transparency of the statistical process?

I see in the submission notes that articles are required to be in a public GitHub repo, which is a positive indicator, although the actual code itself does not seem to be a requirement.
Xeoncross, about 8 years ago

As a developer with a weaker background in mathematics, I face a language barrier with many modern algorithms. After lots of research I can understand and explain them in code, but I have no idea what your artistic-looking MathML means.

Visualizations, or algorithms described using code, are much, much easier for me to understand, and they serve as a great starting point for unpacking the math explanations.
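As a concrete instance of the point above, here is gradient descent written as plain code rather than notation (a generic sketch, not tied to any particular paper or library):

```python
def minimize(grad, x, lr=0.1, steps=200):
    # The update rule "x_{t+1} = x_t - lr * grad(x_t)" as a loop:
    # step downhill against the gradient until the step budget runs out.
    for _ in range(steps):
        x = x - lr * grad(x)
    return x
```

For f(x) = (x - 3)^2 the gradient is 2(x - 3), so `minimize(lambda x: 2 * (x - 3), 0.0)` converges to roughly 3.0; many readers parse that loop faster than the equivalent subscripted update rule.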
blinry, about 8 years ago

Shameless self-plug: if you like interactive explanations, check out http://explorableexplanations.com/ and the explorables subreddit: https://www.reddit.com/r/explorables/
cing, about 8 years ago

Is there any concern about a web-native journal being less "future-proof"? I've come across quite a few interactive learning demonstrations in Flash/Java that no longer work.
dang, about 8 years ago

YC Research's michael_nielsen (and longtime HNer!) wrote an announcement here: http://blog.ycombinator.com/distill-an-interactive-visual-journal-for-machine-learning-research/. Hopefully he'll participate in the discussion too.
rememberlenny, about 8 years ago

I wish there were a way to subscribe to a weekly email related to this.
sytelus, about 8 years ago

This is great, but it would have been even better if Distill were designed to play well with the current system. The vast majority of researchers are focused on publishing at various conferences with strict deadlines. Even if they had all the skill sets and time to produce these beautiful illustrations, I highly doubt this will change.

Also, it is very likely that veterans in the field will think of this format as too verbose and too sugar-coated, more appropriate for less math-savvy readers and therefore not mainstream. Furthermore, I really feel TeX is irreplaceable unless you have all of its features covered. Every historic effort to replace TeX in research, even with the bells and whistles of WYSIWYG editors, has failed, and it's important to learn from those failures. You would be surprised how many researchers insist on printing out a paper to read it even when they have access to tablets and PCs.

Instead of being another peer-reviewed journal, Distill could act as the following:

- a platform to publish supplemental material and code

- a platform to manage communication/issues post-publication

- a platform for readers to invite other readers for peer review, generating a "front page" based on some sort of reviewer trust relationship

- a platform to host Python and MATLAB code with web frontends, without researchers having to learn new developer skills

- a venue that supports PDF submissions, but without all the eliteness of arXiv, using algorithms to build the "front page" from some sort of peer-reviewer rankings

The features above are sorely missing, and Distill has a good opportunity to become an "add-on" to current academic publishing systems as opposed to another peer-reviewed journal.
transcranial, about 8 years ago

This is really exciting! Chris et al.: have you seen Keras.js (https://github.com/transcranial/keras-js)? It could probably be useful for certain interactive visualizations or papers.
fnl, about 8 years ago

How does this provide IF (impact factor) ratings? Probably irrelevant for industry, but publishing in academia is all about IF, no matter how bad and corrupt one might think that system is.

And what about long-term stability and presence? Most top journals and their publishing houses (NPG, Elsevier, Springer) are likely to hang around for another decade (or two...), while I don't feel so sure about that for a product like GitHub. Maybe Distill is, or will be, officially backed (financially) by the industry names supporting it?

That being said, I'd love to see this succeed, but much seems left to be done to get it really "off the ground" beyond being a (much?!) nicer GitXiv.
radarsat1, about 8 years ago

While this is very nice, I'm a bit confused about the target. What kind of material is intended to be published here in the future?

The blog post and title seem to describe it as a "journal" intended to replace PDF publications, but the actual content appears to be more in the tutorial/survey category, e.g. "how to use t-SNE," etc. Is this intended to be a place to publish *new* research in the future, or is it meant more for enhanced "Medium"-style blog posts?

Both are fine; I just find the dissonance between the announcement and the actual content a bit confusing.
chairmanwow, about 8 years ago

I feel like science publication in general could benefit from disruption of the publishing model. I'm not sure that the toolkit Distill has provided is quite enough to totally change the paradigm, and it is currently restricted to only one field.

I like the idea of making research approachable for the non-scientist, and the more important question is whether there is a more efficient form (in terms of communicating new science between scientists) for research papers to take.

Is there any relevant work along this vector of thought that I should check out? I would really love to do some work on this.
ycHammer, about 8 years ago

Would saving Jupyter notebooks as .html work? PS: I have published in all of the top-4-tier ML conferences but suck at HTML/CSS/JS. What is my pathway to Distill now? I, like every other researcher worth their salt, am always racing the clock when it comes to deadlines and literature to review. So, yeah? Coaxing myself into investing time in CSS/HTML/JS in lieu of picking up more math tools seems criminal to me. Am I alone in this?
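For what it's worth, the notebook-to-HTML step the comment asks about is a one-liner with `jupyter nbconvert`; a small Python wrapper might look like the sketch below. Whether the resulting HTML would meet Distill's submission requirements is a separate question this does not answer.

```python
import subprocess

def nbconvert_cmd(path, to="html"):
    # Build the standard nbconvert invocation; `jupyter nbconvert --to html`
    # renders cells, outputs, and inline figures into a single HTML file.
    return ["jupyter", "nbconvert", "--to", to, path]

def notebook_to_html(path):
    # Requires a working Jupyter install on PATH.
    subprocess.run(nbconvert_cmd(path), check=True)
```

Other exporters (`--to pdf`, `--to markdown`) plug into the same command shape.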
mysore, about 8 years ago

Wow, this comes with great timing!

I am a UI developer who has been wanting to learn ML forever. I started working on:

1. fast.ai
2. Think Bayes
3. UW Data Science at Scale on Coursera
4. Udacity self-driving car nanodegree

I'm going to write some articles about what I learn and hopefully move into the ML field as a data engineer in 6 months. I figure I got into my current job with a visual portfolio of nicely designed CSS/JS demos; maybe the same thing will work for AI.
Old_Thrashbarg, about 8 years ago

I don't see it written explicitly; can anyone confirm that this journal is fully open access?
JorgeGT, about 8 years ago

You should definitely assign a DOI to each article.
EternalData, about 8 years ago

Looks very good (especially the team behind it!), but I wonder if there's a further step down to where machine learning materials become accessible to the general public, beyond data visualizations and clear writing. This will certainly be a more interactive experience, but it seems to cater to those who are already "in the know" and just want a bit more interactivity and clarity. It would be nice to discuss the format changes, or the "TL;DR bot" of machine learning, that would make machine learning research truly accessible to the general public.
fwx, about 8 years ago

This is amazing! My burning question: as has been pointed out in the thread, producing a great article on Distill (generating interactive figures, doing front-end web dev, etc.) would require a lot of time and resources on the part of the researchers. Is it possible to include within Distill an option to connect researchers with willing-and-able developers in those domains (for example, me) to help them get it done?
aabajian, about 8 years ago

I already have a nomination: the guy who wrote this blog post:

http://adilmoujahid.com/posts/2016/06/introduction-deep-learning-python-caffe/

It's the only way I could get a working model in Caffe while understanding the data preparation steps. I've already retrofitted it to classify tumors.
taliesinb, about 8 years ago

Great stuff! I'm a fan of what's gone up on Distill so far. Question for colah and co. if they're still around: when does the first issue of the journal come out? (Edit: looks like individual articles just get published when they get published, n/m.) Also, that "before/after" visualization of the gradient descent convergence is intriguing. Where's it from?
blunte, about 8 years ago

I don't know jack about machine learning, but these illustrations are gorgeous: simple, elegant, and aesthetically very pleasing.
wodenokoto, about 8 years ago

Looking at the how-to section[1] for creating Distill articles, I can't find instructions for writing math, or any notes on how best to reference sections of the document.

Other than that, this looks much, much easier to write than LaTeX.

[1] http://distill.pub/guide/
djabatt, about 8 years ago

It would be cool to see greater diversity of thinking on the About page; perhaps the pub is designed for insiders.

Having more research transparency is great for a community of like minds to learn from. A suggested addition is a section and team to lead a discussion of ML ethics.
good_vibes, about 8 years ago
I will definitely submit my first paper to Distill. It draws upon a few different fields but the foundation is definitely machine learning.<p>What a time to be alive!
mastazi, about 8 years ago

r/MachineLearning discussion:

https://www.reddit.com/r/MachineLearning/comments/60hy0t/the_journal_distill_launches_today_in_a_nutshell/
ycHammer, about 8 years ago

Does anyone here have any idea whether Jupyter notebook -> save as .html would do the trick?
skynode, about 8 years ago

Hopefully this won't be another ResearchGate dressed in open-source clothing.