
A commit history of BERT and its forks

103 points, by amitness, about 5 years ago

12 comments

ivoras, about 5 years ago
First we need to move away from PDF as a method of distributing research papers. While it's an immense improvement over the dead-tree medium and images of scanned paper pages, it's a visual presentation medium rather than a semantic one.

We'll see how long that particular revolution in communication takes.

Once / if we have all that information in even a loosely structured but plaintext format (whatever that may be ... JSON? XML? semantic HTML? MHTML? TeX? as long as both images and text can live in the same file), there will be another revolution in accessibility, ease of distribution, and accountability.

Who knows, we might even stop mentioning GitHub as a best practice for that and use something distributed, blockchain-like.

Imagine a future where citations of previous work look like "As H.G. Wells et al. mention in block 4f98aaf1 document #391, the foo is better than the bar.", with hyperlinks of course. Peddling my own wares, it can be done with something like https://github.com/ivoras/daisy .
activatedgeek, about 5 years ago
I think this attempt to retro-fit models from software engineering onto research is ill-targeted.

Research is non-linear by nature. One of the reasons research code is usually not well written is this same fact: there is no clear plan (or else it wouldn't be research). Restricting research culture to follow this linear progress model also imposes an artificial block that may be detrimental to productivity. Alternatively, this exercise could demand a second pass over research code which cleans it up and puts it in its "place". This may or may not work, because "clean up" is a non-trivial task for which the researcher may find limited time and utility.

By no means am I against writing clean code. In fact, I've had multiple discussions advocating it, but it's hard to keep everyone aligned. The primary objective of research in a fast-paced field like Machine Learning is to get the idea out the door. Unfortunately, the exceptionally compounding benefits of writing clean, thoughtful code from the beginning are realized much later in this timeline. By then, it's too late and too hard to retro-fix the code.

My understanding so far (having written code for multiple research projects) is that the only way to fix this culture is to deeply ingrain the compounding benefits of clean code in new researchers. By exposing junior researchers to those compounding benefits, we can gradually nudge the community into a more favorable culture with respect to this "commit history" of research.
6gvONxR4sf7o, about 5 years ago
I think this is missing the point of the research. The point of e.g. ALBERT isn't

    -Next Sentence Prediction
    +Sentence Order Prediction
    +Cross-layer Parameter Sharing
    +Factorized Embeddings

The point was what each of those things does, in terms of theory and experimentation. Factorized embeddings are needed because of the portion of memory used by embeddings, NSP isn't useful because it's too easy, etc. Consider a paper like Smoothness and Stability in GANs [0] from ICLR the other week. You could summarize it as Wasserstein loss + spectral normalization + gradient penalty. Or you could summarize the same work as a generator's convex conjugate + Fenchel-Moreau duality + L-smoothness + inf-convolutions. Both would be missing the point. Research isn't code. Ideas, their motivation, their demonstration, their explanation, and their testing are represented in the form they are for a reason: natural language + mathematical language + tables + graphs + references, etc.

It's why we aren't having this discussion in terms of diffs against the comments we're replying to.

[0] https://iclr.cc/virtual_2020/poster_HJeOekHKwr.html
RobertoG, about 5 years ago
Related to this, I've had an idea for a while: publishing a paper could mean submitting it to a Hacker News-style forum, where the users are identified members of the scientific community.

It would allow a kind of public review process of the paper, and it would float the papers most interesting to the community to the "front page".
nfc, about 5 years ago
I've given some thought to this and a related idea.

Let's say that for a subset of scientific papers you have the possibility of specifying both the premises and the results in a way that can be composed.

Say, for example, that your result is that the rate of expansion of the universe is N. Other papers could cite this result through a URL, and then use that URL as a premise for their own results. We could create an automatic system that notifies all these papers when the result changes after new data arrives for any element of the chain. Scientists could be notified that they should revise their own papers to see whether their conclusions change with the new data, so that papers depending on them can be notified in turn. A paper could even be marked stale if, after a change to one of its premises, the authors have not confirmed that the conclusions are still valid; this staleness would propagate down the chain.

A very simplified structure of the data would be something like this:

    { premises: ['urlOfResult1', 'urlOfResult2', ...], conclusions: [RateOfExpansion: > N] }

This is obviously terribly simplified; I suppose it'd take me a lot of time to explain in more detail how such a system could work, and I thought about it a long time ago.

It could be interesting to apply this to other fields, for example public policy. In this case, let's say that we have created a law because of a piece of data. The data changes, and we could be notified that perhaps the law would benefit from a new look.

    { premises: ['urlOfResultLeadIsNotDangerous'], conclusions: ['PeopleCanUseLeadInPipesAsMuchAsTheyWant'] }

Such a system could be made even more generic. Probably people have already worked on this kind of system, but I never took the time to investigate. If someone knows of examples of such systems in use, I'd be very happy to know more.
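The staleness propagation described above can be sketched in a few lines of Python. This is a hypothetical illustration of the idea, not any existing system; all class and URL names are made up for the example:

```python
# Hypothetical sketch of result-dependency tracking between papers.
# Each paper publishes its result at a URL and lists the result URLs
# it depends on as premises; invalidating one result marks every
# downstream paper stale, propagating along the citation chain.

class Paper:
    def __init__(self, url, premises=None):
        self.url = url                  # stable identifier for this paper's result
        self.premises = premises or []  # URLs of results this paper depends on
        self.stale = False

class Registry:
    def __init__(self):
        self.papers = {}      # url -> Paper
        self.dependents = {}  # url -> set of paper URLs citing it as a premise

    def register(self, paper):
        self.papers[paper.url] = paper
        for premise in paper.premises:
            self.dependents.setdefault(premise, set()).add(paper.url)

    def invalidate(self, url):
        """Mark a result stale and propagate down the citation chain."""
        paper = self.papers[url]
        if paper.stale:
            return  # already propagated through this node
        paper.stale = True
        for dependent in self.dependents.get(url, ()):
            self.invalidate(dependent)

registry = Registry()
registry.register(Paper("urlOfResult1"))
registry.register(Paper("urlOfResult2", premises=["urlOfResult1"]))
registry.register(Paper("urlOfResult3", premises=["urlOfResult2"]))

# New data changes result 1; everything downstream becomes stale.
registry.invalidate("urlOfResult1")
print([u for u, p in registry.papers.items() if p.stale])
# → ['urlOfResult1', 'urlOfResult2', 'urlOfResult3']
```

A real system would of course need versioned results, author confirmation to clear staleness, and cycle detection, but the propagation itself is just this kind of reverse-dependency walk.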
mimixco, about 5 years ago
We are working on a similar concept [0] using a programming language designed for this exact purpose [1]. One of our key ideas is "fact diffing" between two papers with different narrative text. We think this will be useful in all kinds of scientific and academic work.

[0] http://mimix.io/recipes

[1] http://mimix.io/specs
canjobear, about 5 years ago
A meaningful diff summary like this can only be made in retrospect after some time. There are a lot more differences between, say, BERT and XLNet than the ones listed. At the time of publication, it wasn’t yet clear which particular differences were the important ones (and to some extent it is still not clear).
goose847, about 5 years ago
A really interesting take! It makes scientific papers feel more like a tool than an article or a book, which is certainly nice in the case of computational work. It also allows for long-term projects that get incremented on in what would otherwise have been separate papers. Additionally, if new, relevant information comes to light long after a paper has been published, the authors could reference it to give a more complete story.
JBiserkov, about 5 years ago
See https://nextjournal.com/#feature-immutability for an example with immutability on all levels of the system: not just the source code, but also the computations/analyses.
mjw1007, about 5 years ago
I wish research papers had a much stronger notion of "effective date".

By that, I mean the date to use to interpret any wording like "current" or "recently" or "yet" used inside the paper.

For some reason preprints often have no visible date on them at all, and automated datestamps can be misleadingly recent if someone makes a minor change without rewriting the whole thing.
usrusr, about 5 years ago
Almost, but not quite, entirely unlike this?

https://en.wikipedia.org/wiki/Special:History/Wikipedia:No_original_research
kinow, about 5 years ago
Very interesting and creative way to visualize related papers.