As always, Jeff Dean doesn't fail to inspire respect.

Tackling a complex problem (still relevant today) at an early age, getting great results *and* describing the solution clearly and concisely.

My master's thesis was ~60 pages long, and was probably about 1/1000 as useful as this one.

An underappreciated aspect of this is finding an academic department that would allow you to submit something this concise as a senior thesis.

My experience, mostly in grad school, was that anyone editing my work wanted more verbiage. If you only needed a short, one-sentence paragraph to say something, it just wasn't accepted. There had to be more.

Jeff Dean is an uncommonly good communicator. But he also benefited from being allowed, perhaps even encouraged, to prioritize effective and concise communication.

Most people aren't so lucky, and end up learning that this type of concision will not go over well. People presume you're writing like a know-it-all, or that you didn't do due diligence on prior work.

I guess it's not totally surprising that Dean's undergrad thesis was on training neural networks and that the main design choice was between in-graph and between-graph replication. This is still one of the big issues with TensorFlow today.

One thing most people don't get is that Dean is basically a computer scientist with expertise in compiler optimizations, and TF is basically an attempt at turning neural network speedups into compiler-optimization problems.

I'd like to thank my undergrad university for hosting my undergrad thesis for 25 years with only 1-2 URL changes. Some interesting details: LaTeX2HTML held up, mostly, for 25 years and several URL changes, and the underlying topic (training the weight coefficients of a binary classifier to maximize performance) is still relevant to my work today, even if I didn't understand gradient descent or softmax at the time.

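(For a concrete picture of the ideas mentioned above, here is a minimal NumPy sketch, not TensorFlow code and not taken from either thesis: a softmax binary classifier trained by gradient descent, with a naive data-parallel "replication" step in which each hypothetical worker computes gradients on its own data shard and the gradients are averaged. All names, shapes, and numbers are illustrative assumptions.)

```python
# Toy illustration (NumPy, not TensorFlow): a softmax binary classifier
# trained by gradient descent, with a naive data-parallel "replication"
# step: each "worker" handles one shard and the gradients are averaged.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(512, 2))                    # toy features
y = (X[:, 0] + X[:, 1] > 0).astype(int)          # toy binary labels
W = np.zeros((2, 2))                             # weights: 2 features -> 2 classes
b = np.zeros(2)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)         # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def grads(Xs, ys, W, b):
    """Cross-entropy gradients for one shard of the data."""
    p = softmax(Xs @ W + b)
    p[np.arange(len(ys)), ys] -= 1.0             # dL/dlogits
    p /= len(ys)
    return Xs.T @ p, p.sum(axis=0)

lr, n_workers = 0.5, 4
for step in range(200):
    # "Replication": each worker computes gradients on its own shard.
    shards = zip(np.array_split(X, n_workers), np.array_split(y, n_workers))
    per_worker = [grads(Xs, ys, W, b) for Xs, ys in shards]
    gW = sum(g for g, _ in per_worker) / n_workers
    gb = sum(g for _, g in per_worker) / n_workers
    W -= lr * gW
    b -= lr * gb

acc = (softmax(X @ W + b).argmax(axis=1) == y).mean()
print(f"training accuracy: {acc:.3f}")
```
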
I wonder who his advisor was back then, because I don't think it's mentioned in the thesis. Or maybe he did this on his own, which wouldn't be surprising.

Really interesting and innovative early work, and I think it also explains why TensorFlow does not support within-layer model parallelism. It's amazing how much our early experiences shape us down the road.

My entire career has consisted of reimplementing bits and pieces of things I've previously built, all the way back to high school, and then reimplementing whatever was new in the previous round during the next one.

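(To make "within-layer model parallelism" concrete, here is a hand-rolled NumPy sketch, not anything from TensorFlow: a single layer's weight matrix is split column-wise across two hypothetical devices, each device computes its slice of the output, and the slices are concatenated. Names and shapes are illustrative assumptions.)

```python
# Hand-rolled sketch of within-layer model parallelism: the weight matrix
# of ONE layer is split column-wise across two "devices"; each computes a
# slice of the output, and the slices are concatenated at the end.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(8, 16))          # batch of activations
W = rng.normal(size=(16, 32))         # full layer weight

# Partition the layer itself, not the data:
# columns 0-15 live on "device 0", columns 16-31 on "device 1".
W_dev0, W_dev1 = W[:, :16], W[:, 16:]

y_dev0 = x @ W_dev0                   # computed on device 0
y_dev1 = x @ W_dev1                   # computed on device 1
y = np.concatenate([y_dev0, y_dev1], axis=1)   # gather the output slices

assert np.allclose(y, x @ W)          # same result as the unsplit layer
```
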
As a side note, I already have a draft of my essay (not yet published) that replaces the mention of storage costs with a mention of Ruth Porat. The point is to address why Ruth Porat was hired in the first place.