GradIEEEnt half decent

275 points by notmysql_, about 2 years ago

10 comments

unlikelymordant, about 2 years ago
Reminds me of some earlier OpenAI work: https://openai.com/research/nonlinear-computation-in-deep-linear-networks

"Neural networks consist of stacks of a linear layer followed by a nonlinearity like tanh or rectified linear unit. Without the nonlinearity, consecutive linear layers would be in theory mathematically equivalent to a single linear layer. So it's a surprise that floating point arithmetic is nonlinear enough to yield trainable deep networks."
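The nonlinearity being exploited is just rounding: float32 addition is not associative, so a stack of nominally linear layers evaluated in floats is not exactly a linear map. A minimal NumPy sketch of the effect (values chosen purely for illustration):

    import numpy as np

    # float32 addition is not associative: the grouping decides which
    # rounding happens first, so arithmetic that is linear on paper
    # is not exactly linear in floats.
    x = np.float32(1e8)
    y = np.float32(-1e8)
    z = np.float32(1.0)

    print((x + y) + z)  # 1.0 -- the large terms cancel before z is added
    print(x + (y + z))  # 0.0 -- z is rounded away against 1e8 first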
chrisldgk, about 2 years ago
Always delighted to see tom7 content. Dude just thinks out of the box like no one else.
VikingCoder, about 2 years ago
Tom7 is kind of like the electronic version of the Primitive Technology YouTuber.

It's fascinating watching someone using some of the worst tools ever to make something in about the most labor-intensive way imaginable - and it's just beautiful. It's practically meditative.
eur0pa, about 2 years ago
Two videos so far this year and we're still in May. What a bliss.
sva_, about 2 years ago
> I think this is a fractal in the sense that it is chaotic, has a color gradient, and could be on the cover of an electronic music album

I didn't know of this criterion
alexb_, about 2 years ago
This is one of the few tom7 videos that I see and am utterly confused by in every way. It feels like he's speaking a different language than I am.
tysam_and, about 2 years ago
I believe that this floating point imprecision was something that David Page used (I think it may have been accidental as he was originally doing something else with it?) to achieve a world record training speed on CIFAR10 by summing a bunch of loss values together instead of taking the average. The sum effectively reduced the precision of the loss value and seemed to have a regularizing impact on network training as best as I personally understand. :) <3
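A rough illustration of that effect (hypothetical numbers, not Page's actual training code): summing a few hundred per-example losses in float16 pushes the total into a range where representable values are about 0.25 apart, so the reduced loss is quantized much more coarsely than a float32 mean would be.

    import numpy as np

    # Hypothetical per-example losses; in practice these would come from the model.
    losses = np.random.rand(512).astype(np.float16)

    mean_loss = losses.astype(np.float32).mean()   # the usual reduction

    sum_loss = np.float16(0.0)
    for l in losses:                               # running float16 sum
        sum_loss = sum_loss + l                    # each partial sum is re-rounded

    # Near 256, adjacent float16 values are 0.25 apart, so the summed loss
    # is far more coarsely quantized than the mean -- the "accidental
    # regularizer" described in the comment above.
    print(float(mean_loss), float(sum_loss))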
SAI_Peregrinus, about 2 years ago
Tom7 back to his usual madness. This time, exploiting floating point rounding, first to allow a neural network activation function to be linear (except for rounding errors), then expanding until ultimately making a 6502 emulator and proving that linear operations + rounding errors are Turing complete.
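A toy version of that idea (my own sketch of the general principle, not necessarily the construction used in the video): adding and then subtracting a large constant is the identity on paper, but the intermediate rounding snaps the input to a coarse grid, turning two affine operations into a staircase-shaped nonlinearity.

    import numpy as np

    def snap(x, c=np.float32(1.5 * 2**20)):
        # (x + c) - c is the identity in exact arithmetic, but in float32 the
        # intermediate sum is rounded to a grid with spacing 0.125, so small
        # inputs come back quantized -- a staircase from an add and a subtract.
        x = np.float32(x)
        return (x + c) - c

    print([float(snap(v)) for v in (-0.3, -0.1, 0.0, 0.1, 0.2, 0.3)])
    # roughly: [-0.25, -0.125, 0.0, 0.125, 0.25, 0.25]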
ironbound, about 2 years ago
Best start to a video this year, really wish more people would use a hook to get people engaged in learning.
dirtyid, about 2 years ago
two tom7 videos in as many months, what a treat.