
GradIEEEnt half decent

275 points by notmysql_ about 2 years ago

10 comments

unlikelymordant about 2 years ago
Reminds me of some earlier OpenAI work: https://openai.com/research/nonlinear-computation-in-deep-linear-networks

"Neural networks consist of stacks of a linear layer followed by a nonlinearity like tanh or rectified linear unit. Without the nonlinearity, consecutive linear layers would be in theory mathematically equivalent to a single linear layer. So it's a surprise that floating point arithmetic is nonlinear enough to yield trainable deep networks."
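A minimal sketch of the effect the quoted passage points at, using nothing beyond standard IEEE 754 float64 (the constant c and the inputs are arbitrary, chosen only to make the rounding visible): a map that is linear on paper stops being additive once intermediate results get rounded.

# L(x) = (x + c) - c is the identity in exact arithmetic, hence linear.
# In float64 the intermediate sum is rounded, so L(x) + L(y) need not
# equal L(x + y) -- the map is no longer linear.
c = 1e16
L = lambda x: (x + c) - c

print(L(1.0) + L(1.0))   # 0.0 (each 1.0 is lost in the rounded sum)
print(L(1.0 + 1.0))      # 2.0 (2.0 survives, since 1e16 + 2 is representable)

# The same effect in its smallest form: float addition is not associative.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))   # False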
chrisldgk about 2 years ago
Always delighted to see tom7 content. Dude just thinks out of the box like no one else.
VikingCoder about 2 years ago
Tom7 is kind of like the electronic version of the Primitive Technology Youtuber.

It's fascinating watching someone using some of the worst tools ever to make something in about the most labor-intensive way imaginable - and it's just beautiful. It's practically meditative.
eur0pa about 2 years ago
Two videos so far this year and we're still in May. What a bliss.
sva_ about 2 years ago
> I think this is a fractal in the sense that it is chaotic, has a color gradient, and could be on the cover of an electronic music album

I didn't know of this criterion
alexb_ about 2 years ago
This is one of the few tom7 videos that I see and am utterly confused by in every way. It feels like he's speaking a different language than I am.
tysam_and about 2 years ago
I believe this floating point imprecision was something David Page used (I think it may have been accidental, as he was originally doing something else with it?) to achieve a world-record training speed on CIFAR-10 by summing a batch of loss values instead of taking their average. The sum effectively reduced the precision of the loss value and seemed to have a regularizing effect on network training, as best I understand it. :) <3
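A rough illustration of the mechanism described above (the batch size and loss values here are made up, not taken from that training run): in float16 the gap between adjacent representable numbers grows with magnitude, so a summed batch loss near 512 can only move in coarse steps, while a mean loss near 1 resolves much finer changes. That coarser quantization is the "reduced precision" of the sum.

import numpy as np

rng = np.random.default_rng(0)
losses = rng.uniform(0.5, 1.5, size=512).astype(np.float16)   # per-example losses

mean_loss = np.float16(losses.astype(np.float32).mean())      # ~1.0
sum_loss = np.float16(0.0)
for l in losses:                                               # naive fp16 accumulation
    sum_loss = np.float16(sum_loss + l)                        # ends up near 512

eps = np.float16(0.05)
print(np.float16(mean_loss + eps) != mean_loss)   # True: a 0.05 change is visible near 1.0
print(np.float16(sum_loss + eps) == sum_loss)     # True: the same 0.05 vanishes near 512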
SAI_Peregrinus about 2 years ago
Tom7 back to his usual madness. This time, exploiting floating point rounding, first to allow a neural network activation function to be linear (except for rounding errors), then expanding until ultimately making a 6502 emulator and proving that linear operations + rounding errors are Turing complete.
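The route from rounding errors to Turing completeness can be illustrated with a far smaller construction than the video's (this NAND gate is not Tom7's circuit, just a sketch of the principle in float64): rounding to the nearest integer can itself be written with only additions and subtractions, and once that is available, a universal logic gate falls out of otherwise linear arithmetic.

BIG = 2.0 ** 52

def fround(x):
    # Round-to-nearest-integer built from one addition and one subtraction:
    # adding 2**52 forces the fraction bits of x to be rounded away.
    return (x + BIG) - BIG

def nand(a, b):
    # a, b in {0.0, 1.0}. Only addition, subtraction, and multiplication
    # by constants appear -- "linear" operations plus rounding.
    return 1.0 - fround(0.4 * (a + b))

for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(int(a), int(b), "->", int(nand(a, b)))
# 0 0 -> 1, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0

Since NAND is functionally complete, chaining enough of these (plus some form of memory) is what lets "linear operations + rounding errors" compute anything at all.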
ironbound about 2 years ago
Best start to a video this year. I really wish more people would use a hook to get people engaged in learning.
dirtyid about 2 years ago
two tom7 videos in as many months, what a treat.