
Schedule-Free Learning – A New Way to Train

131 points by ironbound, about 1 year ago

8 comments

johndough, about 1 year ago
I did a quick comparison on MNIST with a small ConvNet, comparing this AdamWScheduleFree optimizer against a few other optimizers (RAdam, NAdam, AdamW, SGD, Adam, Adafactor, SophiaG). The validation accuracy seems to be okay and the train loss decreases remarkably quickly.

Validation accuracy: https://i.imgur.com/8ZtX7Rd.png

Train loss: https://i.imgur.com/o5XdQ29.png

Code: https://bpa.st/NVJQ (currently only runs on my computer, but not enough time to clean it up)

Note that this is just a toy benchmark with very little hyperparameter tuning. You could probably get similar results with most optimizers and an appropriate schedule. Nevertheless, I appreciate every hyperparameter that I do not have to set manually.

In summary, this seems to be a promising optimizer. I'll add it to my list of optimizers to try for new deep learning projects.
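For anyone who wants to try something similar, here is a minimal sketch of how the schedulefree package is typically dropped into a PyTorch training loop. The data, model, and hyperparameters below are placeholders, not the benchmark code linked above:

    import torch
    import schedulefree  # pip install schedulefree

    # Stand-in data and model (random tensors shaped like MNIST batches).
    train_loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(
            torch.randn(512, 1, 28, 28), torch.randint(0, 10, (512,))
        ),
        batch_size=64,
    )
    model = torch.nn.Sequential(
        torch.nn.Flatten(),
        torch.nn.Linear(28 * 28, 128),
        torch.nn.ReLU(),
        torch.nn.Linear(128, 10),
    )
    optimizer = schedulefree.AdamWScheduleFree(model.parameters(), lr=1e-3)

    for epoch in range(3):
        model.train()
        optimizer.train()   # schedule-free optimizers keep separate train/eval weights
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = torch.nn.functional.cross_entropy(model(x), y)
            loss.backward()
            optimizer.step()

        model.eval()
        optimizer.eval()    # switch to the averaged weights for evaluation
        # ... run validation here ...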
tysam_and, about 1 year ago
This is a pretty hyped-up optimizer that seems to have okay-ish performance in practice, but there are a number of major red flags here. For one, the baselines are decently sandbagged, yet the Twitter posts sharing them (which are pretty hype-y) directly say that the baselines are "highly tuned" and that there's no benchmark trickery, which is flat-out wrong. If someone has not had experience with said benchmarks, it sounds like a plausible statement, but having worked with some of these datasets very closely, I can say some of the baselines are simply terrible; I don't know where they came from.

Additionally, the optimizer does actually appear to have a kind of momentum, despite claims directly to the contrary, and uses it with a Nesterov-like step (line 2 of 3 in the inner loop). Finally, it is 'schedule-free' only because the schedule is hardcoded into the algorithm itself: a 1/steps_taken weighting, which is not exactly a rare learning rate schedule. This is a decently robust but sometimes suboptimal schedule, and I find it sketchy to claim that it is 'schedule-free'. It also cripples the optimizer by tying performance to the number of steps taken, which is potentially a problem if you are using any batch-size + LR scaling strategies, as I understand it.

There is a mixture of hype and substance here, and I wish the author were more straightforward with their approach and claims. I think there is the potential for a good "bolts-included" optimizer in some of the ideas being presented here, but the amount of overhyping and deception makes me not want to trust any of the work that follows.

Unfortunately, hype is what sells best on Twitter, and some of the claims being made here appear to be at the very best deceptive, and at the very worst, untrue. I could be wrong -- these are just my personal opinions from my own experience, but I do occasionally find myself distraught about the things that tend to catch wind in the technical news cycle.

-Fern
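For context on the 1/steps_taken point, here is a rough sketch of the schedule-free SGD-style update being described. The variable names and the toy demo are illustrative only, not the reference implementation:

    import torch

    def schedule_free_sgd_step(x, z, grad_fn, lr, t, beta=0.9):
        """One illustrative schedule-free SGD-style update (not the official code)."""
        # The gradient is taken at an interpolation of the averaged iterate x and the
        # fast iterate z -- this is the momentum/Nesterov-like part mentioned above.
        y = (1.0 - beta) * z + beta * x
        z = z - lr * grad_fn(y)          # plain gradient step on the fast iterate
        c = 1.0 / t                      # the hardcoded 1/steps_taken weight
        x = (1.0 - c) * x + c * z        # equal-weighted running average of the z iterates
        return x, z

    # Tiny demo: minimize f(w) = 0.5 * ||w||^2, whose gradient is w itself.
    x = torch.ones(3)
    z = x.clone()
    for t in range(1, 201):
        x, z = schedule_free_sgd_step(x, z, lambda w: w, lr=0.1, t=t)
    print(x)  # the averaged iterate approaches the minimum at zero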
shostack, about 1 year ago
And here I was hoping for something related to how to approach self-driven learning and education when you have a hectic and unpredictable schedule and are trying to fit learning in-between things with the fragments of time you have.
danielhanchen, about 1 year ago
I was asking on Twitter whether Aaron had any experiments for transformers, since they provided some graphs for CNNs and the like, but no transformers.

* Aaron et al.'s past work on D-Adaptation won a best paper award at ICML, with the follow-up work being Prodigy -- but on transformers, both did similar to or worse than AdamW. https://twitter.com/danielhanchen/status/1775547139248341125

* Superconvergence + the LR range finder + Fast AI's Ranger21 optimizer was the go-to setup for CNNs and worked fabulously well, but on transformers the learning rate range finder said 1e-3 was best, whilst 1e-5 actually worked better. However, the 1-cycle learning rate schedule stuck. https://github.com/huggingface/transformers/issues/16013

* A huge issue is that this still needs tuning?! But how about a well-tuned AdamW? E.g. see https://twitter.com/kellerjordan0/status/1776716388037529843, which outperformed it using a tuned SGD.

* I'm just a little bit reserved for now, since the author themselves aren't providing any transformer benchmarks, nor have they compared their CNN baselines to superconvergence, which is the go-to standard for fast CNN training. Likewise, https://parameterfree.com/2023/08/30/yet-another-icml-award-fiasco/ wasn't pleasant.
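For readers unfamiliar with the 1-cycle baseline being referenced, here is a minimal sketch of the usual AdamW + OneCycleLR wiring in PyTorch. The model, data, and hyperparameter values are placeholders, not the baselines discussed above:

    import torch

    model = torch.nn.Linear(784, 10)               # stand-in model
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)

    epochs, steps_per_epoch = 3, 100
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer,
        max_lr=1e-3,                               # peak LR, typically picked with an LR range finder
        epochs=epochs,
        steps_per_epoch=steps_per_epoch,
        pct_start=0.3,                             # fraction of steps spent warming up
    )

    for epoch in range(epochs):
        for step in range(steps_per_epoch):
            x = torch.randn(32, 784)               # dummy batch
            y = torch.randint(0, 10, (32,))
            loss = torch.nn.functional.cross_entropy(model(x), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            scheduler.step()                       # the 1-cycle schedule is stepped per batch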
rand0mwalk, about 1 year ago
Is there an accompanying paper out there?
ShamelessC, about 1 year ago
When starting out in deep learning, I just used a static learning rate with the Adam optimizer (no LR scheduler). Generally worked fine.
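For completeness, the constant-learning-rate setup described here is about as simple as PyTorch training configuration gets (values are illustrative):

    import torch

    model = torch.nn.Linear(784, 10)
    # Constant learning rate, no scheduler attached.
    optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)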
yinser, about 1 year ago
I'm continually impressed by Meta/FAIR's contributions to the open AI space. Never thought I'd say that.
wwilim, about 1 year ago
By the title I thought this was for humans and I got excited for nothing