Trying Kolmogorov-Arnold Networks in Practice

146 points by Ameo 11 months ago

8 comments

thesz 11 months ago
The original paper [1] used LBFGS [2], which is a quasi-second-order optimization algorithm.

[1] https://arxiv.org/pdf/2404.19756 - "Both MLPs and KANs are trained with LBFGS for 1800 steps in total."
[2] https://en.wikipedia.org/wiki/Limited-memory_BFGS

(Quasi-)Newton methods approximate the learning rate using local curvature, which plain gradient-based methods do not do.

The post relies on Tinygrad because it is familiar to the author, and the author tinkers with batch size and learning rate, but not with the optimizer itself.

I think that even a line search for the minimum along the direction of the batch gradient can provide most of the benefits of LBFGS. It is easy to implement.
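
A minimal sketch of what that line search might look like (plain NumPy; `loss_fn` and `grad_fn` are placeholder closures over the current batch, not anything from the post or paper):

```python
import numpy as np

def armijo_step(params, loss_fn, grad_fn, t0=1.0, beta=0.5, c=1e-4, max_iter=20):
    """One optimization step using a backtracking (Armijo) line search
    along the negative batch gradient.

    params  : 1-D array of model parameters
    loss_fn : params -> scalar loss on the current batch
    grad_fn : params -> gradient of that loss w.r.t. params
    """
    g = grad_fn(params)
    d = -g                                  # steepest-descent direction
    f0 = loss_fn(params)
    slope = float(g @ d)                    # directional derivative (negative)
    t = t0
    for _ in range(max_iter):
        if loss_fn(params + t * d) <= f0 + c * t * slope:
            break                           # sufficient decrease achieved
        t *= beta                           # shrink the step and retry
    return params + t * d
```

Each call picks a step size adapted to the current batch, which is part of what the curvature model in LBFGS otherwise provides.
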
dahart 11 months ago
> And here's a neural network/multi-layer perceptron with the same number of layers and nodes: One big difference to note is that there are far fewer connections between nodes in KANs compared to neural networks/MLPs.

I think it's probably worth clarifying a little here that a B-spline is essentially a little MLP, where, at least for uniform B-splines, the depth is equal to the polynomial degree of the spline. (That's also the width of the first layer.)

So those two network diagrams are only superficially similar, but the KAN is actually a much bigger network if degree > 1 for the splines. I wonder if that contributed to the difficulty of training it. It is possible some of the "code smell" you noticed and got rid of is relatively important for achieving good results. I'd guess the processes for normalizing inputs and layers of a KAN need to be a bit different than for standard nets.
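
For a rough sense of the size difference, a back-of-the-envelope per-layer parameter count (the grid size, spline degree, and the extra base weight per edge below are illustrative assumptions; exact counts depend on the KAN implementation):

```python
def mlp_layer_params(n_in, n_out):
    # dense layer: one weight per edge plus a bias per output node
    return n_in * n_out + n_out

def kan_layer_params(n_in, n_out, grid_size=5, degree=3):
    # each edge carries a learnable univariate function with roughly
    # (grid_size + degree) spline coefficients plus one base/residual
    # weight, instead of a single scalar weight
    coeffs_per_edge = grid_size + degree + 1
    return n_in * n_out * coeffs_per_edge

for n in (16, 64, 256):
    print(n, mlp_layer_params(n, n), kan_layer_params(n, n))
# with these settings a 64x64 KAN layer has roughly 9x the parameters of
# the 64x64 dense layer whose diagram it superficially resembles
```
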
gwern 11 months ago
Web design note for OP: you designed your site for dark mode, and your initial SVGs are correct, but then it clashes with your graphs, which are all light mode. You can invert them in CSS, and they'll look a lot better.

And you can choose which ones to invert automatically using the free+Free https://invertornot.com/ API - IoN will correctly return that e.g. https://i.ameo.link/caa.png (and the other two) should be inverted.
anonymousDan 11 months ago
'hacky feeling techniques' - as opposed to the rest of DNN research??! More seriously, I wonder if some kind of hybrid approach could be possible/beneficial (e.g. KANs for some layers?)
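
A crude sketch of what such a hybrid might look like (PyTorch; `SimpleKANLayer` is a made-up simplification that puts a learnable sum of fixed Gaussian bumps on each edge instead of the paper's B-splines, so it only illustrates the wiring, not a faithful KAN):

```python
import torch
import torch.nn as nn

class SimpleKANLayer(nn.Module):
    """Toy KAN-style layer: every input->output edge gets its own learnable
    univariate function, built from fixed Gaussian bumps with learnable
    coefficients (a rough stand-in for learnable B-splines)."""
    def __init__(self, n_in, n_out, n_basis=8, x_range=(-2.0, 2.0)):
        super().__init__()
        self.register_buffer("centers", torch.linspace(*x_range, n_basis))
        self.width = (x_range[1] - x_range[0]) / n_basis
        # one coefficient per (output node, input node, basis function)
        self.coef = nn.Parameter(0.1 * torch.randn(n_out, n_in, n_basis))

    def forward(self, x):  # x: (batch, n_in)
        # expand each input into its bump activations: (batch, n_in, n_basis)
        phi = torch.exp(-((x.unsqueeze(-1) - self.centers) / self.width) ** 2)
        # evaluate every edge's univariate function and sum into output nodes
        return torch.einsum("bik,oik->bo", phi, self.coef)

# Hybrid model: one KAN-style layer feeding ordinary dense layers.
model = nn.Sequential(
    SimpleKANLayer(16, 64),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)
```
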
slashdave 11 months ago
It is important to remember that the optimizers used in mainstream deep learning models have all been developed and fine-tuned to work well with classic NN architectures. There is no such thing as a generic optimization algorithm.
davesque 11 months ago
KANs seem like a great tool for the right job. However, based on my understanding of how they work, my intuitions tell me that they would be awful at image processing, which I think was one of the author's test beds.
imalexsk 11 months ago
The loss value he mentions for the KAN is ~1/5 (0.00011) of the NN loss (0.006). Could be a massive difference in an actual task with larger/complex datasets.
abetusk 11 months ago
Here are what I think are the main conclusions of the article:

> "... the most significant factor controlling performance is just parameter count."

> "No matter what I did, the most simple neural network was still outperforming the fanciest KAN-based model I tried."

I suspected this was the case when I first heard about KANs. It's nice to see someone diving into it a bit more, even if it is just anecdotal.
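
Given that, the first sanity check when comparing a KAN against an MLP is that the trainable-parameter counts are actually matched. A minimal helper (PyTorch shown for illustration; the post itself uses Tinygrad, which would need the equivalent):

```python
import torch.nn as nn

def n_params(model: nn.Module) -> int:
    # total trainable parameter count
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

mlp = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
print(n_params(mlp))  # 1153 = (16*64 + 64) + (64*1 + 1)
```
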