Kolmogorov-Arnold Networks

568 points by sumo43 about 1 year ago

33 comments

GistNoesis about 1 year ago
I quickly skimmed the paper, got inspired to simplify it, and created a PyTorch layer: https://github.com/GistNoesis/FourierKAN/

The core is really just a few lines.

In the paper they use spline interpolation to represent the 1D functions that they sum, and their code seemed aimed at smaller sizes. Instead I chose a different representation: Fourier coefficients, used to interpolate the functions of the individual coordinates.

It should give an idea of the representation power of Kolmogorov-Arnold networks. It should probably converge more easily than their spline version, but the spline version needs fewer operations.

Of course, if my code doesn't work, it doesn't mean theirs doesn't.

Feel free to experiment and publish a paper if you want.
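For readers who want a concrete picture of the idea described above (a learnable univariate function on every edge, summed at each output, with a truncated Fourier series standing in for the splines), here is a minimal PyTorch sketch. It is an illustration only, not the linked FourierKAN code; the class name, initialization scale, and frequency count are arbitrary choices.

    import torch
    import torch.nn as nn

    class NaiveFourierKANLayer(nn.Module):
        """KAN-style layer: each edge (input j -> output i) carries its own
        learnable 1D function, represented as a truncated Fourier series."""

        def __init__(self, in_dim, out_dim, num_frequencies=8):
            super().__init__()
            self.num_frequencies = num_frequencies
            scale = 1.0 / (in_dim * num_frequencies) ** 0.5
            # One set of cos/sin coefficients per (output, input, frequency) triple.
            self.cos_coeffs = nn.Parameter(scale * torch.randn(out_dim, in_dim, num_frequencies))
            self.sin_coeffs = nn.Parameter(scale * torch.randn(out_dim, in_dim, num_frequencies))
            self.bias = nn.Parameter(torch.zeros(out_dim))

        def forward(self, x):                      # x: (batch, in_dim)
            k = torch.arange(1, self.num_frequencies + 1, device=x.device, dtype=x.dtype)
            angles = x.unsqueeze(-1) * k           # (batch, in_dim, freq)
            # Sum the per-edge univariate functions over inputs and frequencies.
            y = torch.einsum("bif,oif->bo", torch.cos(angles), self.cos_coeffs)
            y = y + torch.einsum("bif,oif->bo", torch.sin(angles), self.sin_coeffs)
            return y + self.bias

    # Two stacked layers form a small KAN-like network, e.g. 2 -> 5 -> 1.
    model = nn.Sequential(NaiveFourierKANLayer(2, 5), NaiveFourierKANLayer(5, 1))
    print(model(torch.randn(16, 2)).shape)         # torch.Size([16, 1])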
krasin about 1 year ago
I've spent some time playing with their Jupyter notebooks. The most useful (to me, anyway) is their Example_3_classfication.ipynb ([1]).

It works as advertised with the parameters selected by the authors, but if we modify the network shape in the second half of the tutorial (classification formulation) from (2, 2) to (2, 2, 2), it fails to generalize: the training loss gets down to 1e-9, while the test loss stays around 3e-1. Going to larger network sizes does not help either.

I would really like to see a bigger example with many more parameters and more data complexity, and whether it could be trained at all. MNIST would be a good start.

Update: I increased the training dataset size 100x, and that helps with the overfitting, but now I can't get the training loss below 1e-2. Still iterating on it; GPU acceleration would really help - right now my progress is limited by the speed of my CPU.

1. https://github.com/KindXiaoming/pykan/blob/master/tutorials/Example_3_classfication.ipynb
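For context, the experiment described above amounts to changing the width argument of the pykan model used in that notebook. A rough sketch of what the modified setup might look like, assuming the KAN(width=...), create_dataset, and model.train(...) calls from the pykan README and tutorials at the time (treat the exact signatures, and the toy target, as assumptions):

    # Sketch only -- pykan API details (KAN, create_dataset, model.train) are
    # assumed to match the tutorials referenced above and may have changed.
    from kan import KAN, create_dataset

    # Toy two-class target in 2D, standing in for the tutorial's dataset.
    f = lambda x: (x[:, [0]] ** 2 + x[:, [1]] ** 2 < 1).float()
    dataset = create_dataset(f, n_var=2)

    # Tutorial shape is KAN(width=[2, 2], ...); the failing case above is [2, 2, 2].
    model = KAN(width=[2, 2, 2], grid=3, k=3)
    model.train(dataset, opt="LBFGS", steps=20)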
esafak about 1 year ago
There exists a Kolmogorov-Arnold-inspired model in classical statistics called GAMs (https://en.wikipedia.org/wiki/Generalized_additive_model), developed by Hastie and Tibshirani as an extension of GLMs (https://en.wikipedia.org/wiki/Generalized_linear_model).

GLMs in turn generalize logistic, linear, and other popular regression models.

Neural GAMs with learned basis functions have already been proposed, so I'm a bit surprised that the prior art is not mentioned in this new paper. Previous applications focused more on interpretability.
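To make the connection concrete: a GAM models g(E[y]) = f1(x1) + ... + fn(xn) with one learned smooth function per feature, i.e. a single additive layer of univariate functions, whereas a KAN stacks and composes such sums. A small sketch using the pygam package (the package choice and the toy target, which mirrors the pykan example mentioned elsewhere in the thread, are my own):

    import numpy as np
    from pygam import LinearGAM, s

    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(500, 2))
    # Toy target: exp(sin(pi*x1) + x2^2), plus a little noise.
    y = np.exp(np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2) + 0.05 * rng.standard_normal(500)

    # One smooth spline term per input feature; the additive structure is fixed.
    gam = LinearGAM(s(0) + s(1)).fit(X, y)
    gam.summary()

    # Each fitted f_i can be inspected directly, which is where the
    # interpretability angle comes from.
    x_grid = gam.generate_X_grid(term=0)
    partial = gam.partial_dependence(term=0, X=x_grid)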
montebicyclelo about 1 year ago
The success we're seeing with neural networks is tightly coupled with the ability to scale - the algorithm itself works at scale (more layers), but it also scales well with hardware (neural nets mostly consist of matrix multiplications, and GPUs have specialised matrix-multiplication acceleration). One of the most impactful neural network papers, AlexNet, was impactful because it showed that NNs could be put on the GPU, scaled, and accelerated, to great effect.

It's not clear from the paper how well this algorithm will scale, both in terms of the algorithm itself (does it still train well with more layers?) and its ability to make use of hardware acceleration (e.g. it's not clear to me that the structure, with its per-weight activation functions, can make use of fast matmul acceleration).

It's an interesting idea that seems to work well and have nice properties at a smaller scale; but whether it's a good architecture for ImageNet, LLMs, etc. is not clear at this stage.
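A back-of-the-envelope illustration of the hardware point: an MLP layer is one dense matmul, while a KAN-style layer with per-edge functions can be phrased as a matmul only after expanding each input into a basis, which multiplies the work and the weight count by the basis size. Everything below is illustrative; monomials stand in for B-splines and all shapes are arbitrary.

    import torch

    batch, d_in, d_out, basis_size = 32, 64, 64, 8
    x = torch.randn(batch, d_in)

    # MLP layer: a single (d_in x d_out) matmul, which GPUs accelerate directly.
    W = torch.randn(d_in, d_out)
    mlp_out = torch.relu(x @ W)                                   # (batch, d_out)

    # KAN-style layer: each edge (j -> i) applies its own univariate function
    # phi_ij(x_j) = sum_k c_ijk * B_k(x_j). Expanding x into the basis first
    # turns the layer into one larger matmul of size (d_in*basis_size, d_out).
    B = torch.stack([x ** k for k in range(basis_size)], dim=-1)  # (batch, d_in, basis)
    C = torch.randn(d_in * basis_size, d_out)                     # per-edge coefficients
    kan_out = B.reshape(batch, -1) @ C                            # (batch, d_out)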
cs702 about 1 year ago
It's so *refreshing* to come across new AI research different from the usual "we modified a transformer in this and that way and got slightly better results on this and that benchmark." All those new papers proposing incremental improvements are important, but... everyone is getting a bit tired of them. Also, anecdotal evidence and recent work suggest we're starting to run into fundamental limits inherent to transformers, so we may well need new alternatives.[a]

The best thing about this new work is that it's not an either/or proposition. The proposed "learnable spline interpolations as activation functions" can be used *in conventional DNNs*, to improve their expressivity. Now we just have to test the stuff to see if it really works better.

Very nice. Thank you for sharing this work here!

---

[a] https://news.ycombinator.com/item?id=40179232
mxwsn about 1 year ago
From the preprint: 100 input dimensions is considered "high", and most problems considered have 5 or fewer input dimensions. This is typical of the physics-inspired settings I've seen considered in ML. The next step would be demonstrating them on MNIST, which, at 784 dimensions, is tiny by modern standards.
ubj about 1 year ago
Very interesting! Kolmogorov neural networks can represent discontinuous functions [1], but I've wondered how practically applicable they are. This repo seems to show that they have some use after all.

[1]: https://arxiv.org/abs/2311.00049
reynoldss about 1 year ago
Perhaps a hasty comment, but linear combinations of B-splines are yet another (higher-degree) B-spline. Isn't this simply fitting high-degree B-splines to functions?
Lichtso about 1 year ago
1. Interestingly, the foundations of this approach and of the MLP were invented / discovered around the same time, about 66 years ago:

1957: https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Arnold_representation_theorem

1958: https://en.wikipedia.org/wiki/Multilayer_perceptron

2. Another advantage of this approach is that it has only one class of parameters (the coefficients of the local activation functions), as opposed to the MLP, which has three classes of parameters (weights, biases, and the globally uniform activation function).

3. Everybody is talking transformers. I want to see diffusion models with this approach.
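For reference, the 1957 theorem linked in point 1 states that any continuous function of n variables on the unit cube can be written as a finite sum of compositions of univariate continuous functions:

    f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)

KANs keep this "sum of univariate functions on edges" structure but make the inner and outer functions learnable and stack more than two layers of them.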
cbsmith about 1 year ago
Feels like someone stuffed splines into decision trees.
adityang5 about 1 year ago
Very cool stuff! Exciting to see so many people sharing their work on KANs. Seeing as the authors claim that KANs are able to reduce the issues of catastrophic forgetting that we see in MLPs, I thought "Wouldn't it be nice if there was an LLM that substituted MLPs with KANs?" I looked around and didn't find one, so I built one!

- PyTorch module of the KAN GPT
- Deployed to PyPI
- MIT licence
- Test cases to ensure forward-backward passes work as expected
- Training script

I am currently working on training it on the WebText dataset to compare it to the original GPT-2. Facing a few out-of-memory issues at the moment. Perhaps the vocab size (50257) is too large?

I'm open to contributions and would love to hear your thoughts!

https://github.com/AdityaNG/kan-gpt

https://pypi.org/project/kan-gpt/
cloudhan about 1 year ago
This reminds me of Weight Agnostic Neural Networks: https://weightagnostic.github.io/
apolar about 1 year ago
Article, 2021: https://www.sciencedirect.com/science/article/abs/pii/S0952197620303742

Seminar, 2021: https://warwick.ac.uk/fac/sci/maths/research/events/seminars/areas/applmath/2020-21/#WEEK1

arXiv article, 2023: https://arxiv.org/abs/2305.08194

Video, 2021: https://www.youtube.com/watch?v=eS_k6L638k0

Extension to stochastic models, where the KAN builds the distribution, 2023: https://www.youtube.com/watch?v=0hhJIpzxPR0
yobbo about 1 year ago
https://kindxiaoming.github.io/pykan/intro.html

At the end of this example, they recover the symbolic formula that generated their training set: exp(x₂² + sin(3.14x₁)).

It's like a computation graph with a library of "activation functions" that is optimised and then pruned. You can recover good symbolic formulas from the pruned graph.

Maybe not meaningful for MNIST.
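Roughly, the workflow behind that intro example is train with sparsity regularisation, prune, refit, then snap each surviving 1D function to a symbolic primitive. A sketch of the loop; the pykan calls (train, prune, auto_symbolic, symbolic_formula) and their arguments are assumptions based on the project README at the time:

    # Sketch of the pykan "train -> prune -> symbolify" loop; exact signatures
    # are assumptions and may differ from the current release.
    import torch
    from kan import KAN, create_dataset

    f = lambda x: torch.exp(torch.sin(torch.pi * x[:, [0]]) + x[:, [1]] ** 2)
    dataset = create_dataset(f, n_var=2)

    model = KAN(width=[2, 5, 1], grid=5, k=3)
    model.train(dataset, opt="LBFGS", steps=20, lamb=0.01)   # sparsity-regularised fit
    model = model.prune()                                     # drop inactive edges/nodes
    model.train(dataset, opt="LBFGS", steps=50)               # refit the pruned graph

    # Snap each surviving 1D function to the closest symbolic primitive,
    # then read the recovered formula back out.
    model.auto_symbolic(lib=['sin', 'x^2', 'exp'])
    print(model.symbolic_formula()[0][0])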
diwank about 1 year ago
It’d be really cool to see a transformer with the MLP layers swapped for KANs and then compare its scaling properties with vanilla transformers
SpaceManNabs about 1 year ago
How does back propagation work now? Do these suffer from vanishing or exploding gradients?
phpkar about 1 year ago
https://arxiv.org/abs/2404.05903
ALittleLight about 1 year ago
I can't assess this, but I do worry that overnight some algorithmic advance will enhance LLMs by orders of magnitude, and the next big model to get trained is suddenly 10,000x better than GPT-4 and nobody's ready for it.
erwincoumans about 1 year ago
If you like this, you may also like the 2019 research paper "Deep networks and the Kolmogorov–Arnold theorem": https://hadrien-montanelli.github.io/2019-06-25.html and https://arxiv.org/pdf/1906.11945
mipt98 about 1 year ago
A more elaborate implementation of this was published years ago, and it wasn't the very first one: https://www.science.org/doi/10.1126/science.1165893
kevmo314 about 1 year ago
This seems very similar in concept to the finite element method. Nice to see patterns across fields like that.
syassami about 1 year ago
A nice implementation I've been playing with: https://github.com/Blealtan/efficient-kan, alongside @GistNoesis's.
Maro about 1 year ago
Interesting!

Would this approach (with non-linear learning) still be able to utilize GPUs to speed up training?
coderenegade about 1 year ago
I was under the impression that graph neural nets already trained learnable functions on graph edges rather than nodes, albeit typically on a fully connected graph. Is there any comparison to just a basic GNN here?
renonce about 1 year ago
So, a new type of neural network that has been proven to work well on regression tasks common in physics? And tested in practice to fit well on elementary algebra and compositions of complex functions. But there is no evidence yet on the most basic machine learning tasks like MNIST, not to mention language models.

I mean, it's great, but in its current state it seems better suited for tasks where an explicit formula exists (though not known) and the goal is to predict it at unknown points (and to be able to interpret the formula as a side effect). Deep learning tasks are more statistical in nature (think models with a cross-entropy loss - statistically predicting the frequency of different choices of the class/next token); that requires a specialized training procedure, and this approach is designed to fit 100% rather than somewhat close (think linear algebra - it won't be good at that). It would very likely take a radically different idea to apply it to deep learning tasks. The recently updated "Author's note" also mentions this: "KANs are designed for applications where one cares about high accuracy and/or interpretability."

It's great, but let's be patient before we see this improve LLM accuracy or be used elsewhere.
nico about 1 year ago
Looks super interesting.

I wonder how many more new architectures are going to be found in the next few years.
ComplexSystems about 1 year ago
Very interesting! Could existing MLP-style neural networks be put into this form?
nu91 about 1 year ago
I am curious to know if this type of network can help with causal inference.
brrrrrm about 1 year ago
Doesn't the KA representation require continuous univariate functions? Do B-splines actually cover the space of all continuous functions? Wouldn't... MLPs be better for the learnable activation functions?
arianvanp about 1 year ago
This really reminds me of Petri nets, but an analog version? Instead of places and discrete tokens we have activation functions and signals. You can only trigger a transition if an activation function (place) has the right signal (tokens).
keynesyoudigit about 1 year ago
ELI5: why aren't these more popular and broadly used?
yza about 1 year ago
Bayesian KANs, KAN transformers, and KAN VAEs in 3, 2...
WithinReason about 1 year ago
Looks very interesting, but my guess would be that this would run into the problem of exploding/vanishing gradients at larger depths, just like tanh or sigmoid networks do.