
LLMs Powered by Kolmogorov-Arnold Networks

6 points by adityang5 about 1 year ago
Seeing as the authors claim that KANs are able to reduce the issues of catastrophic forgetting that we see in MLPs, I thought "Wouldn't it be nice if there was an LLM that substituted MLPs with KANs?". I looked around and didn't find one, so I built one!

- PyTorch Module of the kan_gpt
- Deployed to PyPI
- MIT License
- Test cases to ensure forward-backward passes work as expected
- Training script

I am currently working on training it on the WebText dataset to compare it to the original GPT-2. Facing a few out-of-memory issues at the moment; perhaps the vocab size (50257) is too large?

I'm open to contributions and would love to hear your thoughts!
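The core swap the post describes, replacing each scalar weight of an MLP layer with a learned univariate function on every input-output edge, can be sketched in a few lines of NumPy. This is a simplified illustration, not code from the kan_gpt package: the Gaussian RBF basis stands in for the B-spline basis the KAN paper uses, and the names `KANLayer` and `rbf_basis` are made up for this example.

```python
import numpy as np

def rbf_basis(x, centers, width=0.5):
    # Gaussian bumps as a simple stand-in for a B-spline basis.
    # x: (batch, d_in) -> (batch, d_in, n_basis)
    return np.exp(-((x[..., None] - centers) / width) ** 2)

class KANLayer:
    """Each edge (i, j) carries a learnable univariate function,
    parameterised by coefficients over a shared basis, instead of
    a single scalar weight as in an MLP layer."""

    def __init__(self, d_in, d_out, n_basis=8, seed=0):
        rng = np.random.default_rng(seed)
        self.centers = np.linspace(-2.0, 2.0, n_basis)
        # coeffs[i, j, k]: weight of basis function k on edge i -> j
        self.coeffs = rng.normal(0.0, 0.1, size=(d_in, d_out, n_basis))

    def forward(self, x):
        b = rbf_basis(x, self.centers)  # (batch, d_in, n_basis)
        # output j is the sum over inputs i of a learned function f_ij(x_i)
        return np.einsum("bik,ijk->bj", b, self.coeffs)

layer = KANLayer(d_in=4, d_out=3)
y = layer.forward(np.random.default_rng(1).normal(size=(2, 4)))
print(y.shape)  # (2, 3)
```

Note the parameter count: a d_in x d_out MLP weight matrix becomes d_in x d_out x n_basis coefficients here, which is one plausible reason a GPT-scale model built this way runs into memory pressure.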

1 comment

p1esk about 1 year ago
Why don’t you test it first with a small model on MNIST or CIFAR?