Show HN: I replicated Anthropic's monosemanticity research using just my MacBook

2 points by neitherboosh about 1 year ago
Hi everyone,

I've been working on an open-source implementation of Anthropic's research on monosemanticity ("Towards Monosemanticity"). The problem Anthropic is trying to solve is that language models are hard to interpret because individual neurons can be responsible for multiple different things. The research finds that training a small autoencoder on neuron activations can result in "features" which are much easier to interpret.

When I was reading the original research, I got really excited when I realized that the models they used were really small, and I could probably train them from scratch with just my M3 MBP. My models are somewhat undertrained compared to what Anthropic produced, but I think my results are still very compelling. Let me know what you think!
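For anyone unfamiliar with the idea, here is a rough sketch of the kind of sparse autoencoder described above: a ReLU encoder that maps neuron activations to a larger set of features, a linear decoder that reconstructs the activations, and a loss that adds an L1 penalty on the features so only a few are active at once. This is not the code from this project or from Anthropic. The class name, layer sizes, initialization, and `l1_coeff` value are all illustrative assumptions, and it only computes the loss (no training loop).

```python
import random


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class SparseAutoencoder:
    """Toy sparse autoencoder; sizes and init scale are assumptions."""

    def __init__(self, n_inputs, n_features, seed=0):
        rng = random.Random(seed)
        # Encoder maps n_inputs neuron activations to n_features features.
        self.W_enc = [[rng.gauss(0, 0.1) for _ in range(n_inputs)]
                      for _ in range(n_features)]
        self.b_enc = [0.0] * n_features
        # Decoder maps features back to the activation space.
        self.W_dec = [[rng.gauss(0, 0.1) for _ in range(n_features)]
                      for _ in range(n_inputs)]
        self.b_dec = [0.0] * n_inputs

    def encode(self, x):
        # Subtract the decoder bias, apply the encoder, then ReLU.
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        pre = [p + b for p, b in zip(matvec(self.W_enc, centered), self.b_enc)]
        return [max(0.0, p) for p in pre]

    def decode(self, f):
        return [r + b for r, b in zip(matvec(self.W_dec, f), self.b_dec)]

    def loss(self, x, l1_coeff=1e-3):
        # Reconstruction error plus an L1 penalty that encourages sparsity.
        f = self.encode(x)
        x_hat = self.decode(f)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        sparsity = sum(abs(v) for v in f)
        return mse + l1_coeff * sparsity


# Hypothetical example: 4 neuron activations expanded into 16 features.
sae = SparseAutoencoder(n_inputs=4, n_features=16)
activation = [0.5, -1.2, 0.0, 2.0]
features = sae.encode(activation)
print(len(features), sae.loss(activation))
```

In practice the weights would be trained by gradient descent over a large dataset of activations; using more features than neurons is what gives each feature room to specialize.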

No comments yet.
