TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Turns out the AI CUDA Engineer achieved 100x speedup by hacking the eval script

33 点作者 pr337h4m3 个月前

3 条评论

porridgeraisin3 个月前
Ah, reinforcement learning.<p>Edit: explaining it in text<p>they have run the equivalent of<p><pre><code> expected_output = torch.tril(torch.matmul(A,B)) ai_output = ai() </code></pre> In the nested torch expression above for the expected output, there is an intermediate value (the matmul). The backing memory for this is returned to torch after the entire expression is computed.<p>The model&#x27;s code which runs directly afer this then requested memory of the same shape (torch.like()). Torch has dutifully returned the block it just reclaimed (containing the expected output) without zeroing it. And so the model has the answer.<p>Pretty crazy that this was the code it converged on regardless of the fact that it invalidates the original claim.
jjk1663 个月前
Gaming a kpi to make it look like it accomplished a ton of work without doing anything? That&#x27;s not an AI CUDA engineer, that&#x27;s an AI middle manager!
justinclift3 个月前
Nitter mirror of this instead:<p><a href="https:&#x2F;&#x2F;nitter.lucabased.xyz&#x2F;miru_why&#x2F;status&#x2F;1892500715857473777?mx=2" rel="nofollow">https:&#x2F;&#x2F;nitter.lucabased.xyz&#x2F;miru_why&#x2F;status&#x2F;189250071585747...</a>