ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models (2023)

2 points by martinloretz 3 months ago

1 comment

martinloretz 3 months ago
I think this paper is the key to the next speedup in local LLM inference. By making the model sparse (using the ReLU activation), we can save around 80% of the memory accesses and computations of the feed-forward layers. ReLU sets an activation to 0 whenever its input is negative, and since any number multiplied by zero is zero, the next layer doesn't need to load the rows of its weight matrix that would only be multiplied by those zeros.

Unfortunately, there aren't many models currently trained with ReLU activation.
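To make the row-skipping idea concrete, here is a minimal NumPy sketch for a single token vector. It is not the paper's implementation: the function names, the tiny layer sizes, and the random weights are all made up for illustration, and random weights give roughly 50% zeros rather than the ~80% mentioned above. It only shows that skipping the rows of the second matrix that line up with zeroed activations produces the same output.

```python
import numpy as np

def relu_ffn_dense(x, w_up, w_down):
    """Reference feed-forward block: every row of w_down is read."""
    h = np.maximum(x @ w_up, 0.0)      # ReLU zeroes the negative pre-activations
    return h @ w_down

def relu_ffn_sparse(x, w_up, w_down):
    """Same result, but only rows of w_down matching nonzero activations are read."""
    h = np.maximum(x @ w_up, 0.0)
    active = np.flatnonzero(h)         # indices of hidden units that survived ReLU
    out = h[active] @ w_down[active]   # rows for zeroed units are never touched
    return out, active.size / h.size   # also report the fraction of rows used

# Hypothetical sizes and random weights, chosen only for the demo.
rng = np.random.default_rng(0)
d_model, d_hidden = 8, 32
x = rng.standard_normal(d_model)
w_up = rng.standard_normal((d_model, d_hidden))
w_down = rng.standard_normal((d_hidden, d_model))

dense_out = relu_ffn_dense(x, w_up, w_down)
sparse_out, fraction_used = relu_ffn_sparse(x, w_up, w_down)
print(np.allclose(dense_out, sparse_out), fraction_used)
```

In a real inference engine the saving would come from not fetching those rows from memory at all, which plain NumPy indexing does not model. The sketch only checks that the arithmetic is equivalent.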