Sparse Llama: 70% Smaller, 3x Faster, Full Accuracy
40 points
by panabee
12 months ago
1 comment
free_bip
12 months ago
Specifically, this is Llama2, not Llama3, which was a bit disappointing. It also wasn't totally clear from the article: will this actually increase GPU inference speed / decrease GPU memory usage?
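The question in this comment turns on a detail worth spelling out: pruned weights only save GPU memory or inference time if the storage format and kernels actually skip the zeros; a dense tensor that happens to be 70% zeros costs exactly as much as an unpruned one. Below is a minimal PyTorch sketch of that distinction. The 4096x4096 size and the CSR conversion are illustrative assumptions, not anything described in the article.

```python
import torch

torch.manual_seed(0)

# A hypothetical 4096x4096 weight matrix pruned to ~70% zeros.
weight = torch.randn(4096, 4096)
weight = weight * (torch.rand_like(weight) > 0.7)  # keep ~30% of entries

# Stored densely, the zeros still occupy full memory: pruning alone
# changes nothing about the tensor's footprint, and a dense matmul
# kernel still multiplies every element.
dense_bytes = weight.nelement() * weight.element_size()

# A compressed format (CSR here) stores only the nonzeros plus indices;
# realizing speedups additionally requires kernels built for that layout.
csr = weight.to_sparse_csr()
csr_bytes = sum(
    t.nelement() * t.element_size()
    for t in (csr.values(), csr.col_indices(), csr.crow_indices())
)

print(f"nonzero fraction: {weight.count_nonzero().item() / weight.nelement():.2f}")
print(f"dense storage: {dense_bytes / 2**20:.1f} MiB")
print(f"CSR storage:   {csr_bytes / 2**20:.1f} MiB")
```

With PyTorch's default 64-bit indices, the generic CSR savings at this sparsity level are modest, which is part of why sparse-inference stacks rely on purpose-built compressed formats and kernels rather than off-the-shelf sparse tensors.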