Show HN: FP32 matmul of large matrices up to 24% faster than cuBLAS on a 4090
4 points
by
ap4
10 months ago
I decided to share a CUDA kernel I wrote over 5 months ago. Nvidia's hardware and software may surprise you.
2 comments
thebuilderjr
10 months ago
Wow, this is a surprising result. Does this reproduce on other GPUs, or just the 4090?
zorgmonkey
10 months ago
The GitHub repo doesn't seem to be accessible anymore; it is giving a 404 error.