<a href="http://cjcm.ijournals.net.cn/jslxxb/ch/reader/view_abstract.aspx?flag=2&file_no=202410220000002&journal_id=jslxxb" rel="nofollow">http://cjcm.ijournals.net.cn/jslxxb/ch/reader/view_abstract....</a><p>This appears to be the same paper, though under a different title:
<a href="https://www.sciencedirect.com/science/article/pii/S0955799725000219" rel="nofollow">https://www.sciencedirect.com/science/article/pii/S095579972...</a>
<a href="https://archive.ph/Dy9An" rel="nofollow">https://archive.ph/Dy9An</a><p>I wonder if this is also a CUDA-bypassing PTX optimization, like the one behind DeepSeek's 10x performance gain: <a href="https://xyzlabs.substack.com/p/deepseeks-latest-shocker-who-needs" rel="nofollow">https://xyzlabs.substack.com/p/deepseeks-latest-shocker-who-...</a>