
Kompute – Vulkan Alternative to CUDA

204 points, posted by coffeeaddict1, 10 months ago

10 comments

Conscat, 10 months ago
Vulkan has some advantages over OpenCL. You gain lower-level control over memory allocation and resource synchronization. ROCm has an infamous synchronization pessimization which doesn't exist for Vulkan. You can even explicitly allocate Vulkan resources at specific memory addresses, which means Vulkan can easily be used for embedded devices.

But some of the caveats for compute applications are currently:

- No bfloat16 in shaders
- No shader work graphs (GPU-driven shader control flow)
- No inline PTX (inline GCN/RDNA/GEN is available)

These may or may not be important to you. Vulkan recently gained an ability to seamlessly dispatch CUDA kernels if you need these in some places, but there aren't currently similar Vulkan extensions for HIP.
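To make the "lower-level control over memory allocation" point concrete, here is a minimal C++ sketch of how a Vulkan application explicitly chooses which memory type an allocation comes from, something OpenCL does not expose. It uses only core Vulkan calls; the surrounding instance/device setup is assumed to exist already.

```cpp
#include <vulkan/vulkan.h>
#include <stdexcept>

// Pick an explicit memory type index for an allocation. `typeFilter` comes from
// VkMemoryRequirements::memoryTypeBits; `properties` is what the caller wants,
// e.g. VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT for fast GPU-only memory.
uint32_t findMemoryType(VkPhysicalDevice physicalDevice,
                        uint32_t typeFilter,
                        VkMemoryPropertyFlags properties) {
    VkPhysicalDeviceMemoryProperties memProperties{};
    vkGetPhysicalDeviceMemoryProperties(physicalDevice, &memProperties);

    for (uint32_t i = 0; i < memProperties.memoryTypeCount; ++i) {
        const bool allowed = (typeFilter & (1u << i)) != 0;
        const bool matches =
            (memProperties.memoryTypes[i].propertyFlags & properties) == properties;
        if (allowed && matches) {
            return i;  // the application, not the driver, decides which heap/type is used
        }
    }
    throw std::runtime_error("no suitable Vulkan memory type found");
}
```

The returned index is then placed in `VkMemoryAllocateInfo::memoryTypeIndex` before calling `vkAllocateMemory`, which is the kind of explicit placement control the parent comment is referring to.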
einpoklum, 10 months ago
This is _not_ an alternative to CUDA, nor to OpenCL. It has some high-level and opinionated API [1], which covers a (rather small) part of the API of each of those.

It may, _in principle_, have been developed - with much more work than has gone into it - into such an alternative; but I am actually not sure of that, since I have poor command of Vulkan. I got suspicious because I maintain C++ API wrappers for CUDA myself [2], and I know that just doing that is a lot more code and a lot more work.

[1] - I assume it is opinionated to cater to CNN simulation for large language models, and basically not much more.

[2] - https://github.com/eyalroz/cuda-api-wrappers/
Remnant44, 10 months ago
This looks great - I've been looking for a sustainable, cross-platform-and-vendor GPU compute solution, and the alternatives are not really great. CUDA is NVIDIA-only, Metal is Apple-only, etc. OpenCL has been the closest match, but it seems like it's on the way out.

Does anyone have real-world experience using Vulkan compute shaders versus, say, OpenCL? Does Kompute make things as straightforward as it seems?
pjmlp, 10 months ago
Alternatives can only become true alternatives if they support the same set of C, C++, Fortran, and PTX compiler backends, with a similar level of IDE integration, graphical GPGPU debugging, and frameworks.

Until then they are wannabe alternatives, for a subset of use cases, with lesser tooling.

It always feels like those proposing CUDA alternatives don't understand what they are trying to replace, and that is already the first error.
kcb, 10 months ago
A key component of CUDA is that the kernels are written in C/C++ and not some shader language you would only be familiar with if you were into graphics.
JackYoustra, 10 months ago
Anyone have a comparison to something like WGSL's compute shader mode over stuff like wgpu? I've never seriously written in either.
cowmix, 10 months ago
PyTorch already has Vulkan support -- and Kompute does not support PyTorch yet. That is going to slow adoption of this project.
axsaucedo, 10 months ago
Kompute author here - thank you very much for sharing our work!

If you are interested in learning more, do join the community through our Discord here: https://discord.gg/MaH5Jv5zwv

For some background, this project started after seeing various renowned machine learning frameworks like PyTorch and TensorFlow integrating Vulkan as a backend. The Vulkan SDK offers a great low-level interface that enables highly specialized optimizations - however, it comes at the cost of highly verbose code, requiring 800-2000 lines of boilerplate before you can even begin writing application code. This has resulted in each of these projects having to implement the same baseline to abstract the non-compute-related features of the Vulkan SDK.

This large amount of non-standardised boilerplate can result in limited knowledge transfer, a higher chance of unique framework implementation bugs being introduced, etc. We are aiming to address this with Kompute. As of today, we are part of the Linux Foundation, and slowly contributing to the cross-vendor GPGPU revolution.

Some of the key features / highlights of Kompute:

- C++ SDK with flexible Python package
- BYOV: bring-your-own-Vulkan design to play nice with existing Vulkan applications
- Asynchronous & parallel processing support through GPU family queues
- Explicit relationships for GPU and host memory ownership and memory management: https://kompute.cc/overview/memory-management.html
- Robust codebase with 90% unit test code coverage: https://kompute.cc/codecov/
- Mobile enabled via the Android NDK across several architectures

Relevant blog posts:

Machine learning: https://towardsdatascience.com/machine-learning-and-data-processing-in-the-gpu-with-vulkan-kompute-c9350e5e5d3a

Mobile development: https://towardsdatascience.com/gpu-accelerated-machine-learning-in-your-mobile-applications-using-the-android-ndk-vulkan-kompute-1e9da37b7617

Game development (we need to update to Godot 4): https://towardsdatascience.com/supercharging-game-development-with-gpu-accelerated-ml-using-vulkan-kompute-the-godot-game-engine-4e75a84ea9f0
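To give a flavour of the abstraction being described, here is a rough C++ sketch of a Kompute dispatch, modelled on the project's README example. Exact class and operation names vary between Kompute versions, and `compileSource` here is a hypothetical helper standing in for whatever GLSL-to-SPIR-V compilation step you use (e.g. glslang or shaderc), so treat this as illustrative rather than canonical.

```cpp
#include <kompute/Kompute.hpp>
#include <memory>
#include <string>
#include <vector>

// Hypothetical helper: compiles a GLSL compute shader to SPIR-V words.
// Not part of Kompute itself; supply your own implementation.
std::vector<uint32_t> compileSource(const std::string& glslSource);

void runMultiply() {
    kp::Manager mgr;  // hides the Vulkan instance/device/queue setup

    // Tensors holding the inputs and the output.
    auto inA = mgr.tensor({ 2.0f, 4.0f, 6.0f });
    auto inB = mgr.tensor({ 0.0f, 1.0f, 2.0f });
    auto out = mgr.tensor({ 0.0f, 0.0f, 0.0f });
    std::vector<std::shared_ptr<kp::Tensor>> params = { inA, inB, out };

    // A tiny element-wise multiply shader (GLSL source elided here).
    auto algo = mgr.algorithm(params, compileSource("/* out[i] = inA[i] * inB[i] */"));

    mgr.sequence()
        ->record<kp::OpTensorSyncDevice>(params)  // host -> GPU
        ->record<kp::OpAlgoDispatch>(algo)        // run the compute shader
        ->record<kp::OpTensorSyncLocal>(params)   // GPU -> host
        ->eval();

    // With a real shader, out->vector() would now hold { 0, 4, 12 }.
}
```

The point of the example is the size: the manager, tensor, and sequence objects replace the hundreds of lines of raw Vulkan setup the comment above mentions.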
EVa5I7bHFq9mnYK, 10 months ago
Can't we make a chip that only does one thing: multiply and add a lot of 32x32 matrices in parallel? I think that would be enough for all AI needs and easy to program.
ein0p, 10 months ago
All you really need from these in Transformer-dominated 2024 is GEMM and GEMV, plus fused RMS norm and some element-wise primitives to apply RoPE and residuals. And all of that must be brain-dead easy to install and access, and it should be cross-platform. And yet no such thing exists, as far as I can tell.
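For readers unfamiliar with the term, RMS norm here means root-mean-square layer normalization. A plain CPU reference of the arithmetic (deliberately unfused, unlike the single-pass GPU kernel the comment asks for) might look like this sketch:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Reference RMS norm: y[i] = x[i] / sqrt(mean(x^2) + eps) * gain[i].
// A "fused" kernel would compute this in one GPU pass together with the
// surrounding element-wise work; this only pins down the math involved.
std::vector<float> rmsNorm(const std::vector<float>& x,
                           const std::vector<float>& gain,
                           float eps = 1e-6f) {
    float meanSq = 0.0f;
    for (float v : x) meanSq += v * v;
    meanSq /= static_cast<float>(x.size());

    const float scale = 1.0f / std::sqrt(meanSq + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * scale * gain[i];
    }
    return y;
}
```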