
Ask HN: Resources for GPU Compilers?

74 points by zvikinoza 9 months ago

Hi folks, I've done some CPU compilers for x86, RISC-V, and LLVM (mostly for fun and some for profit). I'm eager to learn more about GPU/TPU compilers and have been looking into Triton and XLA. What resources would be useful for learning HPC and/or GPU compilers in depth (or any adjacent areas)?

Any such books, courses, etc. will be much appreciated.

12 comments

zoenolan 9 months ago

Newer editions of Computer Organization and Design: The Hardware/Software Interface cover GPUs [1].

Multiflow still has some relevant ideas [2].

Programming on Parallel Machines: GPU, Multicore, Clusters and More gives you a look at some of the issues [3].

SPIRV-VM is a virtual machine for executing SPIR-V shaders [4].

NyuziRaster: Optimizing Rasterizer Performance and Energy in the Nyuzi Open Source GPU [5].

Ocelot is a modular dynamic compilation framework for heterogeneous systems, providing various backend targets for CUDA programs and analysis modules for the PTX virtual instruction set [6].

glslang is the Khronos reference front end for GLSL/ESSL, a partial front end for HLSL, and a SPIR-V generator [7].

[1]: https://www.goodreads.com/book/show/83895.Computer_Organization_Design
[2]: https://en.wikipedia.org/wiki/Multiflow
[3]: http://heather.cs.ucdavis.edu/parprocbook
[4]: https://github.com/dfranx/SPIRV-VM
[5]: https://www.cs.binghamton.edu/~millerti/nyuziraster.pdf
[6]: https://code.google.com/archive/p/gpuocelot/
[7]: https://github.com/KhronosGroup/glslang
hansvm 9 months ago

If you're already familiar with compilers in the abstract, start by implementing some high-level solutions leveraging a GPU and some low-level performance-optimized kernels, and build up a bit of intuition. The high-level code depends on your goals, but for low-level code maybe try optimizing something equivalent to a binary tree (with leaves both smaller and larger than 4 KB), something benefiting from operator fusion (e.g., a matmul followed by an element-wise exponential), and something benefiting from deeply understanding the memory hierarchy (e.g., multiplying two very large square matrices, and also inner/outer products of two very narrow matrices).

From there, hopefully you'll have the intuition to actually evaluate whether a given resource being recommended here is any good.
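The memory-hierarchy exercise above can be sketched on the CPU first. This is a toy illustration (NumPy standing in for a GPU, with a hypothetical tile size standing in for shared memory), not a real kernel: the point is that processing TILE x TILE sub-blocks keeps each block of the operands resident in fast memory while it is reused, which is the same reasoning a GPU compiler or kernel author applies when staging tiles through shared memory.

```python
import numpy as np

TILE = 64  # hypothetical tile edge; pick it so three tiles fit in fast memory


def tiled_matmul(a, b):
    """Blocked matrix multiply over TILE x TILE sub-blocks.

    Each tile of a, b, and c is reused many times while it is "hot";
    on a GPU this inner block is what you would stage through shared memory.
    Assumes dimensions are multiples of TILE to keep the sketch short.
    """
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((n, m), dtype=a.dtype)
    for i in range(0, n, TILE):
        for j in range(0, m, TILE):
            for p in range(0, k, TILE):
                # one tile-level multiply-accumulate
                c[i:i + TILE, j:j + TILE] += (
                    a[i:i + TILE, p:p + TILE] @ b[p:p + TILE, j:j + TILE]
                )
    return c


rng = np.random.default_rng(0)
a = rng.random((256, 256))
b = rng.random((256, 256))
assert np.allclose(tiled_matmul(a, b), a @ b)
```

The blocking itself does not change the arithmetic, only the order of memory traffic, which is why the result still matches the untiled product.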
koolala 9 months ago

Here is a PhD thesis on a GPU programming language that compiles on the GPU itself:

https://scholarworks.iu.edu/dspace/items/3ab772c9-92c9-4f59-bd95-40aff99e8c7a
mbel 9 months ago

For Intel you can just look into the sources: https://github.com/intel/intel-graphics-compiler ;) The AMDGPU and NVPTX targets in LLVM might also be interesting.
JonChesterfield 9 months ago

Fair warning that the dominant model in GPU compilers is to use an ISA that looks like GPUs did long ago. Expect to see i32 used to represent a machine vector of ints where you might reasonably expect <32 x i32>. There are actual i32 scalar registers as well, which are cheap to branch on, and these are also represented in IR as i32, i.e., the same as the vector. There are intrinsics that sometimes distinguish the two.

This makes a rather spectacular mess of the tooling. Instead of localising the CUDA semantics in Clang, we scatter them throughout the entire compiler pipeline, where they do especially nasty things to register allocation and generally obstruct non-CUDA programming models. It's remarkably difficult to persuade GPU people that this is a bad thing.

Also, the GPU programming languages use very large compiler runtimes to paper over, to a degree, the CPU-host/GPU-target assumption that also dates from long ago, so expect to find a lot of complexity in multiple libraries acting somewhat like compiler-rt. Those are optional in reality, but the compiler usually emits a lot of symbols that resolve to various vendor libraries.
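The scalar-vs-vector distinction above can be sketched in plain Python (a toy model of SIMT execution, not any vendor's IR): a "scalar" value in a GPU kernel is really one value per lane of a warp, so a branch on a per-lane condition becomes masked select, while a branch on a genuinely uniform scalar is an ordinary cheap jump.

```python
WARP = 32  # lanes per warp; 32 on current NVIDIA hardware


def simt_select(cond_per_lane, then_vals, else_vals):
    """Divergent 'branch': both sides are computed and merged under a
    per-lane mask. This is the cost an i32 that secretly means
    <32 x i32> imposes when you branch on it."""
    return [t if c else e
            for c, t, e in zip(cond_per_lane, then_vals, else_vals)]


def uniform_branch(scalar_cond, then_vals, else_vals):
    """Branch on a true scalar register: one cheap jump, no masking,
    and only one side is ever evaluated."""
    return then_vals if scalar_cond else else_vals


lanes = list(range(WARP))
mask = [x % 2 == 0 for x in lanes]  # per-lane condition -> divergence
diverged = simt_select(mask, [x * 10 for x in lanes], lanes)
uniform = uniform_branch(True, [x * 10 for x in lanes], lanes)
```

An IR that gave the vector case its honest `<32 x i32>` type would let the compiler tell these two situations apart without intrinsics, which is roughly the complaint being made here.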
sanxiyn 9 months ago

The Deep Learning Compiler: A Comprehensive Survey (2020) is a comprehensive-as-of-2020, but now somewhat dated, survey of the field. Unfortunately, I am not aware of a better, more up-to-date overall survey.

https://arxiv.org/abs/2002.03794
adamnemecek 9 months ago

https://futhark-lang.org/
danielt3 9 months ago

https://theartofhpc.com/

A great book series on the subject of HPC. Not sure if it actually touches GPUs, but it is great material anyway. Bonus: it's free!
surfingdino 9 months ago

https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html
abstractcontrol 9 months ago

Staged FP in Spiral: https://www.youtube.com/playlist?list=PL04PGV4cTuIVP50-B_1scXUUMn8qEBbSs

Some of the videos in this playlist might be relevant to you, though it is mostly about programming GPUs in a functional language that compiles to CUDA. The author (me) sometimes works on the language during the videos, either fixing bugs or adding new features.
aportnoy 9 months ago

For NVIDIA:

1. Play around with the NVPTX LLVM backend and/or try compiling CUDA with Clang.

2. Get familiar with the PTX ISA.

3. Play around with ptxas + nvdisasm.
raphlinus 9 months ago

Faith Ekstrand has an impressive track record of compiler work and has written a few blog posts [1], [1a]. Her Mastodon [2] is also worth a follow.

SPIR-V is important in the compute shader space, especially because DXIL and Metal's AIR are similar. I'm going to link three articles critical of SPIR-V: [3], [4], [5].

WebGPU [6] is interesting for a number of reasons, largely because they're trying to actually nail down the semantics and also make it safe (see the uniformity analysis [7] in particular, which is a very "compiler" approach to a GPU-specific problem). Both the Tint and naga projects are open source, with lots of high-quality discussion in the issue trackers.

Shader languages suck, and we really need a good one. Promising approaches are Circle [8] (which is C++ based and very advanced, but not open source) and Slang [9] (an evolution of HLSL). The Vcc work (also related to [4]) is worth studying.

Best of luck! This is a fascinating, if frustrating, space, and there's lots of room to improve things.

[1]: https://www.gfxstrand.net/faith/blog/
[1a]: https://www.collabora.com/news-and-blog/blog/2024/04/25/re-converging-control-flow-on-nvidia-gpus/
[2]: https://mastodon.gamedev.place/@gfxstrand
[3]: https://kvark.github.io/spirv/2021/05/01/spirv-horrors.html
[4]: https://xol.io/blah/the-trouble-with-spirv/
[5]: https://themaister.net/blog/2022/08/21/my-personal-hell-of-translating-dxil-to-spir-v-finale/
[6]: https://github.com/gpuweb/gpuweb
[7]: https://www.w3.org/TR/2022/WD-WGSL-20220505/#uniformity-overview
[8]: https://www.circle-lang.org/site/index.html
[9]: https://github.com/shader-slang/slang