TechEcho

An Introduction to GPU Programming in Julia

290 points by simondanisch over 6 years ago

8 comments

maxbrunsfeld over 6 years ago
> GPUArrays never had to implement automatic differentiation explicitly to support the backward pass of the neural network efficiently. This is because Julia's automatic differentiation libraries work for arbitrary functions and emit code that can run efficiently on the GPU. This helps a lot to get Flux working on the GPU with minimal developer effort - and makes Flux GPU support work efficiently even for user-defined functions. That this works out of the box without coordination between GPUArrays + Flux is a pretty unique property of Julia.

Every time I read about Julia, I'm amazed. What a game-changing tool.
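The property being quoted — differentiating arbitrary user-defined functions without the array library knowing about them — can be shown in miniature with forward-mode automatic differentiation over dual numbers. A minimal Python sketch (not Julia's actual AD machinery; `Dual` and `derivative` are illustrative names):

```python
class Dual:
    """Number carrying its value and derivative together (forward-mode AD)."""
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.eps + o.eps)
    __radd__ = __add__

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.eps * o.val + self.val * o.eps)
    __rmul__ = __mul__

def derivative(f, x):
    """Differentiate any function built from + and * at the point x."""
    return f(Dual(x, 1.0)).eps

# A user-defined function the AD code above has never seen:
def user_fn(x):
    return 3 * x * x + 2 * x + 1

print(derivative(user_fn, 5.0))  # d/dx(3x^2 + 2x + 1) at x=5 is 32
```

Because `derivative` works on any function written in terms of overloaded arithmetic, no coordination between the function's author and the AD code is needed — the same decoupling the comment praises, where Julia additionally compiles the result to GPU code.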
daenz over 6 years ago
GPGPU (general-purpose GPU) programming is pretty cool. A while back I wrote a utility that lets you do it in JavaScript, in the browser: https://github.com/amoffat/gpgpu.js

The thing to note about GPU programming is that the vast majority of the overhead comes from data transfer. If your data set and results are very large, it is sometimes a net win to do the computation on the CPU, even though the GPU performs each calculation faster on average thanks to parallelism. To illustrate, look at the benchmarks for gpgpu.js running a simple kernel:

    CPU:                 6851.25ms
    GPU Total:           1449.29ms
    GPU Execution:         30.64ms
    GPU IO:              1418.65ms
    Theoretical Speedup: 223.59x
    Actual Speedup:      4.73x

The theoretical speedup excludes data transfer, while the actual speedup includes it. The longer you can keep your data on the GPU to do more calculations (avoiding back-and-forth IO), the bigger your net speed gains.
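The relationship between those figures is simple arithmetic; a quick sketch using the numbers quoted in the benchmark:

```python
cpu_ms = 6851.25
gpu_exec_ms = 30.64
gpu_io_ms = 1418.65
gpu_total_ms = gpu_exec_ms + gpu_io_ms  # 1449.29ms, matching "GPU Total"

# Theoretical speedup ignores the transfer cost entirely...
theoretical = cpu_ms / gpu_exec_ms       # ~223.6x
# ...while the actual speedup pays for moving data both ways.
actual = cpu_ms / gpu_total_ms           # ~4.7x

print(f"theoretical {theoretical:.2f}x, actual {actual:.2f}x")
```

The gap between the two ratios is entirely the IO term in the denominator, which is why amortizing one transfer over many kernel launches closes it.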
Athas over 6 years ago
I'm a bit surprised to see that the GPU Mandelbrot is at best only 75x faster than the (sequential?) CPU version. Does Julia just generate *really fast* (multicore/vectorized?) CPU code? Does the figure also count communication costs? Fractal computations like this are extremely GPU-friendly because they involve no memory accesses at all, except for writing the final result. I would expect at least two orders of magnitude improvement over a straightforwardly written C implementation.
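For reference, the Mandelbrot kernel is just a per-pixel escape-time loop; this illustrative Python version (not the benchmarked Julia code) shows why it touches no memory until the final write:

```python
def mandel(c, max_iter=50):
    # Pure register arithmetic: z iterates in place, and nothing is
    # read from or written to memory until the count is returned.
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter

# Each pixel is independent of every other pixel, so a GPU can run one
# of these loops per thread with no synchronization or shared data.
row = [mandel(complex(-2 + 3 * x / 80, 0)) for x in range(80)]
```

That independence and absence of memory traffic is what makes the commenter expect speedups well beyond 75x on a GPU.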
currymj over 6 years ago
While having a Torch-esque GPU ndarray is great, the ability to easily write your own kernels without having to compile gnarly C++ code is what sets Julia apart from its competitors, IMO. I'm not sure any other dynamic language offers anything like this.
pjmlp over 6 years ago
Love it! So much more fun than being stuck with C-derived languages for GPGPU programming.
eigenspace over 6 years ago
It seems kinda weird to tout how great it is that we have CuArrays and CLArrays when CLArrays hasn't been updated for 1.0 and only claims experimental support for 0.6.

Really hoping we see some movement on CLArrays in the near future.
eghad over 6 years ago
If anyone wants to try out a free GPU using Google Colab/Jupyter (a K80; you might run into RAM allocation issues if you're not one of the lucky users who gets the full amount), here's a quick guide to getting a Julia kernel up and running: https://discourse.julialang.org/t/julia-on-google-colab-free-gpu-accelerated-shareable-notebooks/15319
IshKebab over 6 years ago
It doesn't really describe the fundamental difference between a GPU and a 4000-core CPU, which is that the GPU has a *shared program counter*. All the cores must execute the same instruction on each cycle.
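That lockstep model can be sketched as a toy simulation: one shared program counter, many lanes, with branches handled by masking lanes off rather than letting each lane jump independently (roughly what SIMT hardware does within a warp; `simt_run` and its mini instruction set are made up for illustration):

```python
def simt_run(program, lanes):
    """Execute `program` (a list of (op, arg) pairs) with ONE program
    counter shared by all lanes; `lanes` holds each lane's register."""
    pc = 0
    mask = [True] * len(lanes)
    while pc < len(program):
        op, arg = program[pc]
        if op == "add":
            # Every active lane executes the same instruction this cycle;
            # masked-off lanes sit idle instead of running something else.
            lanes = [v + arg if m else v for v, m in zip(lanes, mask)]
        elif op == "if_lt":
            # A branch cannot move the PC per lane; it just disables
            # the lanes whose condition is false.
            mask = [m and v < arg for v, m in zip(lanes, mask)]
        elif op == "endif":
            mask = [True] * len(lanes)
        pc += 1  # one program counter advances for everyone
    return lanes

# Lanes below the threshold get +10; the others idle through that cycle.
print(simt_run([("if_lt", 5), ("add", 10), ("endif", None), ("add", 1)],
               [1, 7, 3, 9]))  # [12, 8, 14, 10]
```

The idling lanes are why divergent branches are costly on a GPU: both sides of the branch take cycles, and each lane only does useful work on one of them.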