AMD GPU Inference

270 points by fazkan 8 months ago

23 comments

lhl 7 months ago

For inference, if you have a supported card (or probably a supported architecture if you are on Linux and can use HSA_OVERRIDE_GFX_VERSION), then you can probably run anything with (upstream) PyTorch and transformers. Also, compiling llama.cpp has been pretty trouble-free for me for at least a year.

(If you are on Windows, there is usually a win-hip binary of llama.cpp in the project's releases, or if things totally refuse to work, you can use the Vulkan build as a (less performant) fallback.)

Having more options can't hurt, but ROCm 5.4.2 is almost 2 years old, and things have come a long way since then, so I'm curious about this being published freshly today, in October 2024.

BTW, I recently went through and updated my compatibility doc (focused on RDNA3) with ROCm 6.2 for those interested. A lot has changed just in the past few months (upstream bitsandbytes, upstream xformers, and Triton-based Flash Attention): https://llm-tracker.info/howto/AMD-GPUs
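For reference, a minimal, untested sketch of the override workflow described above, assuming a ROCm build of PyTorch is already installed; the GFX version value is an assumption, and you should pick the nearest officially supported target for your card (e.g. 11.0.0 for RDNA 3):

    # spoof the architecture if your card is not officially supported,
    # then confirm the ROCm build of PyTorch can see the GPU
    export HSA_OVERRIDE_GFX_VERSION=11.0.0   # assumed value for an RDNA 3 card
    python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"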
tcdent 7 months ago

The rise of generated slop ML libraries is staggering.

This library is 50% print statements. And where it does branch, it doesn't even need to.

Defines two environment variables and sets two flags on torch.
a2128 8 months ago

It seems to use a 2-year-old version of ROCm (5.4.2), which I'm doubtful would support my RX 7900 XTX. I personally found it easiest to just use the latest `rocm/pytorch` image and run what I need from there.
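For anyone who wants to follow that route, a minimal sketch of launching the image with GPU access; the device and group flags follow AMD's published Docker instructions, and the exact image tag is an assumption:

    # run the latest ROCm PyTorch image with access to the GPU devices
    docker run -it --rm \
      --device=/dev/kfd --device=/dev/dri \
      --group-add video --ipc=host \
      --security-opt seccomp=unconfined \
      rocm/pytorch:latest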
slavik81 7 months ago

On Ubuntu 24.04 (and Debian Unstable¹), the OS-provided packages should be able to get llama.cpp running on ROCm on just about any discrete AMD GPU from Vega onwards²³⁴. No Docker or HSA_OVERRIDE_GFX_VERSION required. The performance might not be ideal in every case⁵, but I've tested a wide variety of cards:

    # install dependencies
    sudo apt -y update
    sudo apt -y upgrade
    sudo apt -y install git wget hipcc libhipblas-dev librocblas-dev cmake build-essential

    # ensure you have permissions by adding yourself to the video and render groups
    sudo usermod -aG video,render $USER
    # log out and then log back in to apply the group changes
    # you can run `rocminfo` and look for your GPU in the output to check everything is working thus far

    # download a model, build llama.cpp, and run it
    wget https://huggingface.co/TheBloke/dolphin-2.2.1-mistral-7B-GGUF/resolve/main/dolphin-2.2.1-mistral-7b.Q5_K_M.gguf?download=true -O dolphin-2.2.1-mistral-7b.Q5_K_M.gguf
    git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
    git checkout b3267
    HIPCXX=clang-17 cmake -H. -Bbuild -DGGML_HIPBLAS=ON -DCMAKE_HIP_ARCHITECTURES="gfx803;gfx900;gfx906;gfx908;gfx90a;gfx1010;gfx1030;gfx1100;gfx1101;gfx1102" -DCMAKE_BUILD_TYPE=Release
    make -j16 -C build
    build/bin/llama-cli -ngl 32 --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -m ../dolphin-2.2.1-mistral-7b.Q5_K_M.gguf --prompt "Once upon a time"

I'd suggest RDNA 3, MI200 and MI300 users should probably use the AMD-provided ROCm packages for improved performance. Users that need PyTorch should also use the AMD-provided ROCm packages, as PyTorch has some dependencies that are not available from the system packages. Still, you can't beat the ease of installation or the compatibility with older hardware provided by the OS packages.

¹ https://lists.debian.org/debian-ai/2024/07/msg00002.html
² Not including MI300, because that released too close to the Ubuntu 24.04 launch.
³ Pre-Vega architectures might work, but have known bugs for some applications.
⁴ Vega and RDNA 2 APUs might work with Linux 6.10+ installed. I'm in the process of testing that.
⁵ The version of rocBLAS that comes with Ubuntu 24.04 is a bit old and therefore lacks some optimizations for RDNA 3. It's also missing some MI200 optimizations.
danielEM 7 months ago

It has been like 8 months since I got a Ryzen 8700G with an NPU just for the purpose of inferencing neural nets, and so far the only acceleration I'm getting is through Vulkan on the iGPU, not the NPU (I'm using Linux only). On the bright side, with 64GB of RAM I had no issues trying models over 32GB. Kudos to llama.cpp for supporting the Vulkan backend!
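For anyone wanting to reproduce that setup, a minimal sketch of a Vulkan build of llama.cpp; the package names and the GGML_VULKAN flag match recent revisions of the project, but treat them as assumptions to check against the current build docs (older trees used LLAMA_VULKAN):

    # build llama.cpp with the Vulkan backend (no ROCm needed for iGPU acceleration)
    sudo apt -y install libvulkan-dev glslc cmake build-essential
    git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
    cmake -B build -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
    make -j -C build
    # offload layers to the iGPU with -ngl as usual
    build/bin/llama-cli -ngl 32 -m ../model.gguf --prompt "Once upon a time"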
rglullis 7 months ago

So, this is all I needed to add to my NixOS workstation:

    hardware.graphics.enable = true;
    services.ollama = {
      enable = true;
      acceleration = "rocm";
      environmentVariables = {
        ROC_ENABLE_PRE_VEGA = "1";
        HSA_OVERRIDE_GFX_VERSION = "11.0.0";
      };
    };
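A quick way to check that the service actually ends up on the GPU after switching to a configuration like that; the model tag is just an example, and `rocm-smi` may need to be added to your system packages separately:

    # rebuild, pull a small model, and watch GPU utilization while it generates
    sudo nixos-rebuild switch
    ollama run llama3.2 "Say hello"
    watch -n1 rocm-smi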
tomxor 7 months ago

I almost tried to install AMD ROCm a while ago after discovering the simplicity of llamafile.

    sudo apt install rocm

    Summary:
      Upgrading: 0, Installing: 203, Removing: 0, Not Upgrading: 0
    Download size: 2,369 MB / 2,371 MB
    Space needed: 35.7 GB / 822 GB available

I don't understand how 36 GB can be justified for what amounts to a GPU driver.
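A considerably smaller footprint is possible if you only need the HIP compiler and BLAS libraries for something like a llama.cpp build, rather than the full `rocm` meta-package; the package names follow the Ubuntu 24.04 split mentioned elsewhere in this thread, and exact sizes will vary:

    # install only the pieces a HIP build of llama.cpp needs
    sudo apt -y install hipcc libhipblas-dev librocblas-dev
    # list the ROCm/HIP-related packages by installed size (KB)
    dpkg-query -W -f='${Installed-Size}\t${Package}\n' | grep -Ei 'roc|hip' | sort -rn | head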
stefan_ 7 months ago

This seems to be some AI-generated wrapper around a wrapper of a wrapper.

> # Other AMD-specific optimizations can be added here

> # For example, you might want to set specific flags or use AMD-optimized libraries

What are we doing here, then?
freeqaz 7 months ago

What's the best bang-for-your-buck AMD GPU these days? I just bought 2 used 3090s for $750ish refurb'd on eBay. Curious what others are using for running LLMs locally.
leonheld 7 months ago

People use "Docker-based" all the time, but what they mean is that they ship $SOFTWARE in a Docker image.

"Docker-based" reads, to me, as if you were doing inference on AMD cards with Docker somehow, which doesn't make sense.
kristianp 7 months ago

If you're interested in how much the AMD graphics cards cost compared to the NVidia ones, I have https://gpuquicklist.com/ which gives you a quick table view of the lowest prices available on Amazon that I can find. </selfpromotion>
phkahler 7 months ago
Does it work with an APU? I just put 64GB in my system and gonna drop in a 5700G. Will that be enough? SFF inference if so.
sandGorgon 7 months ago

Is anyone using the new HX370-based laptops for any LLM work? I mean, the ipex-llm libraries for Intel's new Lunar Lake already support Llama 3.2 (https://www.intel.com/content/www/us/en/developer/articles/technical/intel-ai-solutions-support-the-new-llama-3-2-model.html), but there doesn't seem to be much activity here for AMD's new Zen 5 chips.
lenova 7 months ago

Why ROCm 5.4, and not the latest (6.2)?

https://github.com/slashml/amd_inference/blob/main/Dockerfile#L2

Also looks like the Docker image provided by this project doesn't successfully build: https://github.com/slashml/amd_inference/issues/2
ashirviskas 7 months ago

I'm all for having more open source projects, but I do not see how it can be useful in this ecosystem, especially for people with newer AMD GPUs (not supported in this project) which are already supported in most popular projects?
talles 7 months ago
Anyone else with an Intel Arc card idle waiting for some support?
kn100 7 months ago

Sad that RDNA2 cards aren't supported. Not even that old!
white_waluigi 7 months ago

Isn't this just a wrapper for huggingface-transformers?
dhruvdh 7 months ago
Why would you use this over vLLM?
terminalcommand 7 months ago
Will this work on older cards such as RX 570? Does anyone know?
ekianjo 7 months ago
Does it work with GGUF files?
BaculumMeumEst 7 months ago

How about they follow up the 7900 XTX with a card that actually has some VRAM?
khurdula 7 months ago

Are we supposed to use AMD GPUs for this to work? Or does it work on any GPU?