The ability to get huge amounts of VRAM per dollar is what I find incredibly interesting. A lot of diffusion techniques are extremely VRAM intensive, and high-VRAM consumer cards are rare and expensive. I'll gladly take the slower speeds of an APU if it means I can load the entire model in memory instead of having to offload chunks of it.
Would be interesting to try the newer AMD Phoenix APUs, specifically the 7840H, 7840HS, 7940H or 7940HS: <a href="https://en.wikipedia.org/wiki/Template:AMD_Ryzen_Mobile_7040_series" rel="nofollow noreferrer">https://en.wikipedia.org/wiki/Template:AMD_Ryzen_Mobile_7040...</a><p>These chips don't have a socket and were designed for laptops. However, they have up to 54W TDP, which is not exactly laptop territory. Luckily, there are mini-PCs on the market with them, in a form factor similar to an Intel NUC or Mac Mini. An example is the Minisforum UM790 Pro (disclaimer: I have never used one, so far I've only read a review).<p>The integrated Radeon 780M GPU includes 12 RDNA3 compute units; peak FP32 performance is about 9 TFlops, peak FP16 about 18 TFlops. The CPU supports two channels of DDR5-5600, so a properly built computer has about 90 GB/second of memory bandwidth.
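That bandwidth figure falls straight out of the transfer rate; a quick back-of-the-envelope in Python (numbers for dual-channel DDR5-5600 with 64-bit channels):

    # Theoretical peak bandwidth of dual-channel DDR5-5600.
    transfers_per_sec = 5600e6  # 5600 MT/s
    bytes_per_transfer = 8      # one 64-bit channel
    channels = 2
    peak = transfers_per_sec * bytes_per_transfer * channels
    print(f"{peak / 1e9:.1f} GB/s")  # -> 89.6 GB/s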
The 4600G supports two channels of DDR4-3200, which gives a maximum memory bandwidth of around 50GB/s (discrete graphics cards are in the hundreds). While this chip may be decent for SD and other compute-bound AI apps, it won't be good for LLMs, as inference speed is pretty much capped by memory bandwidth.<p>Apple Silicon has extremely high memory bandwidth, which is why it performs so well with LLMs.
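To make the bandwidth cap concrete: during autoregressive decoding, essentially all model weights stream through the memory bus once per generated token, so bandwidth divided by model size gives a rough ceiling on tokens per second. A sketch (the model sizes and bandwidth figures are illustrative):

    # Rough upper bound on LLM decode speed: every token reads ~all weights
    # once, so tokens/s <= memory bandwidth / bytes of weights.
    def max_tokens_per_sec(bandwidth_gb_s, params_billions, bytes_per_param):
        model_bytes = params_billions * 1e9 * bytes_per_param
        return bandwidth_gb_s * 1e9 / model_bytes

    # 7B model, 4-bit quantized (~0.5 bytes/param)
    print(max_tokens_per_sec(50, 7, 0.5))   # dual-channel DDR4-3200: ~14 tok/s
    print(max_tokens_per_sec(400, 7, 0.5))  # Apple Max-class SoC: ~114 tok/s

Real throughput lands below these ceilings, but it's the bandwidth ratio, not compute, that sets the ordering between systems.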
I wonder how it compares to running the same model on the same CPU in RAM (making sure a fast CPU ML library that utilises AVX2 is used, for example Intel MKL)?<p>Also, when doing a test like this it's important to compare the same bit depths, so fp32 on both.
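A sketch of how one might pin down the CPU side with PyTorch, confirming MKL is in use and timing an fp32 matmul (the matrix size is an arbitrary choice):

    # Apples-to-apples CPU check: confirm MKL is available and time an fp32
    # matmul, the kernel SD inference spends most of its time in.
    import time
    import torch

    print("MKL available:", torch.backends.mkl.is_available())

    a = torch.randn(4096, 4096, dtype=torch.float32)
    b = torch.randn(4096, 4096, dtype=torch.float32)
    torch.matmul(a, b)  # warm-up

    t0 = time.perf_counter()
    for _ in range(10):
        torch.matmul(a, b)
    elapsed = (time.perf_counter() - t0) / 10
    gflops = 2 * 4096**3 / elapsed / 1e9
    print(f"{elapsed * 1e3:.0f} ms/matmul, ~{gflops:.0f} GFLOPS fp32")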
When it says "turned" I'm assuming there are some kernel boot parameters or driver configurations that are needed for it to allocate 16GB of main RAM for the GPU. Did they publish those or is this behavior out of the box?
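Whatever the mechanism (on these APUs it's usually the UMA frame buffer size setting in the BIOS, though the amdgpu driver also has a gttsize module parameter), you can verify what the driver actually got through sysfs. A minimal sketch, assuming the APU enumerates as card0:

    # Ask the amdgpu driver how much dedicated (VRAM) and shared (GTT)
    # memory the iGPU actually has. Assumes the APU is card0.
    from pathlib import Path

    dev = Path("/sys/class/drm/card0/device")
    for name in ("mem_info_vram_total", "mem_info_gtt_total"):
        node = dev / name
        if node.exists():
            print(f"{name}: {int(node.read_text()) / 2**30:.1f} GiB")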
Woot, AMD now supports APUs? I sold my notebook as I hit a wall when trying ROCm. [1] Is there a list of working APUs?<p>[1] <a href="https://github.com/RadeonOpenCompute/ROCm/issues/1587">https://github.com/RadeonOpenCompute/ROCm/issues/1587</a>
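One quick way to test whether a given APU works, once ROCm and a ROCm build of PyTorch are installed (ROCm devices are exposed through torch's cuda namespace):

    # ROCm sanity check: a ROCm build of PyTorch reports the GPU via the
    # torch.cuda API, and torch.version.hip is set instead of torch.version.cuda.
    import torch

    print("GPU visible:", torch.cuda.is_available())
    print("HIP version:", torch.version.hip)
    if torch.cuda.is_available():
        print("Device:", torch.cuda.get_device_name(0))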
How well do these workloads parallelize, especially over consumer-tier interconnects? What's stopping someone from picking up 100 of these to set up a cool little 1.6TB-VRAM cluster? The whole thing would probably cost less than an H100.
This reminds me of the Radeon Pro SSG, a 2017 GPU with four SSD slots and 16GB of HBM; maybe now with NVMe PCIe 5.0 SSDs it could be redone.<p>There are videos on YouTube of it with 2TB of flash.
The post is short so I'll paste it here. If this is against the rules please ban me.<p>-----begin copy paste-----<p>The 4600G is currently selling at a price of $95. It includes a 6-core CPU and a 7-CU GPU. The 5600G is also inexpensive - around $130, with a better CPU but the same GPU as the 4600G.<p>It can be turned into a 16GB VRAM GPU under Linux and works similarly to AMD discrete GPUs such as the 5700XT, 6700XT, .... It thus supports the AMD software stack, ROCm, and therefore PyTorch and TensorFlow. You can run most AI applications.<p>16GB of VRAM is also a big deal, as it beats most discrete GPUs. Even where those GPUs have better compute, they will hit out-of-memory errors if an application requires 12 or more GB of VRAM. Although speed is an issue, it's better than out-of-memory errors.<p>For stable diffusion, it can generate a 50-step 512x512 image in around 1 minute and 50 seconds. This is better than some high-end CPUs.<p>The 5600G was a very popular product, so if you have one, I encourage you to test it. I made some video tutorials for it. Please search for tech-practice9805 on YouTube and subscribe to the channel for future content. Or see the video links in the comments.<p>Please also follow me on X: <a href="https://twitter.com/TechPractice1" rel="nofollow noreferrer">https://twitter.com/TechPractice1</a>
Thanks for reading!
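For anyone wanting to reproduce the Stable Diffusion numbers, here is a minimal sketch using the diffusers library on a ROCm build of PyTorch (the model id and the HSA_OVERRIDE_GFX_VERSION hint are assumptions on my part; Vega-class iGPUs like the 4600G's often need that override since they aren't officially supported by ROCm):

    # Minimal sketch: Stable Diffusion on an APU via a ROCm build of PyTorch.
    # May need HSA_OVERRIDE_GFX_VERSION=9.0.0 in the environment for Vega iGPUs.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # assumed model id
        torch_dtype=torch.float16,         # halves memory; use float32 if fp16 misbehaves
    ).to("cuda")                           # ROCm GPUs show up as "cuda" in torch

    image = pipe("a red bicycle", num_inference_steps=50).images[0]
    image.save("out.png")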
From the Reddit thread: <i>"That's about 0.55 iterations per second. For $95, can't really complain."</i><p><a href="https://old.reddit.com/r/Amd/comments/15t0lsm/i_turned_a_95_amd_apu_into_a_16gb_vram_gpu_and_it/jwj79d1/" rel="nofollow noreferrer">https://old.reddit.com/r/Amd/comments/15t0lsm/i_turned_a_95_...</a>