Even as an ML-focused graphics-less GPU, this is great. If this can be prototyped on an FPGA, it would be even better. Using block RAM for shared memory and built-in PCIe and DDR IP blocks should help speed things up considerably.<p>It unfortunately wouldn't be very cost-effective for training ML models, but it would take things a step closer to actual tape-out (if some organization has the $$$ for it).
Worth noting this is targeting ML applications, so I don't think you'll be able to display even a text console with it for the foreseeable future.<p>But I love that this is even in the realm of possibility! There's no reason we couldn't, in principle, have a small open-source GPU taping out on the free Skywater shuttle, and I am here for it!
Perhaps also see the (OpenPOWER-based) Libre-SOC effort <a href="https://libre-soc.org/" rel="nofollow">https://libre-soc.org/</a>
> Internal GPU Core ISA loosely compliant with RISC-V ISA. Where RISC-V conflicts with designing for a GPU setting, we break with RISC-V.<p>Very amateur question: I thought RISC-V added the vector extension (RVV) precisely so you could use it directly for GPU/TPU-style chips without fragmenting the ecosystem?
Neat! Looks like it's very much in its early stages (no concurrent execution/threads yet), but it's great to see FOSS digital design work in an industry dominated by huge players.
I'm not really optimistic about the hardware side or the tape-out goal. The author seems to have only a basic grasp of it.<p>For instance, the integer multiplier design is overengineered yet naive, and far from the state of the art (no pipelining, no adder compression). I would suggest the author look into Wallace tree multipliers.<p>At this stage, though, it would be preferable to use the native Verilog multiply operator or a DSP macro when targeting an FPGA for prototyping, and to focus on the SIMT architecture and the pipelining. Arithmetic unit design is a science in its own right.<p>Still, it's a beautiful project!
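For anyone unfamiliar with the idea, here's a rough Python model of the Wallace tree trick (purely illustrative; real designs are RTL, and the function name and bit width here are my own):

```python
def wallace_multiply(a: int, b: int, bits: int = 8) -> int:
    """Toy model of Wallace-tree multiplication (illustrative only)."""
    # One shifted partial-product row per set bit of b.
    rows = [a << i for i in range(bits) if (b >> i) & 1]
    # Carry-save reduction: each 3:2 compressor (a full adder applied
    # bitwise) turns three rows into a sum row and a carry row with no
    # carry propagation, so every layer stays shallow and fast in hardware.
    while len(rows) > 2:
        groups = len(rows) // 3 * 3
        next_rows = []
        for i in range(0, groups, 3):
            x, y, z = rows[i], rows[i + 1], rows[i + 2]
            next_rows.append(x ^ y ^ z)                           # sum bits
            next_rows.append(((x & y) | (y & z) | (x & z)) << 1)  # carry bits
        next_rows.extend(rows[groups:])  # leftover rows pass through
        rows = next_rows
    # A single carry-propagate add finishes the multiply.
    return sum(rows)
```

The hardware win is that the reduction layers have no carry chains at all; only the final two-row add needs a fast carry-propagate adder.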
I wonder whether it would be a good idea to implement a Vulkan driver for such a GPU by emulating TMUs and ROPs in software. It might not even matter much, since modern rendering pipelines are increasingly compute-reliant anyway (UE5's Nanite barely uses the hardware rasterizer, and the latest idTech uses software rasterization as well). The only problem I see is with raytracing, since it is quite reliant on fixed-function units.
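To make the compute-rasterization point concrete, here's a minimal edge-function rasterizer sketch in Python. It's the kind of kernel a compute-only GPU could run in place of fixed-function hardware (the triangle, grid size, and function names are made up for illustration):

```python
def edge(ax, ay, bx, by, px, py):
    # Signed-area test: >= 0 means (px, py) is on the inside of edge a->b
    # for a counterclockwise-wound triangle.
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(tri, width=8, height=8):
    """Return the pixels covered by a counterclockwise triangle."""
    (ax, ay), (bx, by), (cx, cy) = tri
    covered = []
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at pixel centers
            if (edge(ax, ay, bx, by, px, py) >= 0 and
                edge(bx, by, cx, cy, px, py) >= 0 and
                edge(cx, cy, ax, ay, px, py) >= 0):
                covered.append((x, y))
    return covered
```

Per-pixel coverage tests like this are embarrassingly parallel, which is exactly why Nanite-style pipelines can afford to do them in compute shaders.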
From the planning document:<p>> Branching: Done<p>> Single instruction multiple thread (SIMT): Planned<p>I guess we should be supportive, and it is impressive how far they got on the software side, but boy, is the author in for a surprise.
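For context on why the jump from plain branching to SIMT is so big: once the lanes of a warp diverge at a branch, the hardware has to execute both paths under an active-lane mask and then reconverge, rather than just redirecting a single program counter. A toy Python sketch of the idea (the names and example kernel are hypothetical):

```python
def simt_branch(values):
    """Each lane computes x*2 if x is even else x+1, SIMT-style."""
    mask = [x % 2 == 0 for x in values]  # per-lane branch outcome
    results = list(values)
    # Pass 1: run the "even" path with the other lanes masked off.
    for lane, active in enumerate(mask):
        if active:
            results[lane] = values[lane] * 2
    # Pass 2: run the "odd" path with the first group masked off.
    for lane, active in enumerate(mask):
        if not active:
            results[lane] = values[lane] + 1
    # Reconvergence point: all lanes active again.
    return results
```

Getting the mask stack and reconvergence points right (especially with nested branches and loops) is where most of the hardware complexity hides.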