BMCs in general leave me uneasy.<p>I like the idea: a small computer that monitors and controls your big computer. But I hate the implementation. Why are they all super-secret proprietary firmware blobs? Why can't I just install my Linux of choice and run the manufacturer's software on it? That would still suck, but not as badly as the full-stack nonsense they foist on you at this point.
The repo claims that the servers themselves throttle the GPUs, but isn't it the GPUs themselves that throttle, or maybe the OS? Neither of those is controlled by the server (hopefully), so is there a different system at play here?
There are a number of valid thermal-dissipation engineering reasons why you don't want to load the heat-producing components in a 1U server beyond what it was designed for.<p>This article doesn't mention <i>at all</i> what the max TDP of each GPU is, which makes me suspicious. Nor things like the max TDP of the CPUs (such as when running a multi-core prime-number stress benchmark to load them to 100%) combined with the total wattage of the GPUs.<p>If you have never built an x86-64 1U dual-socket server from discrete whitebox components (chassis, power supply, 12x13 motherboard, etc.), this is harder to understand intuitively.<p>I would recommend that people who want four powerful GPUs in something they own themselves look at more conventionally sized server chassis, 3U to 4U in height, or a tower format if it doesn't need to sit in a datacenter cabinet somewhere.
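The budget math itself is easy to sketch. Every component name and wattage below is a made-up placeholder for illustration, not a figure from the article or from any specific product:

```python
# Rough power-budget sanity check for a dense 1U build.
# All component names and TDP values are hypothetical examples.

COMPONENTS_W = {
    "cpu0 (one of two sockets)": 270,
    "cpu1": 270,
    "gpu0": 300,
    "gpu1": 300,
    "gpu2": 300,
    "gpu3": 300,
    "dram, nvme, fans, board": 250,
}

PSU_RATED_W = 2000   # hypothetical 1U PSU rating
HEADROOM = 0.80      # don't plan on running a PSU at 100% continuously

total = sum(COMPONENTS_W.values())
budget = PSU_RATED_W * HEADROOM
print(f"total draw ~{total} W vs usable budget ~{budget:.0f} W")
if total > budget:
    print("over budget: expect throttling (or worse) under full load")
```

With these placeholder numbers the build is ~1990 W against a ~1600 W usable budget, which is why a synchronized CPU stress test plus four GPUs at max TDP is exactly the case worth checking before trusting a 1U design.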
I'm dealing with something similar. I wanted to use Redfish to clear out hard drives, but storage is not standardized across vendors. Dell has a secure erase, HPE Gen10 has Smart Storage, and anything older doesn't expose any useful functionality in its Redfish API. What a mess. So I need to use PXE booting and probably WinPE to do this.
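For what it's worth, the DMTF Redfish Drive schema does define a standard <i>Drive.SecureErase</i> action; the mess is that vendor support for it varies, as described above. A minimal sketch of building that request, where the BMC address and the drive's @odata.id path are hypothetical examples:

```python
# Sketch: build the POST for the standard Redfish Drive.SecureErase action.
# The action exists in the DMTF Drive schema, but whether a given BMC
# implements it is vendor/generation dependent. The host and drive path
# below are hypothetical placeholders.
import json

def secure_erase_request(bmc_host: str, drive_path: str):
    """Return the target URL and JSON body for a Drive.SecureErase POST."""
    url = f"https://{bmc_host}{drive_path}/Actions/Drive.SecureErase"
    body = json.dumps({})  # the action takes no required parameters
    return url, body

url, body = secure_erase_request(
    "10.0.0.42",                                  # hypothetical BMC
    "/redfish/v1/Systems/1/Storage/1/Drives/0",   # hypothetical drive
)
print("POST", url)
# Actually sending it needs auth and a BMC that supports the action, e.g.:
#   requests.post(url, data=body, auth=("admin", "password"), verify=False)
```

Whether that POST succeeds, 404s, or returns ActionNotSupported is exactly the per-vendor lottery that pushes you back to PXE booting a wipe environment.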