Looking at the conversations almost 24 hours after posting, the IP, licensing, ecosystem, political, and overall business aspects of this have been discussed to death. Oddly for Hacker News, there has been little discussion of the potential technical aspects of this acquisition.

Pure speculation (of course)...

To me (from a tech standpoint) this acquisition centers around three things we already know about Nvidia:

- Nvidia is pushing to own anything and everything GPGPU/TPU related, from cloud/datacenter to edge. Nvidia has been an ARM licensee for years with their Jetson line of hardware for edge GPGPU applications:

https://developer.nvidia.com/buy-jetson

Looking at the architecture of these devices, broadly speaking Nvidia combines an ARM CPU with their current-gen GPU hardware (complete with Tensor Cores, etc.). What often goes unmentioned is that they use a shared memory architecture in which the ARM CPU and the CUDA cores share the same physical memory. Not only does this cut down on hardware cost and power usage, it increases performance.

- Nvidia has acquired Mellanox for high-performance network I/O across various technologies (Ethernet and InfiniBand). Nvidia is also actively working to remove the host CPU from as many GPGPU tasks as possible (network I/O and data storage):

https://developer.nvidia.com/gpudirect

- Nvidia already has publicly available software in place that effectively makes their CUDA compute available over the network via various APIs:

https://github.com/triton-inference-server/server

Going by the name alone, Triton is currently only for inference, but it provides the ability not only to serve GPGPU resources directly via a network API at scale but ALSO to accelerate various models with TensorRT optimization (a rough client-side sketch is below):

https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/optimization.html#framework-specific-optimization

Given these points I think this is an obvious move for Nvidia. TDP and performance are increasingly important across all of their target markets. They already have something in place for edge inference tasks powered by ARM with Jetson, but looking at ARM core CPU benchmarks it's sub-optimal. Why continue to pay ARM licensing fees when you can buy the company, collect the licensing fees yourself, get the talent, and (presumably) drastically improve performance and TDP for your edge GPGPU hardware?

In the cloud/datacenter, why continue to give up watts in terms of TDP and performance to sub-optimal Intel/AMD/x86_64 CPUs and their required baggage (motherboard bridges, buses, system RAM, etc.) when all you really want to do is shuffle data between your GPUs, network, and storage as quickly and efficiently as possible?

Of course many applications will still require a somewhat general-purpose CPU for various tasks, customer code, etc. AWS already has their own optimized ARM cores in place.
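To make the Triton point above concrete, here is a minimal sketch of what consuming GPU compute over the network looks like from the client side, assuming a Triton server is already running with some image model loaded. The server URL, model name, and tensor names ("resnet50", "input", "output") are placeholders, not anything from Nvidia's docs:

    # Minimal sketch of a Triton Inference Server HTTP client.
    # Assumes `pip install tritonclient[http]` and a server at localhost:8000
    # serving a hypothetical model "resnet50" with input "input" and output "output".
    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")

    # Build a dummy batch-of-one image tensor to ship over the wire.
    data = np.random.rand(1, 3, 224, 224).astype(np.float32)
    infer_input = httpclient.InferInput("input", list(data.shape), "FP32")
    infer_input.set_data_from_numpy(data)

    # Batching, scheduling, and any TensorRT-optimized execution happen server-side;
    # the client machine never touches CUDA at all.
    response = client.infer(model_name="resnet50", inputs=[infer_input])
    print(response.as_numpy("output").shape)

The interesting part is what the client does not need: no GPU, no CUDA toolkit, no particular CPU architecture. That is exactly the property that makes an "ARM + Mellanox + GPU box behind a network API" model plausible.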
aarch64 is more and more becoming a first-class citizen across the entire open source ecosystem.

As platform- and software-as-a-service continue to eat the world, cloud providers have likely already started migrating the underlying hardware powering these various services to ARM cores for improved performance and TDP (same product, more margin).

Various ARM cores are already proving quite capable for most CPU tasks, and given the other architectural components in place here, even the lowliest modern ARM core is likely to be asleep most of the time for the applications Nvidia currently cares about. Giving up licensing fees, die space, power, tighter integration, etc. to x86_64 just seems foolish at this point.

Meanwhile (of course) if you still need x86_64 (or any other arch) for whatever reason, you can hit a network API powered by hardware built on Nvidia/Mellanox I/O, GPU, and ARM. Potentially (eventually) completely transparently, using standard CUDA libraries and existing frameworks (see work like Apex, sketched at the end of this comment):

https://github.com/NVIDIA/apex

I, for one, am excited to see what comes from this.
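As a footnote on the Apex point: it is a decent preview of what "transparent" framework-level integration looks like today. You hand an existing PyTorch model and optimizer to Apex and it quietly reroutes the math through Tensor Cores; the training loop barely changes. A minimal sketch, assuming a CUDA-capable GPU with apex installed, and with the model, shapes, and opt_level purely illustrative:

    # Minimal sketch of NVIDIA Apex automatic mixed precision (AMP) on a toy model.
    # Everything about the model and data here is a placeholder.
    import torch
    from apex import amp

    model = torch.nn.Linear(1024, 10).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # One call re-wires the model and optimizer for mixed precision (Tensor Cores);
    # the rest of the training loop stays the same.
    model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

    inputs = torch.randn(32, 1024).cuda()
    targets = torch.randint(0, 10, (32,)).cuda()

    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()

If the same trick can eventually be pulled off one layer lower, with the CUDA calls themselves landing on a remote ARM+GPU+Mellanox box, most user code would never know the difference.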