I am very impressed with what Google has done for the state of machine learning infrastructure. I'm looking forward to future models based on OpenXLA that can run across Nvidia, Apple Silicon, and Google's TPUs. My main limiter to using TPUs more often is model compatibility. The TPU hardware is clearly the very best, just not always cost-effective for those of us who are starved for available engineering hours. OpenXLA may fix this if it lives up to its promise.

That said, it's also incredible how fast things move in this space:

> Midjourney, one of the leading text-to-image AI startups, have been using Cloud TPU v4 to train their state-of-the-art model, coincidentally also called “version four”.

Midjourney is already on v5 as of the date of publication of this press release.
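To make the portability point concrete, here is a minimal JAX sketch (my own illustration, not from the announcement) of what running the same model across backends looks like: the one jit-compiled function is lowered through XLA to whichever backend happens to be installed. The toy model and shapes are made up.

    # Minimal sketch (illustration only): the same JAX program is compiled
    # through XLA for whatever backend is present (CPU, Nvidia GPU, or TPU).
    import jax
    import jax.numpy as jnp

    @jax.jit  # XLA-compiled for the available accelerator
    def predict(params, x):
        w, b = params
        return jnp.tanh(x @ w + b)

    key = jax.random.PRNGKey(0)
    params = (jax.random.normal(key, (8, 4)), jnp.zeros(4))
    x = jax.random.normal(key, (32, 8))

    print(jax.devices())             # e.g. CPU, GPU, or TPU devices
    print(predict(params, x).shape)  # (32, 4)

Nothing in the code names a vendor; whether each backend actually handles a given model well is exactly the compatibility question above.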
I refuse to care about them until they sell them on PCIe cards.

The lock-in is bad enough when dealing with niche hardware on-prem; I certainly won't deal with niche hardware in the cloud.
> Midjourney, one of the leading text-to-image AI startups, have been using Cloud TPU v4 to train their state-of-the-art model, coincidentally also called “version four”

This sounds quite bad in a press release when Midjourney is at v5. Why did they move away?
They're so non-confrontational. Their performance comparisons are against "CPU". Just come out and say it, even if it's not apples to apples: if the 3D-torus interconnect is so much better, just say how it compares to Nvidia's latest and greatest. It's cool that Midjourney committed to building on TPU, but I have a hard time betting my company on a technology so guarded that they won't even post a benchmark against their main competitor.
This is very impressive technology and engineering.

However, I remain a bit skeptical of the business case for TPUs, for 3 core reasons:

1) 100,000x lower unit production volume than GPUs means higher unit costs

2) Slow iteration cycle: these TPUv4s were launched in 2020. Maybe Google publishes one generation behind, but that would still be a 2-3 year iteration cycle from v3 to v4.

3) Constant multiple advantage over GPUs: maybe a 5-10x compute advantage over an off-the-shelf GPU, and that number isn't increasing with each generation.

It's cool to get that 5-10x performance over GPUs, but that's about 4.5 years of Moore's Law, and it might already be offset today by the GPUs' unit cost advantage.

If the TPU architecture did something to allow fundamentally faster transistor density scaling, its advantage over GPUs would increase each year and become unbeatable. But based on the TPUv3-to-TPUv4 performance improvement over 3 years, it doesn't seem so.

Apple's competing approach seems a bit more promising from a business perspective. The M1 unifies memory, reducing the time commitment required to move data and switch between CPU and GPU processing. This allows advances in GPUs to continue scaling independently while decreasing the user-experience cost of using them.

Apple's version also seems to scale from 8GB to 128GB of RAM, meaning the same fundamental design can be used at high volume, achieving a low unit cost.

Are there other interesting ML hardware approaches out there?
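The Moore's Law comparison above can be sanity-checked with a quick back-of-the-envelope calculation (my own, assuming density doubles roughly every two years):

    # Rough check: how many years of doubling-every-2-years equal a 5-10x advantage?
    import math

    for speedup in (5, 10):
        years = 2 * math.log2(speedup)
        print(f"{speedup}x is roughly {years:.1f} years of Moore's Law")
    # 5x  -> ~4.6 years
    # 10x -> ~6.6 years

So a fixed 5x lead is worth roughly four and a half years, consistent with the figure above.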