This is very impressive technology and engineering.<p>However, I remain a bit skeptical of the business case for TPUs, for 3 core reasons:<p>1) ~100,000x lower unit production volume than GPUs means higher unit costs.<p>2) Slow iteration cycle - these TPUv4s were launched in 2020. Maybe Google publishes one generation behind, but that would still be a 2-3 year iteration cycle from v3 to v4.<p>3) Constant-multiple advantage over GPUs - maybe a 5-10x compute advantage over an off-the-shelf GPU, and that number isn't increasing with each generation.<p>It's cool to get that 5-10x performance over GPUs, but that's only ~4.5 years of Moore's Law (at a doubling every 18 months, 2^3 = 8x), and it might already be offset today by GPUs' unit-cost advantages.<p>If the TPU architecture did something to allow fundamentally faster transistor-density scaling, its advantage over GPUs would increase each year and become unbeatable. But based on the TPUv3-to-TPUv4 perf improvement over 3 years, that doesn't seem to be the case.<p>Apple's competing approach seems a bit more promising from a business perspective. The M1's unified memory reduces the overhead of moving data and switching between CPU and GPU processing. This lets GPU advances continue scaling independently while lowering the practical cost of actually using the GPU.<p>Apple's design also seems to scale from 8GB to 128GB of RAM, meaning the same fundamental process can be used at high volume, achieving a low unit cost.<p>Are there other interesting hardware-for-ML approaches out there?
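<p>Back-of-the-envelope on the Moore's Law point, if anyone wants to check it (this assumes the conventional 18-month doubling period; the 5-10x figure is my own rough estimate from above, not a benchmark):<p><pre><code>import math

def moores_law_years(speedup, doubling_period_years=1.5):
    """Years of Moore's Law (density doubling every
    `doubling_period_years`) needed to match a given
    one-time speedup factor."""
    return math.log2(speedup) * doubling_period_years

# A constant 5-10x advantage corresponds to roughly
# 3.5-5 years of scaling at an 18-month cadence
# (8x lands exactly on 4.5 years).
print(round(moores_law_years(5), 1))
print(round(moores_law_years(8), 1))
print(round(moores_law_years(10), 1))
</code></pre><p>So a fixed multiple buys you a few years' head start once; it only compounds if the multiple itself grows each generation, which point 3 suggests it doesn't.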