Cloud TPU v5p and AI Hypercomputer

170 points by treesciencebot over 1 year ago

11 comments

leetharris over 1 year ago

I've been working with people at GCP for months to get the right provisioning for TPUs for my company that spends many millions per year on GPU compute.

They take weeks to respond to anything, they change their minds constantly, you can never trust anything anyone says, their internal communication is a complete disaster, and someone recently told me they outsource a lot of their GCP personnel.

We went back to AWS and had a whole fleet of GPUs up and running within the week.

This is my 3rd extremely bad experience on Google Cloud. My last unicorn startup had several GCP-caused P0 production issues. They would update something internally with no announcement to customers and our production workloads would completely break out of the blue. It would usually take them days to weeks to fix it, even with us spending tens of millions with them and calling support constantly. Everyone at our company was baffled at how bad the experience was compared to every other cloud provider.

I would not put anything serious there and I would never partner with GCP again.
lhl over 1 year ago

Recently I've been using GCP to train a model, some notes:

* Like @leetharris, credits were pulled/not distributed; what was promised had to be cajoled out, with most of it being sent to some weird SaaS product that we'll never use

* The GCP rep literally ghosted us halfway through the month, when we had some expiring credits and were in the middle of training

* Not that the credits mattered: our quota requests to lift GPU or TPU limits were rejected twice. It was impossible to get any GPUs that were within our credits; even writing a script to look for machines for weeks didn't work.

* Right after the credits expired, our last quota request, which had been hanging around for weeks, was suddenly approved. I assume they have an internal system set up to do that, but we literally couldn't pay for GCP if we wanted to.

* Also, GCP rates are like 2-4X the market rate. You can get an H100-80 from Runpod (and actually get one) for what GCP charges for an A100-40.

Basically, the lesson learned was that no one should ever depend on GCP unless your time is worthless and you're not serious about getting any work done. They can go suck eggs.
StephenSmith over 1 year ago
This is the bigger headline than their Gemini release. AI is all about how many compute dollars it can generate for the cloud providers. Google is trying to make sure Microsoft doesn't monopolize AI compute.
0cf8612b2e1e over 1 year ago

Serious question: are there open source RISC designs available today that do "good enough" matrix multiplication?

I have no doubt that Nvidia has extensive optimizations to get SOTA performance, but I am curious what is attainable off the shelf. If you could design a 5nm chip, would it be possible to hit 15% of an Nvidia chip? Significantly more?

Of course, there is more to a GPU than just the matrix multiplication, but I am wondering how much effort it would take to get something off the ground for a well-financed organization. Presumably China is actively funding such efforts.
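For readers unfamiliar with what these accelerators actually compute: the core workload is dense matrix multiplication, processed in fixed-size tiles that map onto a hardware multiply-accumulate array (128x128 in the case of TPUs). The sketch below is a hypothetical NumPy illustration of that tiled compute pattern, not any vendor's actual kernel; the `tile` size and function name are made up for the example.

```python
import numpy as np

def blocked_matmul(a: np.ndarray, b: np.ndarray, tile: int = 128) -> np.ndarray:
    """Tiled matrix multiply: each (tile x tile) block update models one
    pass through a fixed-size MAC array, accumulating partial products."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((m, n), dtype=a.dtype)
    for i in range(0, m, tile):          # rows of the output tile grid
        for j in range(0, n, tile):      # columns of the output tile grid
            for p in range(0, k, tile):  # accumulate over the inner dimension
                c[i:i+tile, j:j+tile] += (
                    a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
                )
    return c

rng = np.random.default_rng(0)
a = rng.standard_normal((256, 384)).astype(np.float32)
b = rng.standard_normal((384, 512)).astype(np.float32)
assert np.allclose(blocked_matmul(a, b), a @ b, atol=1e-2, rtol=1e-3)
```

The hard part in silicon isn't this arithmetic, it's keeping the MAC array fed: the memory system and interconnect are where most of the engineering effort (and Nvidia's moat) lives.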
modeless over 1 year ago

> large LLM models

I'm not usually one to point out redundancies like this, but this one seems egregious.
DeathArrow over 1 year ago
How does it compare to Nvidia A100?
mg over 1 year ago
Does Google rely on TSMC to build the TPU chips?
pclmulqdq over 1 year ago
Still no FP8 from Google. Surprising, given how effective it seems to be for both training and inference. Although it&#x27;s not that surprising given that the primary customer of TPUs is Google itself, and they tend to stick themselves on weird little tech islands.
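For context on why FP8 matters: the two formats in the OCP FP8 specification trade precision for range differently. E5M2 follows IEEE conventions (top exponent reserved for inf/NaN, max finite 57344), while E4M3 reclaims the top exponent for normal values, keeping only one NaN encoding, which pushes its max finite value to 448. A small sketch of that arithmetic, assuming the OCP encoding rules:

```python
def fp8_max_finite(exp_bits: int, man_bits: int, ieee_like: bool) -> float:
    """Largest finite value of a small float format.

    ieee_like=True reserves the top exponent for inf/NaN (E5M2);
    False reclaims it for normals except the all-ones NaN code (E4M3,
    per the OCP FP8 spec).
    """
    bias = 2 ** (exp_bits - 1) - 1
    if ieee_like:
        max_exp = (2 ** exp_bits - 2) - bias      # top exponent reserved
        max_man = 2.0 - 2.0 ** -man_bits          # all mantissa bits set
    else:
        max_exp = (2 ** exp_bits - 1) - bias      # top exponent usable
        max_man = 2.0 - 2.0 ** -(man_bits - 1)    # all-ones mantissa is NaN
    return max_man * 2.0 ** max_exp

print(fp8_max_finite(4, 3, ieee_like=False))  # E4M3 -> 448.0
print(fp8_max_finite(5, 2, ieee_like=True))   # E5M2 -> 57344.0
```

E4M3 is typically used for weights/activations (more mantissa) and E5M2 for gradients (more range), which is why hardware support for both is valuable in training.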
riku_iki over 1 year ago
I like how they launch hard before end of the year performance review.
camdenlock over 1 year ago

> To request access

Google is so fucking lame these days
az226 over 1 year ago
Lol. Without benchmarks against H100 you can’t take this seriously.