As somebody who works alongside applied scientists, helping them with tasks related to model training and deployment, how does one get exposure to lower-level engineering work such as optimization and performance?
We have an ML infra team, but their goal is building tools around the platform, not necessarily getting workloads to run optimally.
I really appreciate everything under "Unsolicited Advice" in the AI Battlefield section [1]. It's a very realistic take on the frenetic pace of everything and the emotional tax that comes with feeling like one is always drowning in the relentlessly rapid advance of AI development.

Thanks!

[1] https://github.com/stas00/ml-engineering/blob/master/insights/ai-battlefield.md#unsolicited-advice
I randomly clicked on repeatability and am still curious about how it's achieved with distributed training. Wouldn't deterministic synchronization make things slow? But I have heard that at least in a couple of big companies, their training is repeatable.
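For what it's worth, one reason synchronization order matters at all is that floating-point addition is not associative, so summing the same per-worker values in a different order can change the result. Here is a minimal sketch of that effect in plain Python. The `reduce_in_order` helper and the `partials` values are hypothetical and only for illustration; this is not how any particular framework or collective library implements its reductions.

```python
def reduce_in_order(partials, order):
    """Sum per-worker partial values in the given rank order."""
    total = 0.0
    for rank in order:
        total += partials[rank]
    return total


# Hypothetical partial sums from three workers, chosen so that
# rounding makes the summation order visible in float64.
partials = [1e16, 1.0, -1e16]

# Same fixed order on every run: the result is bit-identical.
run_a = reduce_in_order(partials, [0, 1, 2])
run_b = reduce_in_order(partials, [0, 1, 2])

# A different arrival order: the same inputs give a different sum.
run_c = reduce_in_order(partials, [0, 2, 1])

print(run_a == run_b)  # True
print(run_a, run_c)    # 0.0 1.0
```

So repeatability in this toy setting comes down to always reducing in the same order. Whether fixing that order costs much speed in a real distributed job depends on the setup, and the sketch says nothing about that.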