TechEcho

9 comments

jebarker over 1 year ago

This is gold. I spend my days debugging LLM training setups in support of research and I'd have loved these notes when I started!

cyrux004 over 1 year ago

As somebody who works alongside Applied Scientists, helping them with tasks related to model training and deployment: how does one get exposure to lower-level engineering work like optimization, performance, etc.? We have an ML infra team, but their goal is building tools around the platform, not necessarily getting workloads to run optimally.

HanClinto over 1 year ago

I really appreciate everything in the "Unsolicited Advice" part of the AI Battlefield section [1]. It's a very realistic take on the frenetic pace of everything and the emotional tax that comes with feeling like one is always drowning in the relentlessly rapid advance of AI development.

Thanks!

[1] https://github.com/stas00/ml-engineering/blob/master/insights/ai-battlefield.md#unsolicited-advice

legerdemain over 1 year ago

How widespread is Slurm?

Scene_Cast2 over 1 year ago

I randomly clicked on repeatability and am still curious about how it's achieved with distributed training. Wouldn't deterministic synchronization make things slow? But I have heard that at least in a couple of big companies, their training is repeatable.

hahnchen over 1 year ago

How do you get experience in this stuff without having a job?

the_g0d_f4ther over 1 year ago

I really want to start experimenting with this, but I don’t really have a solid GPU. How do you guys actually run these?

mayilian over 1 year ago

Which Twitter accounts should I follow to stay updated?

amelius over 1 year ago

Is there a PDF somewhere? I see there are instructions for building it, but not the actual file.

ML Engineering Online Book
