TE
TechEcho
Home
24h Top
Newest
Best
Ask
Show
Jobs
English
GitHub
Twitter
Home
YaFSDP: a sharded data parallelism framework, faster for pre-training LLMs
135 points
by
wiradikusuma
11 months ago
2 comments
codetrotter
11 months ago
Collapse
I was surprised to see that the Ya part meant “Yet another”. I mean, I’ve seen it before in many acronyms. But it’s pretty tongue in cheek of them to do that here since one would expect it was just because it was made by Yandex.
评论 #40720067 未加载
评论 #40720024 未加载
dayeye2006
11 months ago
Collapse
Any idea on what are the main tricks used to achieve gains over fsdp?
评论 #40719773 未加载