Ask HN: How do AI labs set up their infrastructure to train large models?

3 points by true2octave almost 2 years ago
At my company I have to do this task. So far I have seen a Slurm-based cluster setup (V100s or H100s), some fast distributed file system, Docker for containers, and PyTorch with the DDP strategy.

But I read somewhere that Kubernetes can also be used, and that there is Singularity as a Docker alternative.

Where can I learn more about this?
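
To make the Slurm plus DDP combination concrete, here is a minimal sketch of how the two commonly connect. It is an assumption about a typical setup, not a description of any particular lab's infrastructure. It assumes one Slurm task is launched per GPU (for example with `srun`), and it maps the per-task variables Slurm sets (`SLURM_PROCID`, `SLURM_NTASKS`, `SLURM_LOCALID`) onto the variables that `torch.distributed` reads when initialized through the environment (`RANK`, `WORLD_SIZE`, `LOCAL_RANK`, `MASTER_ADDR`, `MASTER_PORT`). The helper name `ddp_env_from_slurm`, the host name `node001`, and the port `29500` are hypothetical choices made for illustration.

```python
import os


def ddp_env_from_slurm(env, master_addr, master_port=29500):
    """Translate Slurm task variables into the ones torch.distributed reads.

    Hypothetical helper: assumes one Slurm task per GPU.
    """
    return {
        "RANK": env["SLURM_PROCID"],        # global rank of this task
        "WORLD_SIZE": env["SLURM_NTASKS"],  # total number of tasks in the job
        "LOCAL_RANK": env["SLURM_LOCALID"], # rank of this task on its node
        "MASTER_ADDR": master_addr,         # host every rank connects to
        "MASTER_PORT": str(master_port),    # assumed free port on that host
    }


if __name__ == "__main__":
    # Stand-in values; inside a real job these come from os.environ.
    fake_slurm_env = {
        "SLURM_PROCID": "5",
        "SLURM_NTASKS": "16",
        "SLURM_LOCALID": "1",
    }
    ddp_env = ddp_env_from_slurm(fake_slurm_env, master_addr="node001")
    os.environ.update(ddp_env)
    print(ddp_env)
```

With these variables exported, a training script would usually call `torch.distributed.init_process_group(backend="nccl")` and wrap the model in `torch.nn.parallel.DistributedDataParallel`. Choosing the master host (often the first node in the allocation) is left out here because it depends on the cluster.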

no comments
