Show HN: TPI – Terraform provider for ML and self-recovering spot-instances

12 points by dmpetrov about 3 years ago

3 comments

dmpetrov about 3 years ago
Hey all, we are launching Terraform Provider Iterative (TPI).

It was designed for machine learning (ML/AI) teams and reduces CPU/GPU costs:

1. Spot-instance auto-recovery (if an instance is evicted or terminated), with data and checkpoint synchronization.
2. Auto-termination of instances when ML training finishes, so you won't leave an expensive GPU instance running for a week because you forgot to shut it down :)
3. Familiar Terraform commands and config (HCL).

The secret sauce is the auto-recovery logic: it is built on cloud auto-scaling groups and does not require a separate monitoring service to run (another cost saving!). The cloud provider itself recovers the instance for you. TPI just unifies auto-scaling groups across all the major cloud platforms: AWS, Azure, GCP and Kubernetes. Yeah, it was tricky to unify all the clouds :)

We'd love to hear your feedback!
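Auto-recovery only saves money if the training script can pick up where the evicted instance left off. Below is a minimal Python sketch of that checkpoint-and-resume pattern from the script's side. It assumes the working directory is the one being synchronized to and from cloud storage; the checkpoint file name, the `train`/`load_checkpoint`/`save_checkpoint` helpers, and the simulated `SpotEviction` are all hypothetical and are not part of TPI's API.

```python
import json
import os
import tempfile

CHECKPOINT_NAME = "checkpoint.json"  # hypothetical file name inside the synced workdir


class SpotEviction(Exception):
    """Hypothetical stand-in for the cloud reclaiming a spot instance mid-run."""


def load_checkpoint(workdir):
    """Return the last saved state, or a fresh state if no checkpoint exists yet."""
    path = os.path.join(workdir, CHECKPOINT_NAME)
    if not os.path.exists(path):
        return {"epoch": 0, "loss": None}
    with open(path) as f:
        return json.load(f)


def save_checkpoint(workdir, state):
    """Write the state to a temp file, then atomically rename it over the old checkpoint."""
    path = os.path.join(workdir, CHECKPOINT_NAME)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def train(workdir, total_epochs, evict_at=None):
    """Run epochs from the last checkpoint onward; optionally simulate an eviction."""
    state = load_checkpoint(workdir)
    for epoch in range(state["epoch"], total_epochs):
        if evict_at is not None and epoch == evict_at:
            raise SpotEviction(f"evicted during epoch {epoch}")
        loss = 1.0 / (epoch + 1)  # placeholder for a real training step
        state = {"epoch": epoch + 1, "loss": loss}  # "epoch" = number of completed epochs
        save_checkpoint(workdir, state)
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        try:
            train(workdir, total_epochs=5, evict_at=3)
        except SpotEviction as exc:
            # In the design described above, the auto-scaling group (not this
            # script) would launch a replacement instance at this point.
            print(exc)  # evicted during epoch 3
        print(load_checkpoint(workdir)["epoch"])  # 3
        final = train(workdir, total_epochs=5)  # the "replacement" resumes at epoch 3
        print(final["epoch"])  # 5
```

The write-then-rename in `save_checkpoint` is the key design choice: on POSIX systems `os.replace` is atomic within one filesystem, so an instance reclaimed mid-write leaves the previous checkpoint intact rather than a truncated file.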
toisanji about 3 years ago
Awesome, this project is from the team behind data version control (dvc) and CML, I’ll give it a try!
Comment #31179999 not loaded
ogazitt about 3 years ago
Auto-scaling for ML workloads, integrated with the TF workflow - very cool!