
Show HN: TPI – Terraform provider for ML and self-recovering spot-instances

12 points by dmpetrov about 3 years ago

3 comments

dmpetrov about 3 years ago
Hey all, we are launching Terraform Provider Iterative (TPI).

It was designed for machine learning (ML/AI) teams and optimizes CPU/GPU expenses:

1. Spot instance auto-recovery (if an instance was evicted/terminated) with data and checkpoint synchronization

2. Auto-terminate instances when ML training is finished - you won't forget to terminate your expensive GPU instance for a week :)

3. Familiar Terraform commands and config (HCL)

The secret sauce is auto-recovery logic that is based on cloud auto-scaling groups and does not require any monitoring service to run (another cost saving!). Cloud providers recover it for you. TPI just unifies auto-scaling groups across all the major cloud providers: AWS, Azure, GCP and Kubernetes. Yeah, it was tricky to unify all clouds :)

We'd love to hear your feedback!
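The workflow described above can be sketched as a Terraform (HCL) config. This is a hedged illustration based on TPI's documented `iterative_task` resource; the field names and values here are assumptions for the sketch, so check the project's docs for the exact schema:

```hcl
# Sketch of a TPI task: a spot GPU instance that syncs data,
# auto-recovers if evicted, and terminates when the script exits.
# (Field names are illustrative; consult the TPI docs for the real schema.)
resource "iterative_task" "train" {
  cloud   = "aws"        # also: azure, gcp, k8s
  machine = "m+t4"       # machine size / GPU class (assumed shorthand)
  spot    = 0            # request spot pricing

  storage {
    workdir = "."        # uploaded before the task starts
    output  = "results"  # synced back, so checkpoints survive evictions
  }

  script = <<-END
    #!/bin/bash
    pip install -r requirements.txt
    python train.py --checkpoint-dir results
  END
}
```

Per the post, the provider places the instance in a cloud auto-scaling group, so the cloud itself restarts evicted spot instances and TPI re-syncs the checkpoint directory - no separate monitoring service is needed.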
toisanji about 3 years ago
Awesome, this project is from the team behind data version control (dvc) and CML, I’ll give it a try!
Comment #31179999 not loaded
ogazitt about 3 years ago
Auto-scaling for ML workloads, integrated with the TF workflow - very cool!