Ask HN: Cheaper way to do model inference?

2 points | by masterofall2612 | 12 months ago
Does anyone know of solutions for saving on GPU compute during idle periods? Is there a managed solution to shut down a pod and bring it back up when I need it? I'm currently doing model inference, and most of the time I'm just paying for compute without serving any user requests.

2 comments

brianjking | 12 months ago
Huggingface Inference Endpoints can autoscale to 0 and cost nothing when not being used.
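For reference, the huggingface_hub client exposes this via min_replica=0 when creating an endpoint. A minimal sketch, assuming that client; the endpoint name, model repo, vendor/region, and instance settings below are illustrative placeholders, not tested values:

    from huggingface_hub import create_inference_endpoint

    # Create a dedicated endpoint that scales to zero when idle.
    # All names/values below are placeholder assumptions.
    endpoint = create_inference_endpoint(
        "demo-endpoint",           # hypothetical endpoint name
        repository="gpt2",         # whichever model repo you deploy
        framework="pytorch",
        task="text-generation",
        accelerator="gpu",
        vendor="aws",
        region="us-east-1",
        type="protected",
        instance_size="x1",
        instance_type="nvidia-t4",
        min_replica=0,             # scale to zero when idle
        max_replica=1,
    )

    endpoint.wait()  # block until the endpoint is running

    # The first request after an idle period cold-starts the replica,
    # so expect extra latency on that call.
    print(endpoint.client.text_generation("Hello"))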
PaulHoule | 12 months ago
Are you running inference on something like an EC2 instance?
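If so, one low-tech option is to stop the instance when idle and start it on demand: you stop paying for the GPU instance itself, though attached EBS volumes still accrue storage charges. A rough sketch with boto3; the region and instance ID are placeholders:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    INSTANCE_ID = "i-0123456789abcdef0"  # placeholder instance ID

    def stop_when_idle():
        # Stopping halts billing for the instance's compute;
        # EBS storage charges continue while stopped.
        ec2.stop_instances(InstanceIds=[INSTANCE_ID])

    def start_on_demand():
        ec2.start_instances(InstanceIds=[INSTANCE_ID])
        # Block until the instance is running again.
        waiter = ec2.get_waiter("instance_running")
        waiter.wait(InstanceIds=[INSTANCE_ID])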