Ask HN: Cheaper way to do model inference?

2 points | by masterofall2612 | 12 months ago
Does anyone know of solutions for saving on GPU compute during idle periods? Is there a managed solution to shut down a pod and bring it back up when I need it? I'm currently doing model inference, and most of the time I'm just paying for compute without serving any user requests.

2 comments

brianjking | 12 months ago
Huggingface Inference Endpoints can autoscale to 0 and cost nothing when not being used.
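For reference, the huggingface_hub client exposes this via min_replica=0 when creating an endpoint. A minimal sketch, assuming that client; the endpoint name, model repo, vendor/region, and instance settings below are illustrative placeholders, not tested values:

    from huggingface_hub import create_inference_endpoint

    # Create a dedicated endpoint that scales to zero when idle.
    # All names/values below are placeholder assumptions.
    endpoint = create_inference_endpoint(
        "demo-endpoint",           # hypothetical endpoint name
        repository="gpt2",         # whichever model repo you deploy
        framework="pytorch",
        task="text-generation",
        accelerator="gpu",
        vendor="aws",
        region="us-east-1",
        type="protected",
        instance_size="x1",
        instance_type="nvidia-t4",
        min_replica=0,             # scale to zero when idle
        max_replica=1,
    )

    endpoint.wait()  # block until the endpoint is running

    # The first request after an idle period cold-starts the replica,
    # so expect extra latency on that call.
    print(endpoint.client.text_generation("Hello"))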
PaulHoule | 12 months ago
Are you running inference on something like an EC2 instance?
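If so, one low-tech option is to stop the instance when idle and start it on demand: you stop paying for the GPU instance itself, though attached EBS volumes still accrue storage charges. A rough sketch with boto3; the region and instance ID are placeholders:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    INSTANCE_ID = "i-0123456789abcdef0"  # placeholder instance ID

    def stop_when_idle():
        # Stopping halts billing for the instance's compute;
        # EBS storage charges continue while stopped.
        ec2.stop_instances(InstanceIds=[INSTANCE_ID])

    def start_on_demand():
        ec2.start_instances(InstanceIds=[INSTANCE_ID])
        # Block until the instance is running again.
        waiter = ec2.get_waiter("instance_running")
        waiter.wait(InstanceIds=[INSTANCE_ID])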