
Ask HN: How can I set up an ML model as a scalable API?

16 points by rococode over 5 years ago
I have a custom ML (PyTorch) model that I would like to set up as a service / API - it should be able to receive an input any time and promptly return an output. It should be able to scale up automatically to thousands of requests per second. The model itself takes around a minute to load, an inference step takes around 100ms. The model is being called only from my product's backend, so I have a bit of control over request volume.

I've been searching around and haven't found a clear standard/best way to do this.

Here are some of the options I've considered:

- Algorithmia (came across this yesterday, unsure how good it is and have some questions about the licensing)

- Something fancy with Kubernetes

- Write a load balancer and manually spin up new instances when needed.

Right now I'm leaning towards Algorithmia as it seems to be cost-effective and basically designed to do what I want. But I'm unsure how it handles long model loading times, or if the major cloud providers have similar services.

I'm quite new to this kind of architecture and would appreciate some thoughts on the best way to accomplish this!

6 comments

calebkaiser over 5 years ago
I work on a free and open source project called Cortex that deploys PyTorch models (as well as other frameworks) as scalable APIs. It sounds perfect for what you're looking for: https://github.com/cortexlabs/cortex

Cortex automates all of the devops work: from containerizing your model, to orchestrating Kubernetes, to autoscaling instances to meet demand. We have a bunch of PyTorch examples in our repo, if you're interested: https://github.com/cortexlabs/cortex/tree/master/examples
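For a sense of the shape, here is a minimal predictor sketch in the style of those examples. The exact interface varies by Cortex version, and "model_path" is a made-up config key:

    # predictor.py - Cortex instantiates this once per replica, so the
    # slow model load happens at startup rather than on each request.
    import torch

    class PythonPredictor:
        def __init__(self, config):
            # "model_path" is a hypothetical key from the deployment config
            self.model = torch.jit.load(config["model_path"])
            self.model.eval()

        def predict(self, payload):
            # payload is the parsed JSON request body
            with torch.no_grad():
                return self.model(torch.tensor(payload["input"])).tolist()

A short YAML file then points Cortex at the predictor and sets compute/scaling bounds; Cortex handles the rest.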
gautamkmr89 over 5 years ago
You can use SageMaker to perform ML model deployment: https://aws.amazon.com/sagemaker/

SageMaker takes care of the infrastructure for you. It has also been integrated with various orchestrators like Kubernetes, Airflow, etc.

https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_operators_for_kubernetes_jobs.html#real-time-inference
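A minimal sketch with the SageMaker Python SDK - bucket, role ARN, and framework versions below are placeholders:

    from sagemaker.pytorch import PyTorchModel

    # model.tar.gz holds the serialized weights plus an inference
    # script implementing SageMaker's model_fn/predict_fn hooks.
    model = PyTorchModel(
        model_data="s3://my-bucket/model.tar.gz",      # hypothetical path
        role="arn:aws:iam::123456789012:role/SMRole",  # placeholder role
        entry_point="inference.py",
        framework_version="1.5.0",
        py_version="py3",
    )

    # Creates a managed HTTPS endpoint; autoscaling policies can be
    # attached to it separately via Application Auto Scaling.
    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.large",
    )
    print(predictor.predict([[0.1, 0.2, 0.3]]))  # example payload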
rankam over 5 years ago
Maybe you could try to save the pre-trained model to a storage bucket (e.g. S3) and then use Flask (or whatever framework you like) to create the endpoints. When the Flask app starts, the model can be loaded into memory from the storage bucket, and then you could create, for example, a /predict endpoint that accepts whatever data is needed to make the prediction. Deploy this to some PaaS (Heroku, AWS Elastic Beanstalk, GCP App Engine) that has auto-scaling as a feature and you're sorted.
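A minimal sketch of that shape - bucket name, object key, and input format are placeholders:

    import boto3
    import torch
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Hypothetical bucket/key. Download and load once at startup so the
    # ~1 minute model load is paid per instance, not per request.
    boto3.client("s3").download_file("my-models", "model.pt", "/tmp/model.pt")
    model = torch.load("/tmp/model.pt")  # assumes a fully serialized model
    model.eval()

    @app.route("/predict", methods=["POST"])
    def predict():
        with torch.no_grad():
            tensor = torch.tensor(request.get_json()["input"])
            return jsonify(prediction=model(tensor).tolist())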
streetcat1 over 5 years ago
So you can go with Kubernetes. This is my preferred tool.

With Kubernetes, you can either wrap your model inside a container or mount it into the container from a persistent volume.

As for scaling, you have two options:

1) Horizontal Pod Autoscaler (see the sketch below): https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

2) Knative, which is Kubernetes' serverless on-prem solution.
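For option 1, a sketch using the official kubernetes Python client - the Deployment name and thresholds are placeholders, and kubectl autoscale achieves the same thing from the CLI:

    from kubernetes import client, config

    config.load_kube_config()  # assumes local kubeconfig access

    # Scale a hypothetical "model-api" Deployment between 2 and 50
    # replicas, targeting 70% average CPU utilization.
    hpa = client.V1HorizontalPodAutoscaler(
        metadata=client.V1ObjectMeta(name="model-api"),
        spec=client.V1HorizontalPodAutoscalerSpec(
            scale_target_ref=client.V1CrossVersionObjectReference(
                api_version="apps/v1", kind="Deployment", name="model-api"
            ),
            min_replicas=2,
            max_replicas=50,
            target_cpu_utilization_percentage=70,
        ),
    )
    client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
        namespace="default", body=hpa
    )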
doppenhe over 5 years ago
Algorithmia here. What are you concerned about license-wise? You own all IP, always. There are some restrictions if you choose to commercialize on our service (mostly guaranteeing you won't take it down on users). The system was built for this. Happy to answer questions.
mailslot over 5 years ago
TensorFlow has a decent C++ layer, just sayin'.

PyTorch? Dunno. Last I spoke to those people, they had a solution too.
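The PyTorch route being alluded to is presumably TorchScript, which lets a model trained in Python be served from C++ via libtorch. A minimal export sketch, with a stand-in model:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(4, 2))  # stand-in for a real trained model
    model.eval()

    # Trace into a TorchScript module that the libtorch C++ runtime can
    # load without a Python interpreter: torch::jit::load("model.pt")
    traced = torch.jit.trace(model, torch.rand(1, 4))
    traced.save("model.pt")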