TechEcho
A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.

Ask HN: Next steps for scaling a scikit-learn Flask ML API

1 point by frist45, about 7 years ago
We currently have an internal API that's core to our business. The models are loaded as .pkl files with scikit-learn's joblib and served via Flask with Gunicorn using gevent. We've tried Tornado as a worker class and CherryPy as a replacement for Gunicorn -- neither produced significant performance benefits.

We're hosting it in a Kubernetes cluster with really large nodes (140GB). Each container uses ~5GB of RAM, and given the response time (~750ms), we can only add about 30 req/sec for each node we add ($1.5k). Each request appears to be CPU bound, which makes it difficult to scale widely.

This is cost prohibitive, and it feels like we need to move towards other tools/approaches.

As the person managing the infrastructure, I'm less familiar with the current ecosystem of larger-scale tooling. Ideally, the next iteration would keep the HTTP transport layer to allow for minimal changes to the rest of the system.

What would be a logical next step for us to scale the existing scikit-learn/Flask API?
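For reference, the capacity and cost figures in the post hang together as simple back-of-the-envelope arithmetic. A minimal sketch, using only the numbers quoted above (the one-in-flight-request-per-container and fully CPU-bound assumptions are mine, not stated in the post):

```python
# Back-of-the-envelope capacity math from the post's numbers.
# Assumptions (not from the post): each container handles one
# in-flight request at a time, and requests are fully CPU bound,
# so throughput per container is 1 / response_time.

NODE_RAM_GB = 140        # node size quoted in the post
CONTAINER_RAM_GB = 5     # ~5GB RAM per container
RESPONSE_S = 0.75        # ~750ms per request
NODE_COST_USD = 1500     # ~$1.5k per added node

# How many containers fit on one node, memory-wise.
containers_per_node = NODE_RAM_GB // CONTAINER_RAM_GB        # 28

# Ideal throughput ceiling per node under the assumptions above.
reqs_per_sec_per_node = containers_per_node / RESPONSE_S     # ~37.3

# Cost per unit of throughput.
usd_per_req_per_sec = NODE_COST_USD / reqs_per_sec_per_node  # ~$40

print(containers_per_node, round(reqs_per_sec_per_node, 1),
      round(usd_per_req_per_sec))
```

The ideal ceiling (~37 req/sec) sits just above the ~30 req/sec the post reports per node, which is consistent with some scheduling and serialization overhead on top of the pure inference time.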

no comments
