
Too big to deploy: How GPT-2 is breaking servers

4 points by calebkaiser over 5 years ago

1 comment

heavyarms over 5 years ago
Concerns like this are not appreciated enough in the data science/ML community. And it's not just the size and resource consumption of the final model. For any real enterprise/business solution, the best ML model is usually not the one with the highest benchmark scores, but the one that delivers the greatest value and can be put online quickly by integrating with the existing systems and software in place.

You really have to start with performance and use-case considerations from the beginning. Before you even try to train a model, you have to know how scalable it is to load/process the inputs and what you will do with the output. For example, in an NLP use case for text categorization or conversational agents, do you have to load historical data like customer notes, emails, etc., that are sitting in a production SQL instance and have to be queried with complicated joins and where clauses? How performant is the current API for doing that? Will you have to run preprocessing on the raw inputs each time? Does it make sense to keep a preprocessed copy of the data in a different data source? If so, how frequently should that data be synced? Depending on the answers to any of those questions, what looks like a great model that works in a Jupyter notebook suddenly becomes either not feasible or too expensive to justify.

I understand that a data scientist or ML engineer can't also be a cloud infrastructure expert or software architect who understands how all of the pieces are connected and how performant/expensive certain options are. But it makes no sense to start training a model without at least having an answer for what data will be needed during inference, how fast it has to be to work in real time, how much traffic the model will get, how much that compute might cost in the cloud, etc.
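
To make the preprocessing question concrete, here is a minimal Python sketch (my own illustration, not from the article or the comment above) contrasting per-request preprocessing against a lookup into a precomputed feature cache. The customer notes, the toy preprocess() step, and the benchmark loop are hypothetical stand-ins for a real pipeline.

# Minimal sketch: on-the-fly preprocessing vs. a precomputed feature cache.
# All data and the preprocessing step are hypothetical illustrations.
import re
import time

RAW_NOTES = {  # stand-in for rows pulled from a production SQL instance
    "cust_001": "Called about invoice #4521; wants a refund ASAP!!",
    "cust_002": "Email follow-up re: onboarding, very happy so far.",
}

def preprocess(text: str) -> list[str]:
    """Toy preprocessing: lowercase, strip punctuation, tokenize."""
    return re.sub(r"[^a-z0-9\s]", "", text.lower()).split()

# Option A: preprocess the raw input on every inference request.
def features_on_the_fly(customer_id: str) -> list[str]:
    return preprocess(RAW_NOTES[customer_id])

# Option B: preprocess once (e.g. on a periodic sync) and serve from a cache.
FEATURE_CACHE = {cid: preprocess(text) for cid, text in RAW_NOTES.items()}

def features_from_cache(customer_id: str) -> list[str]:
    return FEATURE_CACHE[customer_id]

if __name__ == "__main__":
    for fn in (features_on_the_fly, features_from_cache):
        start = time.perf_counter()
        for _ in range(10_000):
            fn("cust_001")
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{fn.__name__}: {elapsed_ms:.1f} ms for 10k requests")

In a real system the cache would live in a separate data source and be refreshed on whatever sync cadence the questions above justify; the point is just that the per-request cost, not the notebook benchmark, decides whether the model is deployable.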