Too big to deploy: How GPT-2 is breaking servers

4 points by calebkaiser over 5 years ago

1 comment

heavyarms over 5 years ago
Concerns like this are not appreciated enough in the data science/ML community. And it's not just the size and resource consumption of the final model. For any real enterprise/business solution, the best ML model is usually not the one with the highest benchmark scores, but the one that delivers the greatest value and can be put online quickly by integrating with the existing systems and software in place.

You really have to start with performance and use-case considerations from the beginning. Before you even try to train a model, you have to know how scalable it is to load/process the inputs and what you do with the output. For example, in an NLP use case for text categorization or conversational agents, do you have to load historical data like customer notes, emails, etc., that are sitting in a production SQL instance and have to be queried with complicated joins and WHERE clauses? How performant is the current API for doing that? Will you have to run preprocessing on the raw inputs each time? Does it make sense to keep a preprocessed copy of the data in a different data source? If so, how frequently should that data be synced? Depending on the answers to any of those questions, a model that looks great in a Jupyter Notebook can suddenly become infeasible or too expensive to justify.

I understand that a data scientist or ML engineer can't also be a cloud infrastructure expert or software architect who understands how all of the pieces are connected and how performant/expensive certain options are. But it makes no sense to start training a model without at least having an answer for what data will be needed during inference, how fast it has to be to work in real time, how much traffic the model will get, and how much that compute might cost in the cloud.
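A minimal sketch of the tradeoff raised in the comment's second paragraph: querying and preprocessing raw records on every inference request versus reading from a precomputed feature table that is refreshed on a schedule. Everything here is hypothetical and invented for illustration (the customer_notes and features tables, the clean_text preprocessing), and an in-memory sqlite3 database stands in for the production SQL instance the commenter describes.

import sqlite3
import re

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_notes (customer_id INTEGER, note TEXT)")
conn.executemany(
    "INSERT INTO customer_notes VALUES (?, ?)",
    [(1, "Order #42 delayed!!"), (1, "Refund issued."), (2, "Asked about pricing.")],
)

def clean_text(raw: str) -> str:
    # Stand-in for whatever preprocessing the model's inputs require.
    return re.sub(r"[^a-z0-9 ]", "", raw.lower()).strip()

# Option A: preprocess on the fly. Every request pays for the query
# (joins, WHERE clauses) plus a full preprocessing pass.
def features_on_the_fly(customer_id: int) -> str:
    rows = conn.execute(
        "SELECT note FROM customer_notes WHERE customer_id = ?", (customer_id,)
    ).fetchall()
    return " ".join(clean_text(note) for (note,) in rows)

# Option B: keep a preprocessed copy in a separate table. Inference becomes
# a single indexed lookup, at the price of keeping the copy in sync.
conn.execute("CREATE TABLE features (customer_id INTEGER PRIMARY KEY, doc TEXT)")

def refresh_feature_table() -> None:
    # Would run on whatever sync schedule the answers above dictate.
    for (customer_id,) in conn.execute(
        "SELECT DISTINCT customer_id FROM customer_notes"
    ).fetchall():
        conn.execute(
            "INSERT OR REPLACE INTO features VALUES (?, ?)",
            (customer_id, features_on_the_fly(customer_id)),
        )

def features_from_cache(customer_id: int) -> str:
    row = conn.execute(
        "SELECT doc FROM features WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    return row[0] if row else ""

refresh_feature_table()
assert features_from_cache(1) == features_on_the_fly(1)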
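And a back-of-envelope estimate of the kind the last paragraph calls for, sizing monthly serving cost from expected traffic and latency before any training happens. The traffic numbers and the instance price are made-up placeholders, not measurements of GPT-2:

def monthly_serving_cost(
    requests_per_second: float,
    latency_seconds: float,
    concurrent_per_instance: int,
    instance_price_per_hour: float,
) -> float:
    # Little's law: in-flight requests = arrival rate * latency.
    in_flight = requests_per_second * latency_seconds
    instances = max(1, -(-in_flight // concurrent_per_instance))  # ceiling division
    return instances * instance_price_per_hour * 24 * 30

# Example: 50 req/s at 500 ms latency, 4 concurrent requests per instance,
# at a hypothetical $0.90/hour -> 7 instances, $4,536/month.
print(monthly_serving_cost(50, 0.5, 4, 0.90))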