
What do Data Scientists use to train models fast?

6 points by moridin007 almost 10 years ago
I'm training a machine learning model using SVM in Python, and it took ages to run on my local machine, with only 10% of the data that I have. I'm getting an 80–90% correct prediction score on the same subjects' data, so now I want to add in the rest of the data (11 more subjects).

I thought of offloading it to my EC2 instance, but I'm on a budget, so I can't just take a 30-CPU instance. On top of everything, the code only ever uses 1 CPU at 100%, so I'm not sure how effective that would be.

What do you use to train these models?

3 comments

syllogism almost 10 years ago
Speed comes from two things: implementation and algorithm. Algorithmically, the way to learn quickly is to use some sort of stochastic gradient method, i.e. learn from examples one-by-one, as opposed to as a batch.

As far as implementation goes, you need dense arrays. A native Python implementation will usually be lists of Python objects, which is very slow.

If you just need an SVM implementation, libsvm is pretty good. I'm assuming you need a non-linear kernel. If you're using a linear kernel then there's not really a difference between SVM and MaxEnt (well, there is, but not much).

If your data is very sparse then there aren't many general-purpose implementations that are any good. The scipy.sparse module has some key stuff implemented in pure Python, and doesn't interoperate properly with the rest of the PyData ecosystem. I had to implement my own sparse data structures, in Cython.
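The stochastic-gradient approach the comment describes can be sketched with scikit-learn's `SGDClassifier` (my choice of library, not something the commenter named): with `loss="hinge"` it is a linear SVM trained example-by-example, and `partial_fit` lets you stream chunks of data rather than solving the full batch problem as `SVC`/libsvm does.

```python
# Sketch: stochastic gradient training of a linear SVM.
# Assumes scikit-learn; synthetic data stands in for the OP's dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=10_000, n_features=50, random_state=0)

# loss="hinge" makes SGDClassifier a linear SVM updated one example
# (or mini-batch) at a time, instead of solving the full batch QP.
clf = SGDClassifier(loss="hinge", random_state=0)

# partial_fit streams the data in chunks that fit in memory.
classes = np.unique(y)
for start in range(0, len(X), 1_000):
    clf.partial_fit(X[start:start + 1_000], y[start:start + 1_000],
                    classes=classes)

print(clf.score(X, y))
```

For a non-linear decision boundary this sketch does not apply directly; that is where kernel implementations like libsvm come in, at a much higher training cost.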
facorreia almost 10 years ago
One approach is to convert the code to use parallelism. For an example of how to do it in Python using joblib, see this article: http://blog.dominodatalab.com/simple-parallelization/

Even if you can't afford a 32-core instance, you might get to use 4 cores in your laptop.
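The joblib pattern the comment points to looks roughly like this (a hypothetical stand-in function, not the article's exact code): wrap the expensive call with `delayed` and let `Parallel` fan the work out across cores.

```python
# Sketch of joblib parallelism; slow_square stands in for one
# expensive unit of work, e.g. training on one cross-validation fold.
from joblib import Parallel, delayed

def slow_square(n):
    return n * n

# n_jobs=4 uses four CPU cores; n_jobs=-1 would use all available cores.
results = Parallel(n_jobs=4)(delayed(slow_square)(i) for i in range(8))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

This only helps if the workload splits into independent pieces (folds, subjects, hyperparameter settings); a single monolithic SVM fit won't parallelize this way.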
Comment #9992897 not loaded
rajacombinator almost 10 years ago
How much data and how long are you talking about? If it fits in memory, then the slowness is likely due to other coding errors causing a bottleneck, not the SVM training. (Unless you wrote that as well.)
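One way to test this comment's hypothesis, i.e. that the time is going somewhere other than the SVM fit, is to profile the run with the standard library's cProfile (my suggested workflow, not part of the comment; `load_data` and `train` are hypothetical stand-ins for the OP's code).

```python
# Sketch: profile a training run to see where the time actually goes.
import cProfile
import io
import pstats

def load_data():
    # stand-in for feature extraction / I/O
    return list(range(100_000))

def train(data):
    # stand-in for the SVM fit
    return sum(x * x for x in data)

def pipeline():
    return train(load_data())

profiler = cProfile.Profile()
profiler.enable()
pipeline()
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())  # top 5 functions by cumulative time
```

If the profile shows most of the cumulative time outside the fit call, fixing that bottleneck is cheaper than renting more CPUs.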