科技回声

IBM scientists demonstrate 10x faster large-scale machine learning using GPUs

236 points by brisance over 7 years ago

7 comments

web007 over 7 years ago

> We can see that the scheme that uses sequential batching actually performs worse than the CPU alone, whereas the new approach using DuHL achieves a 10× speed-up over the CPU.

I had to get down to the graph to realize they're talking about SVMs, not deep learning.

This could be pretty cool. Training an SVM has usually been "load ALL the data and go", and sequential implementations are almost non-existent. Even if this were 1× or 0.5× speed and didn't require the entire dataset at once, it would be a big win.
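web007's point that sequential SVM implementations are rare, though not non-existent, can be illustrated with scikit-learn, whose `SGDClassifier` with hinge loss is effectively an out-of-core linear SVM trained one chunk at a time via `partial_fit`. This is a standard sklearn pattern for when the dataset doesn't fit in memory, not the DuHL scheme from the article; the data here is a synthetic stand-in.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Hinge loss + SGD = a linear SVM that can be fed incrementally,
# so the full dataset never has to be resident at once.
clf = SGDClassifier(loss="hinge", random_state=0)

rng = np.random.default_rng(0)
classes = np.array([0, 1])
for _ in range(20):                       # pretend each chunk was streamed from disk
    Xb = rng.normal(size=(256, 10))
    yb = (Xb[:, 0] + 0.1 * rng.normal(size=256) > 0).astype(int)
    clf.partial_fit(Xb, yb, classes=classes)

Xt = rng.normal(size=(1000, 10))
yt = (Xt[:, 0] > 0).astype(int)
print("holdout accuracy:", clf.score(Xt, yt))
```

The trade-off web007 alludes to is visible here: each chunk is visited once in sequence, so convergence is slower than a batch solver, but memory use is bounded by the chunk size.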
dragontamer over 7 years ago

I'd love to see more details.

Ultimately, it seems IBM has managed to build a generalized gather/scatter operation over large datasets for this particular task. Yes, this is an "old problem", but at the same time it's the kind of engineering advancement that definitely deserves discussion. Any engineer who cares about performance will want to know about memory-optimization techniques.

As CPUs (and GPUs! And tensor units, and FPGAs, and whatever other accelerators come out) get faster and faster, the memory-layout problem becomes more and more important. CPUs, GPUs, etc. are all getting far faster than RAM, and RAM simply isn't keeping up anymore.

A methodology for "properly" accessing memory sequentially has broad applicability at every level of the CPU or GPU cache hierarchy: from main memory to L3, L3 to L2, and L2 to L1. The only place this "serialization" method won't apply is register space.

The "machine learning" buzzword is getting annoying, IMO, but there's likely something very useful to talk about here. I for one am excited to see the full talk.
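The gather/scatter cost dragontamer describes is easy to observe even from Python: summing an array in order lets the hardware prefetcher stream cache lines, while summing the same values through a random permutation pays for scattered memory traffic. A minimal sketch (timings will vary by machine; the point is only the access-pattern contrast):

```python
import time
import numpy as np

n = 2_000_000
data = np.arange(n, dtype=np.float64)

# Sequential pass: the prefetcher streams cache lines from RAM.
t0 = time.perf_counter()
s_seq = data.sum()
t_seq = time.perf_counter() - t0

# Gathered pass: the same arithmetic, but random indices defeat the
# prefetcher (and force a gathered copy), so memory traffic dominates.
idx = np.random.default_rng(0).permutation(n)
t0 = time.perf_counter()
s_rand = data[idx].sum()
t_rand = time.perf_counter() - t0

print(f"sequential: {t_seq:.4f}s  gathered: {t_rand:.4f}s")
```

Both passes compute the same sum; only the memory-access order differs, which is exactly the knob the commenter says matters at every cache level.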
LolWolf over 7 years ago

This is pretty fascinating! The approach seems to work only for convex problems (in particular, problems with strong duality; that excludes NNs almost entirely, except one-layer nets), but the application is nice and straightforward.

I wonder whether a similar lower bound can be constructed for non-convex problems that retains enough properties for this method to be useful.
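The strong-duality requirement LolWolf mentions is what makes the per-sample importance score computable at all. As a hedged sketch (this is a textbook per-sample duality gap for an L2-regularized hinge-loss SVM, not IBM's exact formula), each sample's gap is nonnegative and vanishes exactly when that sample satisfies its KKT condition, so large-gap samples are the ones worth keeping in fast memory:

```python
import numpy as np

def per_sample_duality_gap(X, y, w, alpha):
    """With z_i = y_i * (w . x_i) and dual variables alpha_i in [0, 1]:
    gap_i = max(0, 1 - z_i) + alpha_i * (z_i - 1) >= 0,
    and gap_i = 0 iff sample i meets its KKT condition
    (alpha_i = 1 when z_i < 1, alpha_i = 0 when z_i > 1)."""
    z = y * (X @ w)
    return np.maximum(0.0, 1.0 - z) + alpha * (z - 1.0)

# Toy usage — all values here are illustrative, not from the paper.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = np.sign(rng.normal(size=8))
w = rng.normal(size=3)
alpha = rng.uniform(0.0, 1.0, size=8)

gaps = per_sample_duality_gap(X, y, w, alpha)
keep = np.argsort(gaps)[::-1][:4]   # the 4 most "informative" samples
```

For a non-convex model there is no dual variable `alpha` and no such certificate, which is precisely the obstacle the comment raises.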
panosv over 7 years ago

How about doing the same on an 8- or 16-core CPU, which can have much more than 16 GB of memory and doesn't pay as much to move data around its own memory?
yters over 7 years ago

SVMs have better generalization properties than NNs, so this is neat.
samnwa over 7 years ago

How do I use this to mine Bitcoin? K thanks.
WhitneyLand over 7 years ago

tl;dr: They made a caching algorithm.

The article was touched by the PR department, but it still has actual information.

Longer tl;dr:

They did the same thing that has been done for thousands of years. Back then, the hot area of research was how to stage advance food and resource caches along a route for long journeys, and people came up with algorithms to optimize cache hits.

In this case, the problem is that GPUs can be fast for ML but usually have only 16 GB of RAM, while the dataset can be terabytes.

Simple chunk processing would seem to solve the problem, but it turns out the overhead of CPU/GPU transfers badly degrades performance.

Their claim is that they can determine on the fly how important different samples are, and make sure samples that yield better results are in the cache more often than those with less importance.
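The "caching algorithm" framing in this tl;dr boils down to one step: given a per-sample importance score, pick which samples occupy the small fast memory. A minimal sketch with made-up numbers (in IBM's scheme the score comes from the duality gap; here it is just an array we are handed):

```python
import numpy as np

def refresh_cache(importance, cache_size):
    """Choose which samples live in the small fast memory
    (e.g. a GPU's 16 GB) for the next round of training.
    Higher importance = more likely to improve the model."""
    return np.argsort(importance)[::-1][:cache_size]

# Hypothetical: 10 samples on the host, room for 3 on the accelerator.
importance = np.array([0.1, 0.9, 0.05, 0.4, 0.8, 0.0, 0.3, 0.7, 0.2, 0.6])
cached = refresh_cache(importance, 3)
print(cached)   # → [1 4 7]
```

In a real training loop this refresh would be re-run periodically as importance scores change, which is what distinguishes the scheme from the naive sequential chunking the article says performs worse than the CPU alone.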