I'd love to see more details. Ultimately, it seems like IBM has managed to build a generalized gather/scatter operation over large datasets for this particular task. Yes, this is an "old problem", but at the same time it's exactly the kind of engineering advancement that deserves discussion. Any engineer who cares about performance will want to know about memory optimization techniques.

As CPUs (and GPUs! And TPUs, and FPGAs, and whatever other accelerators come out) get faster and faster, the memory-layout problem becomes more and more important. CPUs, GPUs, and the rest keep pulling away from RAM, which simply isn't keeping up anymore.

A methodology for "properly" accessing memory sequentially has broad applicability at *every* level of the CPU or GPU cache hierarchy: from main memory to L3, L3 to L2, L2 to L1. The only place this "serialization" method won't apply is register space. (A toy sketch of the effect is below.)

The "machine learning" buzzword is getting annoying, IMO, but there's likely something very useful to talk about here. I for one am excited to see the full talk.
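To be clear, this is not IBM's method, just a minimal sketch (sizes and names made up) of why sequential access beats strided access: summing the same matrix row-by-row walks memory in cache-line order, while summing it column-by-column jumps by a full row stride on every load and thrashes the cache.

    #include <stdio.h>
    #include <time.h>

    #define N 4096

    /* Toy example only: 4096x4096 doubles (~128 MB), far bigger than any cache. */
    static double a[N][N];

    static double sum_rows(void) {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += a[i][j];      /* sequential: walks memory in storage order */
        return s;
    }

    static double sum_cols(void) {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += a[i][j];      /* strided: jumps N*8 bytes per access */
        return s;
    }

    int main(void) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                a[i][j] = 1.0;

        clock_t t0 = clock();
        double s1 = sum_rows();
        clock_t t1 = clock();
        double s2 = sum_cols();
        clock_t t2 = clock();

        printf("row-major:    sum=%.0f  %.3fs\n", s1, (double)(t1 - t0) / CLOCKS_PER_SEC);
        printf("column-major: sum=%.0f  %.3fs\n", s2, (double)(t2 - t1) / CLOCKS_PER_SEC);
        return 0;
    }

Same arithmetic, same data, usually a several-fold difference in runtime purely from access order; that's the gap a gather/scatter-into-sequential-layout approach is trying to close.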