Ask HN: On Rob Pike's Concurrency is not Parallelism?

51 points | by lazydon | almost 13 years ago
This is regarding slides by Rob Pike with the above title. Every time I go through this I feel like a moron. I'm not able to figure out the gist of it. It's well understood that concurrency is decomposition of a complex problem into smaller components. If you cannot correctly divide something into smaller parts, it's hard to solve it using concurrency.

But there isn't much detail in the slides on how to get parallelism once you've achieved concurrency. In the Lesson slide (num 52), he says Concurrency - "Maybe Even Parallel". But the question is - when and how can concurrency correctly and efficiently lead to parallelism?

My guess is that, under the hood, Rob's pointing out that developers should work at the level of concurrency - and parallelism should be the language's/VM's concern (gomaxprocs?). Just care about intelligent decomposition into smaller units, concerned only about correct concurrency - parallelism will be taken care of by the "system".

Please shed some light.

Slides: http://concur.rspace.googlecode.com/hg/talk/concur.html#title-slide
HN Discussion: http://news.ycombinator.com/item?id=3837147

12 comments

dangets | almost 13 years ago
I believe the concept says to focus more on task-based parallelism rather than data-based parallelism.

In Go, it is easy to create multiple tasks/workers, each with a different job. This is implicitly parallelizable - each task can (but doesn't have to) work within its own thread. The only time the workers can't run in parallel is when they are waiting on communication from another worker or outside process.

This is opposed to data-level parallelism, where each thread is doing nearly exactly the same instructions on different input, with little to no communication between the threads. An example would be to increase the blue level on each pixel in an image. Each pixel can be operated on individually and be processed in parallel.

So - the push is for more task-based parallelism in programs. It is very flexible in that it can actually run in parallel or sequentially, and it won't change the outcome of the program.
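A minimal Go sketch of that task-based style, assuming two made-up workers (a producer and an upper-caser) wired together with channels; the job names and channel layout are illustrative, not from the comment:

    package main

    import (
        "fmt"
        "strings"
    )

    func main() {
        lines := make(chan string)
        upper := make(chan string)

        // Worker 1: produce some lines of input.
        go func() {
            for _, s := range []string{"hello", "concurrent", "world"} {
                lines <- s
            }
            close(lines)
        }()

        // Worker 2: a different job -- transform whatever worker 1 sends.
        go func() {
            for s := range lines {
                upper <- strings.ToUpper(s)
            }
            close(upper)
        }()

        // The workers only block when waiting on each other's channels;
        // whether they actually run on separate threads is up to the runtime.
        for s := range upper {
            fmt.Println(s)
        }
    }

Run sequentially or in parallel, the output is the same - which is the flexibility the comment describes.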
arebop | almost 13 years ago
I think his point is that although people are often interested in parallel behavior, they should really focus more on concurrent design to avoid/remove the dependencies that ultimately limit parallelism. Slide 19 mentions automatic parallelization, but the message is that developers should think more about concurrency, not that Go will automatically and maximally parallelize concurrent programs.
aphyr | almost 13 years ago
Concurrency is more than decomposition, and more subtle than "different pieces running simultaneously." It's actually about *causality*.

Two operations are *concurrent* if they have no causal dependency between them.

That's it, really. *f(a)* and *g(b)* are concurrent so long as *a* does not depend on *g* and *b* does not depend on *f*. If you've seen special relativity before, think of "concurrency" as meaning "spacelike" - events which can share no information with each other save a common past.

The concurrency invariant allows a compiler/interpreter/CPU/etc. to make certain transformations of a program. For instance, it can take code like

    x = f(a)
    y = g(b)

and generate

    y = g(b)
    x = f(a)

... perhaps because b becomes available before a does. Both programs will produce identical functional results. Side effects like IO and queue operations could strictly speaking be said to violate concurrency, but in practice these kinds of reorderings are considered to be acceptable. Some compilers can use concurrency invariants to parallelize operations on a single chip by taking advantage of, say, SIMD instructions or vector operations:

    PIPELINE1    PIPELINE2
    x = f(a)     y = g(b)

Or more often:

    [x1, x2, x3, x4] = [f(a1), f(a2), f(a3), f(a4)]

where f could be something like "multiply by 2".

Concurrency allows for cooperative-multitasking optimizations. Unix processes are typically concurrent with each other, allowing the kernel to schedule them freely on the CPU. It also allows thread, CPU, and machine-level parallelism: executing non-dependent instructions in multiple places at the same wall-clock time.

    CPU1         CPU2
    x = f(a)     y = g(b)

In practice, languages provide a range of constructs for implicit and explicit concurrency (with the aim of parallelism), ranging from compiler optimizations that turn for loops into vector instructions, push matrix operations onto the GPU and so on, to things like Thread.new, Erlang processes, coroutines, futures, agents, actors, distributed mapreduce, etc. Many times the language and kernel cooperate to give you different kinds of parallelism for the same logical concurrency: say, executing four threads out of 16 simultaneously because that's how many CPUs you have.

What does this mean in practice? It means that the fewer causal dependencies between parts of your program, the more freely you, the library, the language, and the CPU can rearrange instructions to improve throughput, latency, etc. If you build your program out of small components that have well-described inputs and outputs, control the use of mutable shared variables, and use the right synchronization primitives for the job (shared memory, compare-and-set, concurrent collections, message queues, STM, etc.), your code can go faster.

Hope this helps. :)
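A small Go sketch of that invariant, with hypothetical independent functions f and g: since neither result depends on the other, the two calls may be scheduled in either order, or on different CPUs, without changing the outcome.

    package main

    import "fmt"

    // Hypothetical functions with no causal dependency between them.
    func f(a int) int { return a * 2 }
    func g(b int) int { return b + 10 }

    func main() {
        xc := make(chan int)
        yc := make(chan int)

        // The runtime is free to run these in either order or in parallel.
        go func() { xc <- f(3) }()
        go func() { yc <- g(4) }()

        x, y := <-xc, <-yc
        fmt.Println(x, y) // always 6 14, regardless of scheduling
    }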
thebigshane | almost 13 years ago
I don't think he's saying that you shouldn't concern yourself at all with parallelism; only that you should focus on concurrency first, and that will lead to easier parallelism. And I think he is saying that decomposition and concurrency help non-parallel programs stay simple and easy to understand. The benefits of concurrency are greater than just parallelism.

I think that is about what you are saying.
jberryman | almost 13 years ago
Obviously both terms get used in a variety of overlapping ways. Without looking at how the terms are used in the slides you refer to, I think the proper definitions are:

Concurrency is a property of a program's semantics, usually seen in a 'thread' abstraction. The most important part of concurrency is nondeterminism. Concurrency might permit parallelism depending on hardware, language runtime, OS, etc.

Parallelism is a property of program *execution* and means multiple operations happening at once, in order to speed up execution. A program written to take advantage of parallelism can be deterministic, but often it is accomplished by way of concurrency in OS threads. Because most languages still suck.
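A tiny Go illustration of that nondeterminism (purely an assumed example, not from the comment): the program's semantics allow either interleaving, so the printed order can differ from run to run even on a single processor.

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        var wg sync.WaitGroup
        for _, name := range []string{"A", "B"} {
            wg.Add(1)
            go func(n string) {
                defer wg.Done()
                fmt.Println("goroutine", n) // order is not determined
            }(name)
        }
        wg.Wait()
    }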
Xantix | almost 13 years ago
The difference between concurrency and parallelism is, in my opinion, somewhat subjective, depending on how you view the problem.

Examples of concurrency:

1. I surf the web and I run an installer for another program.

2. One gopher brings empty carts back, while another brings full carts to the incinerator.

The idea of concurrency is that two completely separate tasks are being done at the same time. There may be synchronization points between the two tasks, but the tasks themselves are dissimilar.

Viewed in one way, moving empty wheelbarrows may be completely different from moving filled ones. Viewed in another way, they might seem very similar.

Concurrency has to do with task parallelism. Parallelism has to do with data parallelism. There's a gray line between the two where you can't clearly differentiate them.
dkersten | almost 13 years ago
IMO, they solve two different goals. Sometimes these goals overlap, but not always. Please someone correct me if I'm wrong, but this is how I see it:

The goal of concurrency is to model a problem that is easier or better or more natural to model *concurrently* (that is, with different parts running simultaneously). For example, if you are simulating agents in some virtual world (e.g. a game), then it may make sense to model these agents so that they are all running simultaneously and the processing of one does not block the processing of another. This could be done by timeslicing available processing between each agent (either by using the processor/OS pre-emptive multitasking, if available, or through cooperative multitasking[1]), or it could be done by physically running multiple agents at the same time on different processors or cores or hardware threads (parallelism). The main point is that concurrency may be parallel, but does not *have to be*, and the reason you want concurrency is because it is a good way to model the problem.

The goal of parallelism is to increase performance by running multiple bits of code in parallel, at the same time. Concurrency only calls for the illusion of parallelism, but parallelism calls for real, actual multiple things running at the exact same time, so you must have multiple processors or cores or computers or whatever hardware resources for parallelism, while concurrency can be simulated on single-core systems. Parallel code is concurrent code, but concurrent code is not necessarily parallel code.

Distributed programming is parallel programming where the code is running in parallel but distributed over multiple computers (possibly over the internet at distant locations) instead of running locally on one multi-core machine or an HPC cluster.

From stackoverflow[2]:

    Quoting Sun's Multithreaded Programming Guide:

    Parallelism: A condition that arises when at least two threads are
    executing simultaneously.

    Concurrency: A condition that exists when at least two threads are
    making progress. A more generalized form of parallelism that can
    include time-slicing as a form of virtual parallelism.

As for when concurrency can be turned into parallelism, that depends. Assuming that the hardware resources exist (or that it simply falls back to time-sliced concurrency if they do not), parallelism can be achieved if there are multiple things that can execute independently. There are at least three types of parallelism, and if your problem, code and/or data fit one of these, then your concurrent code may be executed in parallel.

1. You have a number of items of data and each one can be processed independently and at the same time. This is the classic *embarrassingly parallel* data parallelism. For example, you have a large array of input data and an algorithm that needs to run on each item, but the items do not interact. Calculating pixel colours on your screen, or handling HTTP requests, for example.

2. You have two or more independent tasks doing different things that are run in parallel. For example, you have one thread handling the GUI and another thread handling audio. Both need to run at the same time, but both run independent of each other with minimal communication (which can happen over a queue, perhaps).

3. Sometimes you have a stream of data which must be processed by a number of tasks, one after the other. Each task can be run in parallel, so that if you have a stream of data, item[0], item[1], item[2], etc. (where 0 is first in the stream, 1 is next and so on) and a number of tasks that need to run in order - A, B, C - then you can run A, B and C in parallel such that A processes item[0] while B and C are idle, then B processes item[0] and A processes item[1] while C is idle, then C processes item[0] while B processes item[1] and A processes item[2], and so on. This is called pipelining and, as you probably know, is a very common technique inside processors (see the sketch after this comment).

Of course, all three can be combined.

[1] Could be a coroutine which is yielded, or simply executing some kind of update function which, by contract, must not block

[2] http://stackoverflow.com/questions/1050222/concurrency-vs-parallelism-what-is-the-difference#1050257
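A minimal Go sketch of the pipeline case (3), assuming three made-up stages chained by channels; while a later stage finishes item[0], earlier stages can already be working on item[1] and item[2]:

    package main

    import "fmt"

    // stage wraps one step of the pipeline: read an item, apply f, pass it on.
    func stage(in <-chan int, f func(int) int) <-chan int {
        out := make(chan int)
        go func() {
            defer close(out)
            for item := range in {
                out <- f(item)
            }
        }()
        return out
    }

    func main() {
        // Source stream: item[0], item[1], ...
        src := make(chan int)
        go func() {
            defer close(src)
            for i := 0; i < 5; i++ {
                src <- i
            }
        }()

        // Tasks A, B, C run concurrently; with enough cores they run in parallel.
        a := stage(src, func(x int) int { return x + 1 })
        b := stage(a, func(x int) int { return x * 10 })
        c := stage(b, func(x int) int { return x - 3 })

        for v := range c {
            fmt.Println(v)
        }
    }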
halayli | almost 13 years ago
Take a look at lthread:

https://github.com/halayli/lthread

lthread supports concurrency and parallelism using pthreads. Each lthread scheduler runs its own lthreads concurrently, or better said, one at a time. But from an observer's perspective they look like they are running in parallel.

Now if you create 2 pthreads on a 2-core machine and each runs an lthread scheduler, then you have true parallelism, because you can have 2 lthreads running in parallel at the same time, one by each scheduler.

I feel this is a closer context to what Rob is discussing than what I found in the comments here.
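For comparison with the scheduler-per-pthread setup above, here is a rough Go analogue (not lthread itself, just a sketch under that assumption): with GOMAXPROCS(1) the two goroutines are merely concurrent, one running at a time; with GOMAXPROCS(2) on a 2-core machine they can truly run in parallel.

    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    func main() {
        // One OS thread: concurrency only. Change to 2 for real parallelism
        // (given at least two cores).
        runtime.GOMAXPROCS(1)

        var wg sync.WaitGroup
        for i := 0; i < 2; i++ {
            wg.Add(1)
            go func(id int) {
                defer wg.Done()
                sum := 0
                for j := 0; j < 10_000_000; j++ {
                    sum += j
                }
                fmt.Println("goroutine", id, "finished with sum", sum)
            }(i)
        }
        wg.Wait()
    }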
Mr_T_ | almost 13 years ago
Related to this topic: http://existentialtype.wordpress.com/2011/03/17/parallelism-is-not-concurrency/
the1 | almost 13 years ago
Parallelism is when you run your program on multiple processors. The semantics of your program do not change whether you run it on a single processor or multiple processors.

Concurrency is when you write your program using multiple threads. Your program looks and means something vastly different if you use threads.

You use concurrency not for performance gain, but for clarity of your program. You use parallelism for performance gain, to utilize all your processors.
Mr_T_ | almost 13 years ago
I would love to see a video of the actual talk. I have searched for it multiple times but I couldn't find anything.
seunosewa | almost 13 years ago
I thought he was just giving an excuse for the abysmal multi-core scaling of idiomatic Go programs.