Ask HN: How to *really* learn concurrency, parallelism?

10 points, by bonobo3000, almost 9 years ago
Hello,

I've been interested in concurrency for a while now, and I've been picking up bits and pieces by stumbling across great posts/videos once in a while or working with them at my job. So far I'm sort of comfortable with threads (not the gory JVM details, though) and Akka, and a tiny bit of futures/promises and cooperative-concurrency-type stuff with Python's Tornado.

I want to go deeper, really grok all the different models and be able to make informed decisions about which model is appropriate when, understand how they map to lower levels, and generally gain a comprehensive understanding. I'm currently going through Java Concurrency in Practice.

My question: what resources will help me get a comprehensive understanding of concurrency & parallelism? What toy project would you recommend where I can implement the same thing with lots of different models to understand the differences?

Thanks in advance!
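To make the "same task, different models" idea concrete, here is a minimal sketch (not from the thread; the task names and timings are placeholders) that runs the same slow job once with preemptive threads and once with Python's cooperative async/await, so the two models the OP mentions can be compared side by side.

```python
# Toy comparison: the same "do three slow I/O-ish tasks" job written with
# preemptive threads and with cooperative (async/await) concurrency.
# The sleeps stand in for network calls; task names are placeholders.
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

TASKS = ["a", "b", "c"]

def slow_task(name: str) -> str:
    """Blocking work: the sleep stands in for waiting on the network."""
    time.sleep(1.0)
    return f"{name}: done"

def run_with_threads() -> list[str]:
    # Preemptive model: the pool overlaps the waits across OS threads.
    with ThreadPoolExecutor(max_workers=len(TASKS)) as pool:
        return list(pool.map(slow_task, TASKS))

async def slow_task_async(name: str) -> str:
    # Cooperative model: await yields to the event loop instead of blocking.
    await asyncio.sleep(1.0)
    return f"{name}: done"

async def run_with_asyncio() -> list[str]:
    # All three coroutines interleave on a single thread, Tornado-style.
    return await asyncio.gather(*(slow_task_async(n) for n in TASKS))

if __name__ == "__main__":
    t0 = time.perf_counter()
    print(run_with_threads(), f"threads: {time.perf_counter() - t0:.1f}s")
    t0 = time.perf_counter()
    print(asyncio.run(run_with_asyncio()), f"asyncio: {time.perf_counter() - t0:.1f}s")
```

Both versions finish in roughly one second instead of three; the difference is who does the scheduling (the OS kernel vs. the event loop), which is exactly the distinction the models below explore.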

3 comments

nostrademons, almost 9 years ago
*Book rec:*

https://www.amazon.com/Concepts-Techniques-Models-Computer-Programming/dp/0262220695

*Language & library recs:*

Java is actually a pretty shitty language to learn concurrency on, because the concurrency primitives built into the language & stdlib are stuck in the 1970s. There've been some more recent attempts to bolt more modern concurrency patterns on as libraries (java.util.concurrent is one; Akka is another; Quasar is a third), but you're still very limited by the language definition. Some other languages to study:

Erlang, for message-passing & distributed-system design patterns. Go has a similar concurrency model, but not as pure.

Haskell, for STM.

Python 3.5 / ES2017 / C#, for async/await & promises. Actually, for a purer implementation of promises, check out E or the Cap'n Proto RPC framework.

Rust, for mutable borrowing. Rust's concurrency story is fairly unique; they try to prove that data races *can't* exist by ensuring that only one reference is mutable at once.

JoCaml, for the join calculus. Indeed, learning formal models like CSP, the pi-calculus, or the join calculus can really help improve your intuitions about concurrency.

Hadoop, for MapReduce-style concurrency. In particular, learning how you might represent, say, graph algorithms on a MapReduce system is a great teacher. Also look at real-time generalizations of the MapReduce paradigm, like Storm or Spark.

Paxos & Raft, for the thorny problems in distributed consensus.

Vector clocks, operational transforms, and CRDTs. One approach to concurrency is to make it not matter by designing your algorithms so that each stage can be applied in arbitrary order (or can compensate for other operations that have occurred in the meantime). That's the idea behind these, and it perhaps has the most long-term promise.

*Project & job recs:*

The best way to really learn concurrency is to take a job at a company that has to operate at significant scale. Google or Facebook are the most prominent, but any of the recent fast-growers (Airbnb, Uber, Dropbox, probably even Instacart or Zenefits) will have a lot of problems of this type.

Failing that, I've found that implementing a crawler is one giant rabbit hole for learning new concurrency techniques. The interesting thing about crawlers is that you can implement a really simple, sequential one in about 15 minutes using a language's standard library, but then each step brings a new problem that you need to solve with a new concurrency technique. For example (a code sketch of the first few steps follows this comment):

You don't want to wait on the network I/O, so you create multiple threads to crawl multiple sites at once.

You quickly end up exhausting your memory, because the number of URLs found on pages grows exponentially, so you transition to a bounded thread pool.

You add support for robots.txt and sitemaps. Now you have immutable data that must be shared across threads.

You discover some URLs are duplicates; now you need shared mutable state between your fetch threads.

You start getting 429 and 403 response codes from the sites, telling you to back off and stop crawling them so quickly. Now you need a feedback mechanism from the crawl threads to the crawl scheduler, probably best implemented with message queues.

You want to process the results of the crawl. Now you need to associate the results of multiple fetches together to run analyses on them; this is what MapReduce is for.

You need to write the results out to disk. This is another source of I/O, but with different latency & locking characteristics. You either need another thread pool, or you want to start looking into promises.

You want to run this continuously and update a data store. Now you need to think about transactions.
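To make the crawler progression concrete, here is a minimal sketch (not from the comment; the seed URL, the limits, and the `extract_links` stub are placeholders) showing three of the steps above: a bounded worker pool, a lock-guarded shared `seen` set for deduplication, and a queue that feeds 429/403 back-off signals from the fetch workers to the scheduler.

```python
# Minimal crawler skeleton illustrating: a bounded thread pool, shared
# mutable "seen" state behind a lock, and a worker-to-scheduler feedback
# queue for back-off. Seed URL, limits, and link extraction are stubs.
import queue
import threading
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

SEED_URLS = ["https://example.com/"]
MAX_PAGES = 20
NUM_WORKERS = 4                      # bounded pool instead of thread-per-URL

seen: set[str] = set()               # shared mutable state...
seen_lock = threading.Lock()         # ...so every access is guarded
frontier: "queue.Queue[str]" = queue.Queue()
backoff: "queue.Queue[str]" = queue.Queue()   # workers -> scheduler feedback

def extract_links(html: bytes) -> list[str]:
    """Placeholder: a real crawler would parse <a href> targets here."""
    return []

def fetch(url: str) -> None:
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read()
    except urllib.error.HTTPError as err:
        if err.code in (403, 429):
            backoff.put(url)          # tell the scheduler to slow down
        return
    except OSError:
        return
    for link in extract_links(body):
        with seen_lock:               # deduplicate under the lock
            if link in seen:
                continue
            seen.add(link)
        frontier.put(link)

def crawl() -> None:
    for url in SEED_URLS:
        seen.add(url)
        frontier.put(url)
    crawled = 0
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        while crawled < MAX_PAGES:
            try:
                url = frontier.get(timeout=5)
            except queue.Empty:
                break                 # nothing left to crawl
            pool.submit(fetch, url)
            crawled += 1
            while not backoff.empty():
                host = backoff.get()  # a real scheduler would delay this host
                print("backing off:", host)

if __name__ == "__main__":
    crawl()
```

The later steps in the comment (sharing parsed robots.txt data read-only, aggregating results MapReduce-style, transactional writes to a data store) would each extend or replace one piece of this skeleton.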
ruslan_talpa, almost 9 years ago
Parallel and concurrent programming is easy in Haskell (because of the amazing work on the lower levels): http://chimera.labs.oreilly.com/books/1230000000929

Switch to Haskell and you will no longer view "parallel/concurrent" as a hard problem that you need to solve.
dudul, almost 9 years ago
There is a book, "Seven Concurrency Models in Seven Weeks". I haven't read it (yet), but I assume it's a good fit for what you want.