Hello,
At my workplace we have a lot of legacy C++ code that uses threads to parallelize things. We are considering switching to Node.js and are looking for sources that compare the event loop model with a thread-based one from perspectives such as efficiency, ease of maintenance, etc. I would be thankful for any references to good, constructive material.<p>EDIT: more details about the application: it is a trading app that communicates with multiple sources at high rates to gather information and send commands, but it also does quite a lot of number crunching.
Take a look at the ring buffer data structure from the LMAX architecture (my talk on LMAX <a href="http://codemonkeyism.com/lmax-architecture-high-performance-seda-java/" rel="nofollow">http://codemonkeyism.com/lmax-architecture-high-performance-...</a> or <a href="http://martinfowler.com/articles/lmax.html" rel="nofollow">http://martinfowler.com/articles/lmax.html</a> from Martin Fowler).<p>They have incoming work; I/O for incoming and outgoing work is multithreaded, while the work on events is single-threaded.<p>The architecture is quite clever, as more than one event processor can work on the data structure and event processors can have dependencies on each other. Independent processors can race past each other.<p>It was written for trading and might be portable to C++, but if you are considering switching to JS, it might also be OK for you to switch to Java. Their framework is called Disruptor and is open source:<p><a href="http://code.google.com/p/disruptor/" rel="nofollow">http://code.google.com/p/disruptor/</a><p>"LMAX aims to be the fastest trading platform in the world. Clearly, in order to achieve this we needed to do something special to achieve very low-latency and high-throughput with our Java platform. Performance testing showed that using queues to pass data between stages of the system was introducing latency, so we focused on optimising this area."
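To make the ring buffer idea above concrete, here is a minimal sketch in Python (chosen only for brevity). It is not the Disruptor API: the names `RingBuffer`, `Consumer`, `publish` and `run_once` are hypothetical, and everything runs on one thread, so it only shows the sequence bookkeeping: slots are preallocated, the producer may not overwrite a slot the slowest consumer has not processed, and a dependent consumer may not pass the consumer it depends on.

```python
class RingBuffer:
    """Fixed-size ring of preallocated slots addressed by a growing sequence."""

    def __init__(self, size):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self._slots = [None] * size
        self._mask = size - 1
        self.cursor = -1  # sequence number of the last published event

    def publish(self, event, consumers):
        nxt = self.cursor + 1
        # Refuse to overwrite a slot the slowest consumer has not handled yet.
        slowest = min(c.sequence for c in consumers)
        if nxt - slowest > len(self._slots):
            return False
        self._slots[nxt & self._mask] = event
        self.cursor = nxt
        return True

    def get(self, sequence):
        return self._slots[sequence & self._mask]


class Consumer:
    """Event processor that tracks the last sequence it has handled."""

    def __init__(self, ring, handler, depends_on=()):
        self.ring = ring
        self.handler = handler
        self.depends_on = tuple(depends_on)
        self.sequence = -1

    def run_once(self):
        # Only go as far as the producer and every upstream consumer have gone.
        limit = self.ring.cursor
        for upstream in self.depends_on:
            limit = min(limit, upstream.sequence)
        while self.sequence < limit:
            self.sequence += 1
            self.handler(self.ring.get(self.sequence))


ring = RingBuffer(4)
journal, trades = [], []
journaler = Consumer(ring, journal.append)
matcher = Consumer(ring, lambda e: trades.append(e.upper()), depends_on=[journaler])
consumers = [journaler, matcher]

for order in ["buy", "sell", "buy"]:
    ring.publish(order, consumers)

matcher.run_once()            # no progress: the journaler has not run yet
blocked = list(trades)
journaler.run_once()
matcher.run_once()
```

In a real deployment each consumer would run on its own thread and the sequences would need memory-visibility guarantees; this sketch leaves all of that out.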
I think a lot has been written on why threads, locks etc. are hard to program (i.e. hard to maintain) and are considered the "assembly of parallel programming".<p>As for efficiency: threads and processes are what the OS offers. Any parallel programming model will have to use these in one way or another, and so does Node.js. The discussion is similar to assembly vs. high-level programming languages. You can always write an assembly program that is as fast as the program compiled from a high-level language. However, it will take you a lot more time to write it.<p>In the end, it comes down to choosing the right tool for the job. Therefore you should try to find out what people with a similar job chose and what their experiences were, and also which job a framework was created for.<p>E.g. Node.js is good at a job where there are lots of events and I/O is involved. If your job is number crunching, then Node.js is the wrong tool.
I'm kind of playing in the same space, although I tend to be more of a hardware/assembler guy. I'm currently prototyping with Node.js and C/C++ with inline assembler for AVX crunching.<p>I would view Node.js as an effective tool for leveraging JS/V8. What's attractive for me when prototyping is throwing a JS/HTML UI over the top without having to switch gears. Look at raw Node as an I/O multiplexer and dispatcher, but not as a compute-capable platform. The GC is weak for large data sets and the CPU efficiency is extremely poor for any heavy processing; it is very hard to manage your cache lines efficiently. Node's sweet spot is packet and stream switching in webby stacks, where the solid HTTP, SSL and so forth are invaluable. Check out fabric (I forget the name) if you want to look at extracting more from JavaScript.<p>I suggest you check out LMAX and kx systems, and think about how close you can get to a pure event-sourced or stream-processing model.<p>You will need the equivalent of one thread per (hyper)core to maximise effective instructions per clock; whether you need to do that will depend on your sustained memory bandwidth. So a few C/C++ threads are not necessarily a bad thing.<p>It's possible to build a world-class system in Erlang with custom DSP, FPGA logic or a GPU farm if you have the budget. This is the approach I would use if you had millions of decision sources.<p>If you want to maximise performance on x86 with a simple code base and leverage SIMD, it's hard to beat a combination of Intel Fortran and the Intel C/C++ compiler. You can roll your own messaging layer and put the compute node code in Fortran, where you'll get great AVX throughput out of ifort.
Hi, I work at a company that makes heavy use of event loops. I'll try to accurately convey what I know:<p>From an efficiency standpoint, using event loops requires much less memory, but marginally more CPU time, than threaded approaches.<p>In terms of maintenance, you're essentially writing your programs in continuation-passing style, which means error handling is explicit everywhere. There are tools that allow you to hide this, like streamlinejs, Haskell's continuation monad transformer and, to a lesser degree, promises/futures/deferrables. If you decide to stick with callback passing, then a good knowledge of functional programming is useful, as all loops require recursion. If your application is single-threaded, that tends to make finding race conditions easier, but on the whole I'd say it's harder to write good asynchronous code than it is to write good synchronous code.
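A small sketch of the maintenance point above, written in Python only for illustration. The first half uses callback passing with a Node-style `(err, result)` convention, so every step has to forward errors by hand; the second half expresses the same logic with coroutines, used here as a stand-in for the promise/future style mentioned above. The function names and the in-memory price table are hypothetical.

```python
import asyncio

PRICES = {"ACME": 101.5}  # hypothetical data standing in for a remote source


def fetch_price_cb(symbol, callback):
    """Callback style: the error or the result is handed to `callback`."""
    if symbol in PRICES:
        callback(None, PRICES[symbol])
    else:
        callback(KeyError(symbol), None)


def order_value_cb(symbol, qty, callback):
    def on_price(err, price):
        if err is not None:
            callback(err, None)   # errors must be passed along explicitly
            return
        callback(None, price * qty)

    fetch_price_cb(symbol, on_price)


async def fetch_price(symbol):
    await asyncio.sleep(0)        # placeholder for real I/O
    return PRICES[symbol]


async def order_value(symbol, qty):
    # Errors propagate as ordinary exceptions; no manual forwarding needed.
    return (await fetch_price(symbol)) * qty


results = []
order_value_cb("ACME", 2, lambda err, value: results.append((err, value)))
order_value_cb("NOPE", 2, lambda err, value: results.append((type(err).__name__, value)))
async_value = asyncio.run(order_value("ACME", 2))
```

The callback version grows one nested function per step, which is where most of the maintenance cost comes from.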
Thought this was the definitive blog post on the issue:<p><a href="http://sheddingbikes.com/posts/1280829388.html" rel="nofollow">http://sheddingbikes.com/posts/1280829388.html</a><p>"epoll is faster than poll when the active/total FD ratio is < 0.6, but poll is faster than epoll when the active/total ratio is > 0.6."
The current implementation of Node.js does not provide any parallelism mechanism comparable to system threads. Please note that the Node.js event loop works within the context of a single event queue.<p><i>What does that result in?</i><p>- If you have lots of code that blocks on I/O operations (like file/socket), then you will see some improvement in performance.
- If your code is CPU-bound, then Node.js will be slower than your current threaded implementation.
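The effect of a single event queue can be sketched with Python's `asyncio`, which is used here only as an analogy for the Node.js loop. The `ticker` and `crunch` names are hypothetical. Because `crunch` never yields to the loop, nothing else can run between its start and its end.

```python
import asyncio


async def ticker(log, n):
    for i in range(n):
        log.append(f"tick {i}")
        await asyncio.sleep(0)    # hand control back to the event loop


async def crunch(log):
    log.append("crunch start")
    # CPU-bound work with no await: the loop cannot run anything else here.
    total = sum(i * i for i in range(100_000))
    log.append("crunch end")
    return total


async def main():
    log = []
    await asyncio.gather(ticker(log, 3), crunch(log))
    return log


log = asyncio.run(main())
```

In the resulting log, "crunch end" directly follows "crunch start": the remaining ticks have to wait until the computation is finished, however long it takes.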
Hi villagefool,<p>so you have lots of legacy C++. In that case I'd suggest you stay with it and rewrite the performance-critical parts effectively in ANSI C.<p>Honestly, I think StackOverflow is a better platform for questions like this. You'll see that the answer to your question is: "Yes, use both."
source: <a href="http://stackoverflow.com/questions/953428/event-loop-vs-multithread-blocking-io" rel="nofollow">http://stackoverflow.com/questions/953428/event-loop-vs-mult...</a>
I think YCombinator is more a community of entrepreneurs and investors who can give you concrete advice on technology decisions. Even though you'll also get quality answers to CompSci questions here, those types of questions are better answered over at SO.<p>Coding standards exist to allow "easier maintenance" of your application. You'd better check whether there is an ISO standard for your industry that defines the best practices in your business.<p>Here are "Google's C++ Guidelines", for example:
<a href="http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml" rel="nofollow">http://google-styleguide.googlecode.com/svn/trunk/cppguide.x...</a> but you'll need coding guidelines that better fit your industry. Maybe you'll find some guidelines on the pages of the SEC <a href="http://www.sec.gov" rel="nofollow">http://www.sec.gov</a><p>I think there are people who automatically build up prejudices when you say "App" about something as large as a trading platform, but that need not be the case. I believe there are also people who'll think that you know what they want when you talk about complex things in the form of an "App" (something they know). Just be aware of it.<p>Can you answer a question for me?
How is Cupertino, CA for a software developer? (I won't work for Apple, I'm just about to stay there for a while.)
One year ago, I wrote an implementation of a concept I had. This implementation is a system daemon which manages a lot of I/O work. The prototype was in bash, the actual system daemon was in C.<p>I am extremely proud of my recent decision to rewrite it in JavaScript and use Node.js. In the C implementation, I used standard FS functions (e.g. unlink, fwrite, fread) and of course, they are synchronous. In order to optimize the I/O, the entire daemon needed to be rewritten to use threads for the basic worker units, of which there were at least 5-6, each doing a very simple job. The alternative was to use an async FS library in C, but then I would have had to name every single callback.<p>So I scrapped the C code and rewrote the daemon in Node.js. It is much faster because it manages to utilize the system resources much more efficiently, and the code is 3-4 times smaller.<p>The point is: if you have to do a lot of computation, DO NOT switch to Node or DO switch to Haskell. If you simply have to manage I/O operations, writing in Node might actually decrease complexity.
You might consider the Actor model. Event loop code can be a mess, and threaded code, while simpler because it appears linear, eventually has such complicated runtime behaviours that it also becomes a dangerous mess. The Actor model combines the best of both worlds. Messages are queued to an actor, which combines a state machine with its own thread or a slice of a thread. The state machine aspect, the centralization of the code around an actor object and the reliance only on messages make it conceptually easy to understand and program. Actors can act as endpoints in protocols, services, publish/subscribe, timers etc., so they offer a high level of abstraction away from lower-level frameworks.
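A minimal sketch of the actor described above, in Python for brevity: a mailbox queue, a private state machine and one thread that drains the mailbox. The class name `OrderActor`, its states and its messages are hypothetical; a real actor library would add supervision, addressing and back-pressure.

```python
import queue
import threading


class OrderActor:
    """An actor: private state, a mailbox, and one thread draining it."""

    _STOP = object()
    # The whole state machine lives in one table: (state, message) -> new state.
    _TRANSITIONS = {
        ("idle", "submit"): "working",
        ("working", "fill"): "done",
        ("working", "cancel"): "idle",
    }

    def __init__(self):
        self._mailbox = queue.Queue()
        self._state = "idle"
        self.history = []
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message):
        """The only way to interact with the actor from other threads."""
        self._mailbox.put(message)

    def stop(self):
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            self._handle(message)

    def _handle(self, message):
        new_state = self._TRANSITIONS.get((self._state, message))
        if new_state is None:
            self.history.append(f"ignored {message} in {self._state}")
        else:
            self.history.append(f"{self._state} -> {new_state}")
            self._state = new_state


actor = OrderActor()
for msg in ["fill", "submit", "fill"]:
    actor.send(msg)
actor.stop()   # after join() it is safe to read actor.history
```

Only the actor's own thread ever touches `_state`, so no locks are needed; callers communicate exclusively through `send`.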
If one of my developers posted something like this, I would fire them on the spot.<p>You've got a "lot of legacy C++ code", and it seems like you're just randomly deciding whether or not to port it. You're basing this decision not on measurement, team considerations, or the needs of your project. Instead you're soliciting opinions from random people on the internet on what is a very religious issue.<p>If you have problems with your code (performance, maintainability, debugging, whatever), then go fix that. Maybe switching to Node is the right thing to do. But you'd be a fool to make that decision based on something someone said on Hacker News.
Erlang is something else you might want to look at, but you really do have to go into more detail... Node.js and Erlang are both potentially a lot slower than C++.
While not C++ related, this category page on the Tcl'ers wiki contains links to several articles discussing event loops: <a href="http://wiki.tcl.tk/_/ref?N=8558" rel="nofollow">http://wiki.tcl.tk/_/ref?N=8558</a><p>Tk has had an event loop based paradigm from its start, and straight Tcl also has the ability to explicitly enter an event loop.
It depends on many variables, including but not limited to anticipated workload, the nature of the hardware, and parallelism of the underlying workload. It would help if you gave more information about the task at hand.
Neither? _Bell Labs and CSP Threads_ by Russ Cox, <a href="http://swtch.com/~rsc/thread/" rel="nofollow">http://swtch.com/~rsc/thread/</a>, as used in #golang.
Most developers with no background in electronics do not understand the asynchronous paradigm: Transitions = factorial(states).
In the best case, if they don't confuse states and transitions (which is common), you'll end up with spaghetti code where gotos are replaced with callbacks on events. In the common case they will build intricate state models without writing the docs (state transition diagrams are a MUST have, like RFCs for network protocols).
In the multi-threading context, most devs don’t fully grasp the concurrency problem (which is still an asynchronous problem).
So if you want to stay safe, use multithreading with disjoint data. MapReduce is actually a pretty idiot-proof paradigm for multithreading. It only requires your data to be smartly shardable.<p>Executive note:
- if event model: have all state transition models DOCUMENTED;
- if multithreading: once you have shared a context (config), decouple all the data passed to your threads (and handle a SIGHUP to reload the conf safely).<p>An event model done wrong will wreak havoc on your code's maintainability just as much as multithreading done wrong.
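The "multithreading with disjoint data" advice can be sketched as a tiny map/reduce in Python. The helper names `shard`, `map_shard` and `reduce_partials` are hypothetical. Each worker receives its own slice, shares nothing mutable with the others, and returns a partial result that is combined on the calling thread. Note that CPython threads will not speed up CPU-bound work because of the interpreter lock; the point here is the data layout, which carries over to C++ threads unchanged.

```python
from concurrent.futures import ThreadPoolExecutor


def shard(items, n):
    """Split items into n disjoint slices; no element lands in two shards."""
    return [items[i::n] for i in range(n)]


def map_shard(prices):
    """Runs in a worker thread and touches only its own shard."""
    return (sum(prices), len(prices))


def reduce_partials(partials):
    """Runs on the calling thread once all workers have returned."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


prices = [float(p) for p in range(1, 101)]
shards = shard(prices, 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(map_shard, shards))
mean = reduce_partials(partials)
```

Because no worker writes to anything another worker reads, there is nothing to lock and no ordering to reason about.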
"I was thinking about switching to fad software X" is generally not a good line of thought. <i>Why</i> do you want to switch? Node.js doesn't offer parallelism, so it is obviously not a good choice for your app.<p>And "threads" means two things. There's programming in a threaded style, and there's using native operating system threads. They don't have to go together. For example, if you use a green thread library, you end up with the exact same benefits and limitations as an event loop, but with simpler, easier-to-understand code. A really good thread library could then handle multiplexing green threads over OS threads to get you parallelism too (see Haskell).
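As a rough illustration of "threaded style without OS threads", the sketch below uses Python coroutines as a stand-in for a green thread library (they are not the same mechanism, so treat this as an analogy). The `session` function is hypothetical. Each session reads top to bottom like blocking code, yet both sessions are multiplexed over one event loop on one OS thread.

```python
import asyncio


async def session(name, inbox, log):
    """Linear code; every await is a point where another session may run."""
    greeting = await inbox.get()
    log.append(f"{name} got {greeting}")
    order = await inbox.get()
    log.append(f"{name} got {order}")
    return f"{name} done"


async def main():
    log = []
    a, b = asyncio.Queue(), asyncio.Queue()
    tasks = [
        asyncio.create_task(session("A", a, log)),
        asyncio.create_task(session("B", b, log)),
    ]
    # Feed the two sessions in an interleaved order.
    for q, msg in [(a, "hello"), (b, "hello"), (b, "buy"), (a, "sell")]:
        q.put_nowait(msg)
        await asyncio.sleep(0)
    results = await asyncio.gather(*tasks)
    return log, results


log, results = asyncio.run(main())
```

There are no callbacks and no explicit state variables; the position in `session` is the state.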
WebWorkers are testament to the lack of mutual exclusivity between event loops and threads.<p><a href="http://en.wikipedia.org/wiki/Web_worker" rel="nofollow">http://en.wikipedia.org/wiki/Web_worker</a><p><a href="https://developer.mozilla.org/En/Using_web_workers" rel="nofollow">https://developer.mozilla.org/En/Using_web_workers</a><p>Once you start doing any heavy lifting or large amounts of parallelizable tasks you will find yourself drifting towards the threading model from within your event loop.
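The drift described above, from an event loop towards worker threads for heavy lifting, might look like the following Python sketch, with `asyncio` and a thread pool standing in for a browser event loop and Web Workers. The `crunch` and `heartbeat` names are hypothetical. In CPython a thread pool keeps the loop responsive but does not give CPU parallelism; a `ProcessPoolExecutor` would be the closer analogue when real parallel number crunching is needed.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def crunch(n):
    """Plain blocking function; runs on a worker thread, not on the loop."""
    return sum(i * i for i in range(n))


async def heartbeat(log, n):
    for i in range(n):
        log.append(i)
        await asyncio.sleep(0.001)


async def main():
    log = []
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=1) as pool:
        job = loop.run_in_executor(pool, crunch, 1000)  # offload the heavy part
        await heartbeat(log, 3)                         # loop keeps serving events
        total = await job                               # collect the result
    return log, total


log, total = asyncio.run(main())
```

The event loop stays in charge of I/O and coordination, while the worker owns the computation and reports back through a future.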