I agree that the decision that Promise continuations always run in a new event loop iteration has a serious impact on performance. Nevertheless I think it is a great design, because it gives you a lot of guarantees. I have worked a lot with C# Promises (Tasks), where the implementation allows synchronous completions, and that leaves far more room for errors. E.g. whenever you call TaskCompletionSource.SetResult(xyz) you might jump directly into the completion handler, which executes arbitrary code (and may even call recursively back into your code). If you don't know that, don't know when it happens (understanding all the SynchronizationContext stuff), and don't take precautions against it (like checking for state modifications after promise resolution), you can run into hard-to-find errors. Similar things can happen on the consuming side.<p>In JS-land a continuation will always run in the next event loop iteration, which guarantees that code after Promise.then() will still get executed before it. In other designs the continuation might sometimes run directly inside .then, which is yet another thing to account for. In my opinion the JS promise spec provides the most sensible and least error-prone behavior, and that's great, because for the average programmer correctness is more important than performance. The small amount of code that really needs very high performance can still be hand-optimized, e.g. by falling back to synchronous callbacks where it matters, or, for the use case in the link, by reading bigger parts of the file at once and then doing more of the parsing synchronously.
Not sure which spec the article is referring to, but it's not the one implemented by V8 natively.<p>> The problem is that Promises have to be resolved in the next turn of the event loop. This is part of the spec.<p>That statement is false unless I'm missing something major here. Promises aren't resolved in the next turn of the event loop. They <i>should</i> be resolved as part of the microtask queue, which is drained at the end of the current tick (recursively, so chained handlers still run in the same tick).<p>Example (tested on node 8):<p><pre><code> Promise.resolve()
.then(() => console.log('1'))
.then(() => console.log('2'))
.then(() => console.log('still the same tick!'));
setTimeout(() => console.log('timeout happening in the next tick'), 0);
</code></pre>
Result:<p><pre><code> 1
2
still the same tick!
timeout happening in the next tick
</code></pre>
The same will happen with async functions:<p><pre><code> (async () => {
console.log(await Promise.resolve('1'));
console.log(await Promise.resolve('2'));
console.log(await Promise.resolve('still the same tick!'));
})();
setTimeout(() => console.log('timeout happening in the next tick'), 0);</code></pre>
"Unfortunately, generators and iterators are synchronous. Unless we’re willing to read our entire file into memory, they can’t help us."<p>This doesn't seem to follow to me. Synchronous reading means you have to read a given chunk in before you can process it, which means that if you're not reading out of cache that you're going to stall on the spinning disk, but synchronous code doesn't require you to read the whole file at once. And you can easily write something that will process it into nice lines for you. The "standard library" may not do that because there isn't much standard library for Javascript, but it's not that hard.<p>The proof is, that's what the Python snippet is doing. It's built into Python in this case, and I would expect it does some magic internal stuff to accelerate this common case (in particular doing some scanning at the C level to see how long of a Python string to allocate once, instead of using Python-level string manipulation), but the API ought to be one you can do in Node, even synchronously, with generators.
If you're using Node.js as a stateless web server (the principal use case motivating its design), when are you reading files off the local disk other than in the initialization phase? What's the problem with applying the 'right tool for the job' mindset and defaulting to other platforms for more generalized computing tasks?
Not a scientific test:<p><pre><code> var fs = require('fs'),
     csv = require('csv-parser');
 var start = new Date().getTime();
 fs.createReadStream('./Batting.csv')
   .pipe(csv())
   .on('data', function() {})
   .on('end', function() {
     console.log(new Date().getTime() - start);
   });
</code></pre>
This takes about 400 ms on my machine to parse a 6.1 MB CSV (15.25 MB/s). I didn't do anything useful with the data, but I did verify it exercises the parsing function.<p>Clearly all the work/time is spent in the actual parsing and in handling CSV corner cases, where the parser favors correctness over raw throughput.
> (note: I’m using TypeScript to convert the async iterators and generators into something that node.js can run. See my code here.)<p>This is your problem: TypeScript's Promise polyfill is really slow. Switch TypeScript to emit ESNext and run it on Node 8, and you'll see a much different situation.
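For reference, a minimal tsconfig.json along these lines (other settings left at their defaults) makes the compiler emit native async/await and generators instead of its downlevel helpers:<p><pre><code> {
   "compilerOptions": {
     "target": "esnext",
     "module": "commonjs"
   }
 }
 </code></pre>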
I've written event-driven, non-blocking style C for a couple of different projects on POSIX platforms. When a file descriptor becomes readable, you read from it until read(2) returns EAGAIN[1] or until a condition is fulfilled, such as the read buffer containing a delimiter. People new to event-driven programming may just invoke read(2) once for each readable event, check for a condition and then yield to the event loop. This sounds like the same mistake, only with more turtles stacked on top of it.<p>[1] This applies to sockets and pipes (incl. stdin), not regular file I/O which usually ends up being off-loaded to a thread/process pool (if needed)
re: "Promises have to be resolved in the next turn of the event loop. This is part of the spec."<p>This seems imprecise? There are multiple queues and the details seem to be implementation-specific [1]. The Promises/A+ specification doesn't seem to specify a particular implementation [2].<p>The reason this might matter is that if you use promises to do something CPU-intensive without doing I/O, it might be possible to starve the event queue, similar to what happens if you're just writing a loop.<p>On the other hand, the example in the article is waiting on reads, so they'll be delivered through the event queue. Can reading larger chunks at a time and buffering be used to speed that up?<p>[1] <a href="https://github.com/nodejs/node/issues/2736#issuecomment-138607657" rel="nofollow">https://github.com/nodejs/node/issues/2736#issuecomment-1386...</a><p>[2] "This can be implemented with either a 'macro-task' mechanism such as setTimeout or setImmediate, or with a 'micro-task' mechanism such as MutationObserver or process.nextTick. Since the promise implementation is considered platform code, it may itself contain a task-scheduling queue or 'trampoline' in which the handlers are called."
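A minimal illustration of that starvation (a hypothetical snippet, not from the article): a chain that keeps re-queueing itself as a microtask never lets the macrotask queue run, so the timer below never fires:<p><pre><code> function spin() {
   // Each .then() schedules another microtask, so the microtask queue never drains.
   return Promise.resolve().then(spin);
 }
 spin();
 setTimeout(() => console.log('starved: this never runs'), 0);
 </code></pre>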
The number of excuses for node's poor performance here is shocking. Talk about high-level architectural concepts and quibble about the likelihood of this implementation being necessary if you like, but if you can't provide a decent implementation for this very basic use case, then node has a hole in it.
The biggest problem I had with forced promise asynchronicity was with browser promise polyfills, which use setTimeout to force success and error handlers to be called asynchronously.<p>This is fine, unless you're:<p>1. Modifying the DOM and hoping to avoid jankiness
2. Trying to work with popup windows, in a browser which deprioritizes setTimeout in all but the focused window.<p>I ended up implementing <a href="https://github.com/krakenjs/zalgo-promise" rel="nofollow">https://github.com/krakenjs/zalgo-promise</a> to get around this. It intentionally "releases Zalgo" by allowing promises to be resolved/then'd synchronously, but my belief is this doesn't have to cause bugs if it's used consistently.
Interesting and surprising. Node 8 supports async/await as well as generator functions, so the target for the TypeScript transpiler can be set to 'es7'. It would be interesting to see whether this has an impact on performance.
My problem with promises is that you can't "uninvoke" them. I.e., if all listeners unsubscribe, the original job they were waiting for could be killed, but the design of Promises doesn't allow that. Instead, the job will continue to run (without any listeners), thus wasting resources.
In the documentation, there is a specific example for reading a file line by line, and it is very concise[1]. I'm too lazy to test how fast it is though.<p>[1]: <a href="https://nodejs.org/api/readline.html#readline_example_read_file_stream_line_by_line" rel="nofollow">https://nodejs.org/api/readline.html#readline_example_read_f...</a>
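For context, the documented pattern is roughly the following (a sketch from memory, not copied verbatim; 'sample.txt' is a placeholder):<p><pre><code> const fs = require('fs');
 const readline = require('readline');

 const rl = readline.createInterface({
   input: fs.createReadStream('sample.txt')
 });

 // readline emits one 'line' event per line, without the trailing newline.
 rl.on('line', (line) => {
   console.log(`Line from file: ${line}`);
 });
 </code></pre>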