Yes, properly handling timeouts and cancellation is the next frontier for programmers to conquer. I was just thinking about this the other day because some program I was using locked up, and of course it worked fine when restarted, and I began to wonder why this happens so frequently. A lot of obscure things can cause hangs, but if every blocking operation has a timeout, the number goes way down.<p>I think it's unfortunate that even new languages still treat timeouts and cancellation as an afterthought. For example, every Go program I've ever written says:<p><pre><code> select {
 case <-ctx.Done():
     return nil, ctx.Err()
 case thing := <-thingICareAboutCh:
     return thing, nil
 }
</code></pre>
Instead of:<p><pre><code> return <-thingICareAboutCh, nil
</code></pre>
The language designers thought about needing to give up on blocking operations, and then said "meh, let the programmer decide on a case-by-case basis". And that's the state of the art.<p>(Getting off topic, this is why I avoid mutexes and other concurrency primitives that aren't channels; you can't cancel your wait on them. Not being able to cancel something means that if there are any bugs in your program, you'll find out when the program leaks thousands of goroutines that are stuck waiting for something that will never happen, and then runs out of memory. Even if the thing they're waiting for does happen, the browser that was going to tell someone about it has long since been closed, and so you'll just die with a write error when you finally generate a response. If you have a timeout and a cancellation on every blocking task, your program gives up when the user gives up, and will run unattended for a lot longer.)
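To make the boilerplate concrete, here's a minimal self-contained sketch of how a deadline set once near the top of the call tree propagates into that select (slowOp and its 50 ms delay are invented for illustration):<p><pre><code>```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowOp stands in for any blocking operation that honors its context.
func slowOp(ctx context.Context) (string, error) {
	select {
	case <-ctx.Done():
		return "", ctx.Err() // give up: cancelled or timed out
	case <-time.After(50 * time.Millisecond):
		return "done", nil
	}
}

func main() {
	// Everything called below this point inherits the 10 ms deadline.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()

	_, err := slowOp(ctx)
	fmt.Println(errors.Is(err, context.DeadlineExceeded)) // true
}
```</code></pre>
The point is that main decides the budget once; slowOp (and anything it calls) only has to check ctx.Done(), not know the number.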
I think that this is a good example of where dynamic scope is helpful.<p>We’re used to lexical scope: it’s easy to reason about, and it is a really good default. But sometimes it makes sense for one function to apply settings for all the functions it calls, without interfering with <i>other</i> functions, scopes, threads or processes (like setting a global would).<p>It’d be nice to be able to say ‘this function should time out within 10 ms’ and then any function called will just automatically time out.<p>Go’s contexts integrate timeouts and cancellation, and permit one to add <i>any</i> value, should one wish to, but you have to be disciplined and add a context argument to every single function. It’d be better, I think, to support it natively in the language. Lisp does this: any variable declared with DEFPARAMETER or DEFVAR is dynamic, and you can locally declare something dynamic too.<p>One can fake dynamic scoping with thread-local storage and stacks or linked lists, if one needs it, but it can get ugly.<p>Dynamic scoping doesn’t get the attention or respect I think it deserves. It’s arguably the wrong thing by default, but when it’s useful, it’s <i>really</i> useful.
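In Go, the closest thing to a dynamic variable is a context value: set near the top of a call tree, visible to everything beneath it, invisible to sibling trees. A rough sketch of faking the ‘time budget as dynamic variable’ idea (withBudget/budget are hypothetical names, not a real API):<p><pre><code>```go
package main

import (
	"context"
	"fmt"
	"time"
)

// key is a private context key: our hypothetical "dynamic variable".
type key struct{}

// withBudget "binds" the dynamic variable for everything called below.
func withBudget(ctx context.Context, d time.Duration) context.Context {
	return context.WithValue(ctx, key{}, d)
}

// budget reads the innermost binding, falling back to a default.
func budget(ctx context.Context) time.Duration {
	if d, ok := ctx.Value(key{}).(time.Duration); ok {
		return d
	}
	return time.Second // default when no caller bound it
}

// leaf could be arbitrarily deep in the call tree.
func leaf(ctx context.Context) time.Duration { return budget(ctx) }

func main() {
	root := context.Background()
	fast := withBudget(root, 10*time.Millisecond)
	fmt.Println(leaf(fast)) // 10ms: sees the binding above it
	fmt.Println(leaf(root)) // 1s: the sibling tree is unaffected
}
```</code></pre>
This has exactly the discipline problem the comment describes: it only works because every function threads ctx through by hand.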
This is something that is really nice in gevent. Under the hood it's doing something similar to what the article says - every time you make a blocking call (in gevent, this means yielding to the event loop until your event occurs), you might have a gevent.Timeout raised.<p>Since gevent is generally used by monkey-patching all standard IO, most code doesn't even need to be aware of this feature - it just treats a timeout as an unhandled exception.<p>From the user's perspective, it can be used simply as a context manager that cancels the timeout on exit from the block:<p><pre><code> with gevent.Timeout(10):
     requests.get(...)
</code></pre>
By default this will cause the Timeout to be raised, which you can then catch and handle. As a shorthand, you can also give it an option to suppress the exception, effectively jumping to the end of the with block upon timeout:<p><pre><code> response = None
with gevent.Timeout(10, False):
     response = requests.get(...)
 if response is None:
     # handle timeout</code></pre>
If you think this is difficult, then try designing an abstraction that can reliably (best effort) report progress of any operation in your program.<p>There is a reason why most progress indicators suck, and it's because it is in general surprisingly hard to write one.
I often wonder why SO_RCVTIMEO/SO_SNDTIMEO are available for sockets but not for arbitrary file descriptors. Setting the timeout once and then using classic read/write on blocking FDs is easy, and errors surface through the ordinary error-code handling.
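For what it's worth, Go exposes roughly the SO_RCVTIMEO idea as per-connection deadlines: set SetReadDeadline once and every later Read fails with a timeout error once the deadline passes (os.File has SetDeadline too, but as far as I know it only works on pollable files like pipes, not regular disk files). A self-contained sketch, with the loopback server invented purely so the read has something to time out against:<p><pre><code>```go
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// A local server that accepts but deliberately never sends anything.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	go func() {
		c, err := ln.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		time.Sleep(time.Second) // stay silent past the deadline
	}()

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// The SO_RCVTIMEO analog: set once, applies to every Read after it.
	conn.SetReadDeadline(time.Now().Add(20 * time.Millisecond))
	_, err = conn.Read(make([]byte, 1))

	ne, ok := err.(net.Error)
	fmt.Println(ok && ne.Timeout()) // true: the read gave up at the deadline
}
```</code></pre>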
I definitely assumed this would contain clever tips about how to handle it when coworkers don't respond to your emails, don't accomplish things they promised, or don't follow through. Maybe some automation methods to handle those situations.
Very systematic and accessible description of the problem and various alternatives including their origins. I learned a lot reading the post, thank you!
I'm not convinced the task-based approach doesn't work. Perf-wise there's no reason that tasks have to have the overhead of threads.<p>Syntactically, I think it is worth distinguishing between things that can time out and things that can't, because you usually need to do some sort of cleanup on timeout.<p>In fact, as far as I can tell, the cancel scopes provided by Trio with the async/await syntax are exactly isomorphic to Scala's tasks from the cats library (where they are called IO).<p>Also, I'm not sure I understand the author's preference for thinking of timeouts as level-triggered rather than edge-triggered. While it's an interesting way of thinking about the problem, and would be the natural way a timeout is implemented in, e.g., an FRP system (a lot of flavors of FRP are essentially entirely level-triggered systems), it doesn't seem like the way you'd implement things in a non-reactive system. What's wrong with just killing the entire tree of operations (as is usual when you propagate, say, an exception) on a timeout, or from a top-down manner when you put a timeout on a task?<p>Timeouts are fundamentally tied to concurrency (they are a concurrent operation: you're racing the clock against your code and seeing who wins) and to me the tricky thing about timeouts is exactly the same trickiness that you face with concurrency, namely shielding critical sections. How you decide to pass timeout arguments seems like a secondary concern. Just like with normal concurrency, you need to make sure that certain critical sections are atomic with respect to timing out, either by disallowing timeouts during that critical section (you therefore need to make the critical section as small as possible, ideally a single atomic swap operation) or by implementing a reasonable form of rollback. (Of course you can always take the poll-based approach where you poll for timeout status, but again this is just a specialization of a general concurrency strategy.)
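On shielding: in Go, context.WithoutCancel (added in Go 1.21) is one way to make a critical section atomic with respect to timing out — derive a context that keeps the parent's values but ignores its cancellation. A sketch, where commit and its 5 ms duration are invented for illustration:<p><pre><code>```go
package main

import (
	"context"
	"fmt"
	"time"
)

// commit is a hypothetical critical section that must run to completion
// even if the surrounding request has already timed out.
func commit(ctx context.Context) error {
	// Shield: keep ctx's values but drop its cancellation/deadline.
	shielded := context.WithoutCancel(ctx)
	select {
	case <-shielded.Done():
		return shielded.Err() // unreachable: shielded is never cancelled
	case <-time.After(5 * time.Millisecond):
		return nil // the commit finished
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Millisecond)
	defer cancel()

	<-ctx.Done() // the request as a whole has already timed out...
	err := commit(ctx)
	fmt.Println(err == nil) // true: ...but the critical section still completed
}
```</code></pre>
The usual caveat applies: the shielded section should be as small as possible, since nothing can interrupt it anymore.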
FP libraries have pretty much solved this IMO. You create a value that describes what you want to happen and that description can include cancellation if some condition is met (e.g. it takes too long). There are limitations imposed by the runtime on what you can actually cancel (e.g. I don't believe all OS calls can be interrupted) but beyond that it works as specified.<p>Here's one example of such a library, though without a bit of FP background it probably doesn't make a great deal of sense:<p><a href="https://typelevel.org/cats-effect/typeclasses/concurrent.html" rel="nofollow">https://typelevel.org/cats-effect/typeclasses/concurrent.htm...</a>
Boost.ASIO (C++) does not expose the SO_RCVTIMEO socket option and instead makes you use a deadline_timer explicitly. It's very annoying, but this article kind of explains why it is that way.
I have spent way too much of my time as a developer over the years hacking on software to remove ill-conceived timeouts where some developer said--sometimes not even in one place but for some insane reason at every single level of the entire codebase--"this operation couldn't possibly take longer than 10 seconds"... and then it does, because my video is longer than they expected or I have more files in a single directory than they expected or my network is slower than they expected (whether because I have more packet loss or more competition or more indirection) or my filesystem had more errors to fix during fsck than they expected or I had activated more plugins than they expected or I had installed more fonts than they expected or I had more email that matches my search query than they expected or more people tried to follow me than they expected (for months back when Instagram was new I seriously couldn't open the Instagram app because it usually took more than the magic 10 seconds--an arbitrary timeout from Apple--to load my pending follower request list for my private account; the information would get increasingly cached every load so if I ran the app over and over again eventually it would work) or my DNS configuration was more broken than they expected or I had a more difficult-to-use keyboard than they expected or I had more layers of security on my credit card than they expected or <i>any number of things that they didn't expect</i> (can you appreciate how increasingly specific these examples started becoming, as I started having horrifying flashbacks of timeouts I had to remove because some idiot developer decided they could predict how long something could take and then aborted the operation, which seems like the worst possible way of handling that situation? :/). 
Providing the user a way to cancel something is great, but programming environments should make timeouts maximally difficult to implement, preferably so complex that no one ever implements them at all (and yes, I appreciate that this is a pipe dream, as a powerful abstraction tends to make timeouts sadly so easy people strew them around liberally... but certainly no timeout arguments should be provided on any APIs lest someone arbitrarily guess "10 seconds"): if the user, all the way up at the top of the stack, wants to give up, they can press a cancel button. And to be clear: I don't think timeouts are something mostly just amateur programmers tend to get wrong and which can be used effectively by experts (as is the case with goto statements or random access memory or multiple inheritance)... I have <i>never</i> seen a timeout--a true "timeout" mind you, as opposed to an idempotent retry (where the first operation is allowed to still be in flight and the second will, without restarting, merge with the first attempt as opposed to causing a stampede; these make sense when you have lossy networks, for example)--in a piece of software that was a feature instead of a bug, where the software would not have been trivially improved by nothing more than simply deleting the timeout, and I would almost go so far as to say they are theoretically unsound.