Always use [closed, open) intervals

211 points by smukherjee19 over 2 years ago

35 comments

rav over 2 years ago

Half-open intervals are why I try as much as possible to stay away from languages that use 1-based indexing (Lua, Julia, Matlab, R, ...) - 1-based indexing lends itself to closed intervals because an array of N elements has [1,N] as its index range, whereas 0-based indexing lends itself to half-open intervals because an array of N elements has [0,N) as its index range.

However, I know of one case where closed intervals really shine. Consider displaying a zoomable map in tiles. On a given zoom level, each tile has some coordinates (x;y) where x and y are integers denoting the virtual column and row. Suppose that we allow zooming out by a factor of 2, so that two-by-two tiles are aggregated into a single tile. Then a natural choice for the coordinates of the zoomed-out tile is (floor(x/2);floor(y/2)), that is, divide by two and round down. Suppose that a dataset has data on tile coordinates [x1,x2]×[y1,y2], meaning that there's only data on tiles (x;y) where x1≤x≤x2 and y1≤y≤y2. These are closed intervals, but stay with me - the reason they are nice in this case is how you compute the range of valid tile coordinates when you zoom out: the range becomes [floor(x1/2),floor(x2/2)]×[floor(y1/2),floor(y2/2)] - that is, you simply divide the range endpoints by two and round down. If you try to do this with half-open intervals, then you need some +1/-1 shenanigans, which are normally what I try to avoid by going for half-open intervals.
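
A minimal Python sketch of the tile example above, for one axis only. The function names are made up for illustration, and the half-open version assumes a non-empty range. With closed ranges both endpoints are simply halved; the half-open version has to adjust the exclusive end.

```python
def zoom_out_closed(lo, hi):
    """Zoom out the closed tile range [lo, hi] by a factor of 2."""
    return lo // 2, hi // 2


def zoom_out_half_open(lo, hi):
    """Zoom out the non-empty half-open tile range [lo, hi) by a factor of 2.

    The last covered tile is hi - 1, so the new exclusive end is
    (hi - 1) // 2 + 1.
    """
    return lo // 2, (hi - 1) // 2 + 1
```

Python's `//` rounds down, which matches the floor(x/2) in the comment.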

ChrisMarshallNY over 2 years ago

> Never, ever, ever use [closed, closed] intervals

I’m not really a fan of “never,” or “always” rules, when it comes to programming. I’ve found it’s usually better to have a “make sure to justify deviations from” heuristic.

I usually use [closed..open) ranges (as they are called in Swift), but sometimes, an inclusive range is a lot more appropriate, for expressing an operation (for example, I may express a range as start...end, as opposed to start..<(end + 1)).

elcaro over 2 years ago

I'll repeat the sentiment that "always" and "never" rules usually have their fair share of exceptions.

And while I agree that [closed, open) intervals are often the best choice... sometimes what you want is [closed, closed], or (open, open), and it's nice to use a language that makes that easy.

For example, Raku makes it easy to do:

<pre><code>[closed, closed] with ($a .. $b)
[closed, open)   with ($a ..^ $b)
(open, open)     with ($a ^..^ $b)
(open, closed]   with ($a ^.. $b)</code></pre>

dragontamer over 2 years ago

Ehh, this post misses the important tidbit.

You should keep your code consistent across your organization, so that a large number of programmers knows how your code works. You should have a "default writing style", and the "default writing style" should be used unless you have very, very, very good reasons to avoid it. (And an errant +1 or -1 here and there isn't a good enough reason to switch).

There are four styles of intervals. Let's say you want to represent a loop of 5 iterations numbered [555, 556, 557, 558, 559]. You've got:

* [closed, closed] -- [555, 559]
* [closed, open> -- [555, 560>
* <open, closed] -- <554, 559]
* <open, open> -- <554, 560>

There's not much difference between any of these four. As long as you pick a single choice, get comfortable with its quirks, and make it consistent across your organization, you get benefits.

The main reason we do [closed, open> is because Dijkstra (father of structured programming back in the 1960s) argued to use [closed, open> when presented with all these options (and argued for zero-indexing as well).

The "[closed, closed]" set is one-too-small (559 - 555 == 4), so you need to add +1 to the representation.

The <open, open> set is one-too-large: (560 - 554) == 6. So this too seems prone to off-by-one errors.

[closed, open> and <open, closed] are both the correct size of 5 when you subtract, but both include a number that doesn't exist. In [closed, open>, the latter number isn't part of the array (560 is "one past the end"), while in <open, closed], the first number isn't part of the array.

Make of that what you will. [closed, open> became a programmer convention for these reasons. The important bit is to know all the quirks / off-by-one errors associated with this representation.
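
The subtraction argument above can be checked directly. This is a small Python sketch; the dictionary and its keys are only labels chosen for illustration.

```python
first, last = 555, 559  # the five values 555, 556, 557, 558, 559

# "upper bound minus lower bound" for each of the four styles
sizes = {
    "[closed, closed]": last - first,          # [555, 559]
    "[closed, open>": (last + 1) - first,      # [555, 560>
    "<open, closed]": last - (first - 1),      # <554, 559]
    "<open, open>": (last + 1) - (first - 1),  # <554, 560>
}

# the actual number of iterations
actual = len(range(first, last + 1))
```

Only the two half-open styles give a difference equal to `actual`.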

jmull over 2 years ago

Half-closed is generally good because it tends to reduce off-by-one errors.

But this article overstates the case. Especially for floating-point, where the distinction between a [...]

The “splitting by time” section should probably just be removed, since it confuses the point and doesn’t add anything. The scenario doesn’t really make sense (if you wanted this you’d store the registration time, and to get the hour you’d be better off truncating the value, not using an interval). Also, if you’re going to be doing math on time values, you better know the precision of the values you’re working with (among many other things). Intervals, however closed or open, aren’t going to help you there.

Maybe I shouldn’t criticize so much, since I agree with the general point. But this makes the case awkwardly.

eigenspace over 2 years ago

There are convincing cases one can make in favour of half-open intervals (at least in certain circumstances), but this isn’t it.

This is just a rambling, absolutist mess.

igammarays over 2 years ago

Why would you want a half-open interval when booking an AirBnB or flight? If I search for flights from February 24 to February 24 I don’t expect the empty interval.

chenglou over 2 years ago

Quite a few APIs use a pair of `{start, length}` instead, which in the context of the post's example, is even clearer. Empty interval would be `length == 0`, time interval would be a single array of `starts`, etc. Fewer subtractions (to get length) usually end up nicer too.
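
A hedged Python sketch of the `{start, length}` idea, including the "single array of starts" variant for consecutive time buckets. Both function names are hypothetical, and `starts` is assumed to be sorted.

```python
from bisect import bisect_right


def to_half_open(start, length):
    """Convert a (start, length) pair to the half-open range [start, end)."""
    return start, start + length


def bucket_index(starts, t):
    """Index of the bucket containing t, given sorted bucket starts.

    Bucket i covers [starts[i], starts[i + 1]); the last bucket has no
    upper bound. Returns -1 if t comes before the first start.
    """
    return bisect_right(starts, t) - 1
```

With `starts = [0, 10, 20]`, the value 10 lands in bucket 1 because each start is inclusive and each following start is exclusive.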

BiteCode_dev over 2 years ago

It's the default in Python, and it's more helpful than not, so I would say it's a good design decision. But "always" is a dangerous word in engineering, and in this case, definitely not warranted.

Case in point, last week I worked with a list of dates, and I needed the last date to bracket my sliding windows as a time period cleanly.

knorl over 2 years ago

I think this advice could be better summed up into: to minimise off-by-one errors, choose a consistent strategy for describing intervals, and stick with it as much as is sensible.

branko_d over 2 years ago

Just for completeness I'll mention that there is another style:<pre><code> start, count </code></pre> This seems to be popular in the .NET ecosystem.

vglocus over 2 years ago

I have recently been on the other side of this argument for a specific case.

In our case we (users of our API) are to specify date ranges, representing a list of partitions. So we are not counting nights between dates, but rather a set of daily or hourly buckets.

Here (maybe even only here) I argue that inclusive ranges feel more intuitive.

I find it much more intuitive to represent the 1st 7 days of January as

['2022-01-01', '2022-01-07']

compared to

['2022-01-01', '2022-01-08').

Another very common example is to specify the last 7 days (incl 'today'), in which case I find

[today().minusDays(6), today()]

to be a clearer representation than

[today().minusDays(6), today().plusDays(1))
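
For reference, the two spellings of "the first 7 days of January" pick out the same daily buckets. This is a sketch using Python's `datetime` module; the helper names are made up for illustration.

```python
from datetime import date, timedelta


def days_closed(first, last):
    """All days in the closed range [first, last]."""
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]


def days_half_open(start, end):
    """All days in the half-open range [start, end)."""
    return [start + timedelta(days=i) for i in range((end - start).days)]


def last_7_days_closed(today):
    """Closed range covering the last 7 days, including today."""
    return today - timedelta(days=6), today
```

The closed helper carries the `+ 1`, while the half-open form moves it into the end date the caller has to supply.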

andreareina over 2 years ago

*half-open intervals. There have been times I needed an (open, closed] interval. But there have also been times when I wanted a fully closed interval because I was setting an arbitrary limit and it's easier to tell a user "100 is the maximum" versus "it must be below 100", so what value should be entered to max it out: 99? 99.99? etc.

To add some nuance I'd say that if you're dividing a larger interval into smaller subintervals then a half-open one is probably what you want.
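
The subdivision case can be sketched in Python as follows. The function name is hypothetical and `step` is assumed to be a positive integer.

```python
def split_half_open(start, end, step):
    """Split [start, end) into consecutive half-open chunks of at most `step`."""
    return [(lo, min(lo + step, end)) for lo in range(start, end, step)]
```

Each chunk's exclusive end is the next chunk's inclusive start, so every value falls into exactly one chunk with no adjustment at the seams.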

zasdffaa over 2 years ago

I know Dijkstra's paper and it's short, good and should be read but this article is wrong in saying always. It feels like a newbie programmer came across a good thing then lost all proportion; use the right tool for the right job, as ever.

the_cramer over 2 years ago

This depends on the use case. If you are doing a frontend layer on top of closed-open, then the frontend will have to handle the points the article is rambling about.

Users are used to selecting a date range in closed-closed format, for example.

ur-whale over 2 years ago

Wholeheartedly agree that sticking - where possible - to [closed, open) is a good idea. It has helped me tremendously when implementing, e.g., computational geometry algorithms. Robust triangle rasterization comes to mind.

Another interesting point: in the weird corner of the world where I grew up, half-open intervals were always denoted: [low_bound, hi_bound[

I am of course completely biased, but I've always found this notation much more elegant and intuitively obvious than the [low_bound, hi_bound) that seems to be the prevalent norm in the anglo world.

Using '[' after the upper bound clearly shows that we're open at the top whereas the ')' is fairly arbitrary.

And while I'm on the topic of weird culture-induced quasi-arbitrary biases: I had a math teacher that would bark (and I mean BARK!) at us if we ever used '>' in inequalities.

The justification was that with this constraint, all inequalities ended up written and laid out with their two members respecting the standard "left-to-right" drawing of the real line, which made it much easier to picture what was going on geometrically.

It also enforced consistency throughout a long demonstration - one less thing added to the cognitive load.

He was made fun of a lot by the student body, of course, but later in life, as a programmer, I have found myself sticking to the habit and I always force myself to mostly use '<', very rarely '<=', and almost never '>' and '>='.

I find this makes code much more readable, just like back in the days of my old teacher with math inequalities, and pretty much for the exact same reasons.

Of course, doing that does not help at all when reading other folks' code, those uncivilized heretical users of the 'greater than' form.

chkas over 2 years ago

Python uses "closed open" intervals with `range(0, n)`; the reverse is then `range(n - 1, -1, -1)`, which is highly unintuitive. In connection with 0-based array indexing, this makes certain algorithms very cumbersome. For example, the Knuth shuffle. In Python this is:

<pre><code>from random import randrange

x = [10, 20, 30, 40, 50]
for i in range(len(x) - 1, 0, -1):
    r = randrange(i + 1)
    x[i], x[r] = x[r], x[i]
print(x)
</code></pre>

With 1-based indexing and inclusive ranges it would be much more understandable:

<pre><code>a[] = [ 10 20 30 40 50 ]
for i = len a[] downto 2
  r = random i
  swap a[i] a[r]
end
print a[]</code></pre>
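
For comparison, the downward loop in the Python version can also be spelled with `reversed(range(...))`, which visits the same indices without the three-argument form. This is a sketch with a hypothetical function name, not a claim about which spelling reads better.

```python
from random import randrange


def shuffle_in_place(x):
    """Shuffle the list x in place (Knuth / Fisher-Yates) and return it."""
    # reversed(range(1, n)) yields n - 1, n - 2, ..., 1
    for i in reversed(range(1, len(x))):
        r = randrange(i + 1)  # random index with 0 <= r <= i
        x[i], x[r] = x[r], x[i]
    return x
```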

sshine over 2 years ago

This is one of those "well, of course!" pieces.

But I've bookmarked it, in case I run into someone who thinks they disagree, in which case I can offload the explanation.

I had a similar incident with colleagues who had discovered the "Default" trait and started adding defaults to everything, including things that didn't have good defaults, and things where they didn't mean default but actually something quite specific such as "empty". The canonical "don't do that!" blog post didn't exist, so I had to create one.

enqk over 2 years ago

I think this has to do with the nature of the metric underneath. Closed-open intervals are the way to go for integers. However, they don’t seem to be a good fit for sampling from a continuous space.

personalityson over 2 years ago

Closed/open makes sense for continuous measures; for integers, closed/closed is more readable.

bheadmaster over 2 years ago

I prefer [a, b) intervals because they encode two pieces of important information directly:

1. The first element (a)
2. The length (b-a)

Which are what we most often need.

zx8080 over 2 years ago

Am I the only one who noticed a slightly unusual (or not?) ligature for 'st'?

Gehinnn over 2 years ago

Don't call your integer bound vars `start` and `end` please. Either use `start` and `endExclusive` or `start` and `length` - this greatly reduces confusion.

In my experience, half-open integer intervals lead to fewer `- 1` in the code.
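
A minimal sketch of that naming advice in Python, using snake_case `end_exclusive` in place of `endExclusive`. The class name `IndexRange` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexRange:
    """Half-open integer range [start, end_exclusive)."""

    start: int
    end_exclusive: int

    @property
    def length(self):
        # no "+ 1" or "- 1" needed for a half-open range
        return self.end_exclusive - self.start

    def __contains__(self, i):
        return self.start <= i < self.end_exclusive
```

Spelling out `end_exclusive` at every call site makes it harder to pass an inclusive bound by mistake.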

jwilk over 2 years ago

> You could try [T, T-1], but that's a bit clunky and it won't work if T is a decimal number.

Huh? What do they mean by "decimal number"?

hans_castorp over 2 years ago

FWIW, Postgres converts all ranges (=interval in this context) for discrete data types to half-open intervals even when a closed one was requested.

So the daterange '[2022-01-01, 2022-01-07]' will result in '[2022-01-01,2022-01-08)' and the integer range '[1,7]' will result in '[1,8)'.

So it seems the Postgres devs agree with the author.

Edit: typo fixed for integer range
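
The kind of conversion described above can be illustrated for integers in Python. This is only a sketch of the idea, not Postgres's implementation; the function name and the two-character `bounds` argument are assumptions made for the example.

```python
def canonicalize(lower, upper, bounds="[]"):
    """Convert an integer range with the given bounds to half-open [lo, hi).

    `bounds` is a two-character string such as "[]", "[)", "(]" or "()".
    """
    lo = lower if bounds[0] == "[" else lower + 1  # exclusive lower: skip it
    hi = upper + 1 if bounds[1] == "]" else upper  # inclusive upper: step past it
    return lo, hi
```

This only works for discrete types, where "the next value" is well defined.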

return_to_monke over 2 years ago

I am surprised I can't see this here. Wasn't anyone else taught [closed; closed] and ]open; open[ notation in school?

nvartolomei over 2 years ago

E. W. Dijkstra constructed a similar argument to argue that numbering should start at 0: https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html

green-in-gold over 2 years ago

The other nice property of [closed, open) intervals is that they concatenate perfectly: [a, b) ++ [b, c) == [a, c)
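
Python slices are half-open, so the concatenation property can be checked directly. The helper name is made up for illustration, and the indices are assumed to satisfy 0 <= a <= b <= c <= len(data).

```python
def split_and_join(data, a, b, c):
    """Slice [a, b) and [b, c) separately, then concatenate the two pieces."""
    return data[a:b] + data[b:c]
```

The shared bound `b` appears in both slices but the element at index `b` is taken only once, so the result equals `data[a:c]`.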

alexchantavy over 2 years ago

Dumb question but why is ‘[‘ called ‘closed’ and ‘)’ called ‘open’? I somehow would have thought the reverse.

nequo over 2 years ago

As it goes with maxims like this, it depends on the problem domain. In probability theory and statistics, a cumulative distribution function is defined as F(x) := Pr(X <= x), not Pr(X < x).

Like others are saying, it is consistency within the code base that probably matters the most.

runarberg over 2 years ago

The notation here really bothered me. The author defines the [closed, open) interval [a, b) as the list of all numbers x that fulfill a ≤ x < b. Good so far, but when they talk about the empty interval [a, a) we run into a problem because a ≤ x < a is a nonsensical statement. x cannot be greater than or equal to a and strictly less than a at the same time.

I think this is a problem when borrowing math concepts for programming. What the author is really talking about here is slicing, not intervals, and the slicing behavior is hopefully well defined on the construct you are working with, most of the time in a manner that makes sense for each construct, or in a manner consistent with other related constructs in the language.

If the author would stick with programming concepts, I don’t think this is a rule we should abide by, but rather a guideline which can be employed. And I think most programmers value consistency, so this really isn’t that much of an issue.
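
One way to read [a, a) in code, offered as an illustration rather than a resolution of the notation question: written as a filter, the condition a <= x < a is simply never satisfied, so nothing is selected. The function name and the candidate window are arbitrary choices for this sketch.

```python
def members(a, b):
    """Integers x near the bounds that satisfy a <= x < b."""
    candidates = range(a - 3, b + 3)  # arbitrary window around the bounds
    return [x for x in candidates if a <= x < b]
```

Python's own `range(2, 2)` behaves the same way and yields no values.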

bob1029 over 2 years ago

I was just watching a YouTube series on arithmetic coding and I was wondering why the intervals were specified this way. Makes a lot more sense now. The "b - a" use case is quite prevalent there.

danans over 2 years ago

What's with the funny connected "st" in the article? I've never seen that before.

phoe-krk over 2 years ago

How does [2, 2) make sense?

eurasiantiger over 2 years ago

This is not universal advice. If it’s nonsensical for your app to have start-end events of zero length, nothing in the article applies, and using closed intervals does make things a bit cleaner.