Two's complement integers with only sign bit set should be a trap representation

65 points by luu, over 1 year ago

22 comments

eru, over 1 year ago

From the linked discussion:

> @jrose @fclc People who want NaN in their integers are welcome to do that in the privacy of their own bedrooms with new types they invent for this purpose.

That reminds me of OCaml's 63-bit integers. In the privacy of OCaml's own bedroom, they use one bit of their integers as a flag to distinguish pointers from integers.

See https://blog.janestreet.com/what-is-gained-and-lost-with-63-bit-integers/

The results are quite... interesting. However, it does go to show that having slightly weird private integer types is actually totally workable, so the quote from the discussion might be meant as trolling, but it's less crazy than it sounds.
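
A minimal C++ sketch of that tagging idea, to make the trick concrete (an illustration of the scheme, not OCaml's actual runtime code; the helper names are made up):

  #include <cassert>
  #include <cstdint>
  #include <iostream>

  // Low-bit tagging: a word whose low bit is 1 encodes the 63-bit integer n
  // as 2n+1; a word whose low bit is 0 is a (word-aligned) pointer.
  using word = std::uint64_t;

  word box_int(std::int64_t n)   { return (static_cast<word>(n) << 1) | 1; }
  std::int64_t unbox_int(word w) { return static_cast<std::int64_t>(w) >> 1; }  // arithmetic shift on typical targets
  bool is_int(word w)            { return w & 1; }

  // Addition without unboxing: (2a+1) + (2b+1) - 1 == 2(a+b) + 1,
  // so a single subtraction restores the tag.
  word tagged_add(word a, word b) { return a + b - 1; }

  int main() {
      word a = box_int(20), b = box_int(22);
      assert(is_int(a) && is_int(b));
      std::cout << unbox_int(tagged_add(a, b)) << "\n";  // prints 42
  }

The price is that plain 64-bit integers no longer fit and every operation needs this kind of fixup, which is exactly the trade-off the linked post discusses.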
flohofwoe, over 1 year ago

At that point, why not introduce special i63 and u63 types and use the hidden bit 63 as a 'soft carry flag' to detect over-/underflow. 63 bits of integer range ought to be enough for anyone ;)

(As soon as bit 63 is set, the number would be in an 'overflow state', but at the same time the lower 63 bits would preserve the wrap-around value in case that's useful.)

On the other hand, compilers are already free to check the CPU carry flag and trap on it. Those flag bits are just not exposed to programming languages (which IMHO is a serious 'dissonance' between assembly and high-level languages); also, CPU flags can't be carried around with their associated result value and go 'viral' like NaNs (but whether that's even useful instead of trapping immediately is another question).
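
A rough C++ sketch of what such a u63 with a soft carry flag could look like (my reading of the comment, not an established design; the names are illustrative):

  #include <cstdint>
  #include <iostream>

  // Values live in the low 63 bits of a uint64_t; bit 63 is the 'soft carry
  // flag'. After adding two valid values, a set bit 63 means the 63-bit result
  // overflowed, while the low 63 bits still hold the wrapped-around value.
  // The flag must be checked (or cleared) after each operation -- unlike a NaN
  // it does not propagate on its own.
  constexpr std::uint64_t OVERFLOW_BIT = 1ull << 63;

  struct u63 { std::uint64_t bits; };  // invariant: bit 63 clear for valid values

  u63 add_u63(u63 a, u63 b)          { return { a.bits + b.bits }; }
  bool overflowed(u63 x)             { return (x.bits & OVERFLOW_BIT) != 0; }
  std::uint64_t wrapped_value(u63 x) { return x.bits & ~OVERFLOW_BIT; }

  int main() {
      u63 max63 = { (1ull << 63) - 1 };  // largest 63-bit value
      u63 one   = { 1 };
      u63 r = add_u63(max63, one);
      std::cout << std::boolalpha << overflowed(r)    // true
                << " " << wrapped_value(r) << "\n";   // 0
  }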
femto, over 1 year ago

DSPs already do something like this, offering the option of saturating arithmetic operations in addition to the normal modulo arithmetic. It's the operation that's special, not the representation in memory.

The normal way to do it is to use the "ETSI Basic Operator" library, which provides saturating arithmetic operations, among other things [1]. If the hardware supports saturated arithmetic, the compiler maps the operations to the relevant instructions; otherwise it provides a software implementation.

Example:

  // modulo arithmetic
  int16_t a, b, c;
  a = b + c;

  // saturating 16-bit arithmetic using ETSI
  int16_t a, b, c;
  a = add(b, c);

[1] https://www.itu.int/rec/T-REC-G.191/en
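
For readers without the library at hand, a sketch of what the software fallback for such a saturating 16-bit add might look like (the real ETSI basic operators also maintain an overflow flag, which is omitted here):

  #include <cstdint>
  #include <iostream>

  // Saturating 16-bit addition: compute in a wider type, then clamp.
  std::int16_t sat_add16(std::int16_t a, std::int16_t b) {
      std::int32_t sum = static_cast<std::int32_t>(a) + static_cast<std::int32_t>(b);  // cannot overflow in 32 bits
      if (sum > INT16_MAX) return INT16_MAX;   // clamp to  32767
      if (sum < INT16_MIN) return INT16_MIN;   // clamp to -32768
      return static_cast<std::int16_t>(sum);
  }

  int main() {
      std::cout << sat_add16(30000, 10000)   << "\n";  //  32767 (saturated), not -25536
      std::cout << sat_add16(-30000, -10000) << "\n";  // -32768 (saturated)
  }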
tialaramex, over 1 year ago

This is what nook::Balanced{I8,I16,I32,I64} are for. https://crates.io/crates/nook

Of course, today in Rust we're not allowed to bless such types for ourselves; this is reserved to the standard library (core::num::NonZeroU32 is a 4-byte unsigned integer which can't be zero, for example) because it relies on a nasty, permanently unstable, compiler-only feature to say "Hey, not all possible representations of the right size for this type are valid".

Rust has one very obvious set of types with such a niche, the references &T, but it also has not only the particular custom integers NonZero{U8,U16,U32,U64,I8,I16,I32,I64} and the non-null pointers, but OS-specific types (for Unix file descriptors and Windows handles, respectively) and of course char: it is 4 bytes, but Unicode is explicitly limited to far fewer values, so that's a niche too.

But today, making more is not allowed in stable Rust, so while my nook crate is an interesting curiosity (and some day I'll implement integer math traits etc.) it cannot be stabilised, and I was persuaded that the path forward is a whole lot of work to deliver the Pattern Types feature, which can do this stably.
kazinator, over 1 year ago

> *should be a trap representation*

In a new, green-field programming language design: maybe.

You will have FFI interop issues: what if some foreign code gives you such a value, which is correct in *its* domain? It will have to be specially handled and widened/promoted to a type that can hold it. It looks like friction, though.

In an existing language, like C? Making currently correct programs blow up with a trap exception is a nonstarter.

I think there is no hardware support for treating 0x80... as a trap, so if you actually want it to trap, rather than leaving it at undefined behavior, your compiler has to emit check code, at least in code paths where it is not obvious whether the value is or isn't a trap.

Also, it would be strangely inconsistent for 0x80... to predictably trap wherever it occurs, but arithmetic overflow to have undefined consequences. The two want to be tied together.
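
A sketch of the kind of check a compiler would have to emit in software, absent hardware support, before an operation that could hit the trap value (illustrative only):

  #include <climits>
  #include <cstdio>
  #include <cstdlib>

  // If INT_MIN were a trap representation, negation could never overflow --
  // but without hardware support, something like this test has to be inserted
  // wherever a possibly-trapping value can flow in.
  int checked_negate(int x) {
      if (x == INT_MIN) {                      // the would-be trap representation
          std::fprintf(stderr, "trap: integer NaN encountered\n");
          std::abort();
      }
      return -x;                               // now guaranteed not to overflow
  }

  int main() {
      std::printf("%d\n", checked_negate(42)); // -42
      // checked_negate(INT_MIN);              // would abort at runtime
  }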
simiones, over 1 year ago

Even if the current INT_MIN were redefined to be a NaN or a trap, that would do almost nothing to prevent signed integer overflow bugs. Sure, INT_MAX + 1 would be an error, but INT_MAX + 2, INT_MAX + 100 and so on would still be equal to INT_MIN + 1, INT_MIN + 99, etc.

Besides, -fwrapv was basically the default in most C compilers (and probably C++ as well) until about 10 years ago. Many large systems thus use it today as well (e.g. the Linux kernel, I believe).
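
The arithmetic behind that claim, sketched with the wrap-around done explicitly in unsigned math (which is what -fwrapv guarantees for the signed types directly):

  #include <climits>
  #include <cstdio>

  // Wrap-around addition: compute in unsigned (always defined), convert back.
  // The conversion back is implementation-defined before C++20 but is the
  // two's complement reinterpretation on common compilers.
  int wrapping_add(int a, int b) {
      return static_cast<int>(static_cast<unsigned>(a) + static_cast<unsigned>(b));
  }

  int main() {
      // Only INT_MAX + 1 lands on INT_MIN, the proposed trap value;
      // INT_MAX + 100 lands on the perfectly ordinary INT_MIN + 99.
      std::printf("%d\n", wrapping_add(INT_MAX, 1)   == INT_MIN);      // 1
      std::printf("%d\n", wrapping_add(INT_MAX, 100) == INT_MIN + 99); // 1
  }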
drhagen, over 1 year ago

R does this [1]. It is the sentinel that it uses internally to represent a missing value (i.e. NA) in integer arrays. Integer operations overflow to this value.

[1] https://stackoverflow.com/a/56507840/1485877
torstenvl, over 1 year ago

Wait, so we're going to give up -fwrapv (a UB mitigation) *for the explicit purpose* of introducing *more* undefined behavior?

I don't see any way this change wouldn't lead to an enormous number of bugs, all for the (dubious) goal of having the range of signed integers be symmetrical.
kevin_thibedeau, over 1 year ago
Computers are implemented in hardware, not maths proofs. The Java generation is infatuated with magic numerics without realizing their performance cost.
silvestrov, over 1 year ago

Rather than a trap representation I'd like a real trap, i.e. an exception at runtime that can be caught.

A propagated NULL is useful in SQL, but likely much less so for integers, as they are used as indexes. It is difficult to say when the "NaN" should result in a trap when doing pointer arithmetic with such numbers, as the CPU does not have separate pointer/integer data types.

There are so many security problems from integer overflow. I'd also like some easy way of checking that an integer is in the range [0, n) as a single instruction that traps if it is not.
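
That [0, n) check can already be expressed as one comparison by exploiting unsigned conversion; a small sketch, with abort() standing in for the trap:

  #include <cstddef>
  #include <cstdlib>

  // Casting a signed index to an unsigned type maps negative values to huge
  // ones, so a single unsigned comparison covers both i < 0 and i >= n.
  std::size_t checked_index(long long i, std::size_t n) {
      if (static_cast<unsigned long long>(i) >= n)
          std::abort();                        // stand-in for a hardware trap
      return static_cast<std::size_t>(i);
  }

  int main() {
      checked_index(3, 10);                    // fine
      // checked_index(-1, 10);                // would trap
      // checked_index(10, 10);                // would trap
  }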
adgjlsfhk1, over 1 year ago

The version of this I've really wanted is hardware support for integers mod 2^64-1, with 0x1000000000000000 as a NaN value. Addition, subtraction and division would be trivial; multiplication would be a tiny bit more complicated, but should be relatively doable.

There are 3 main advantages to this system:

1. The ability to properly represent zero division or overflow without flags/exceptions.

2. abs(x) is always greater than 0.

3. A mix of these and regular ints could be used with the Chinese remainder theorem to provide faster (almost) 128-bit math.
snickerbockers, over 1 year ago

This person's mind is going to be blown when he learns about bitmasks.
gregfjohnson, over 1 year ago

Here is my favorite example of errors due to overflow:

  std::vector<int> vec;

  // test for sorted..
  bool sorted = true;
  for (size_t i = 0; i < vec.size() - 1ULL; ++i)
      if (vec[i] > vec[i+1]) {
          sorted = false;
          break;
      }

I believe the Google C++ style guide rues the selection of unsigned integers as the return type of size() in the standard library for reasons like the above. Personally, my preferred behavior would be to have a compiler flag that could be used to enable a trap if unsigned arithmetic resulted in wrap-around, as in the above case when the vector is of length zero.
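
The wrap-around comes from vec.size() - 1ULL evaluating to a huge unsigned value when the vector is empty; one common way to write the same loop without that hazard (a sketch, not the only fix):

  #include <cstddef>
  #include <vector>

  // Comparing i + 1 against size() avoids subtracting from an unsigned zero;
  // for an empty or single-element vector the loop body simply never runs.
  bool is_sorted_safe(const std::vector<int>& vec) {
      for (std::size_t i = 0; i + 1 < vec.size(); ++i)
          if (vec[i] > vec[i + 1]) return false;
      return true;
  }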
planede, over 1 year ago

A trap representation is not a NaN-like sentinel value. The post also didn't discuss the updated semantics of integer operations, but I assume it simply shrinks the range of representable values by removing -2^(N-1), and operations whose resulting value lies outside that range are still undefined.

Having a NaN-like sentinel is another option, with overflowing integer operations defaulting to that value instead, but that's not what is proposed here.

The post does mention optional<int>, which could have the NaN-like sentinel (empty optional) by reusing the bit pattern of the trap representation of the underlying int.
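
A sketch of that last idea, with INT_MIN standing in for the empty state purely by convention (nothing here is enforced by the language; the type and its names are illustrative):

  #include <cassert>
  #include <climits>

  // optional<int> in the same storage as int: INT_MIN plays the role of
  // "no value", which only works if INT_MIN is never a valid payload.
  struct OptionalInt {
      int raw;

      static OptionalInt none()      { return { INT_MIN }; }
      static OptionalInt some(int v) { assert(v != INT_MIN); return { v }; }

      bool has_value() const { return raw != INT_MIN; }
      int  value()     const { assert(has_value()); return raw; }
  };

  static_assert(sizeof(OptionalInt) == sizeof(int), "no extra storage needed");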
lifthrasiir, over 1 year ago

I would welcome something like sNaN, but actually fully enforced and widely implemented in hardware. Otherwise it would be just a pipe dream.
nynx, over 1 year ago

Please no. A process automatically and immediately dying because of an arithmetic overflow would be terrible.
scotty79, over 1 year ago

All is fun and games until your negative number is surprisingly off by one because somebody, somewhere didn't convert between two's complement and "symmetrical" ints correctly.
dreamcompiler, over 1 year ago

One of the things I love about Lisp is that i+1 is always greater than i. You can work hard to make that not be true, but that's the default and IMHO it's the proper default.
z29LiTp5qUC30n, over 1 year ago

So they want the symmetric complement of the ancient Knight Processor, where a NaN was encoded so that divide-by-zero wasn't an exception but just returned NaN.
adrian_b, over 1 year ago

In the past I have also been tempted by this idea that perhaps it would be better to redefine the current INT_MIN to be a NaN.

Nevertheless, after thinking more about it, my conclusion was that this is not useful.

The strongest argument in favor of it is that an optional integer then needs only the same size as an integer.

Against this argument is the fact that in decades of programming I have never encountered the need for an optional integer type which needs, for its base integer type, all possible values.

What I have encountered very frequently were optional integer types whose base integer type should have been defined as an integer interval, a.k.a. integer range, as is possible in languages like Ada or Pascal.

For such optional integers with the valid values restricted to an interval, the optional type does not require extra storage over the base type, because the NaN can be represented by any value outside the permissible interval.

The second argument for redefining INT_MIN to be a NaN is that this eliminates the possibility of overflows in several integer functions, like absolute value and negation.

This argument is weak, because overflows can still happen in the more frequent arithmetic operations, so most non-trivial programs must still provide means to handle such exceptional cases.

Moreover, the exceptions in absolute value and the like are avoided only by still having a check (possibly implicit, i.e. inserted by the compiler) that an integer value is not a NaN, somewhere before the code that uses negation, division by -1, etc. That check might live in the invoking function rather than the invoked function to which integer arguments are passed, so in the best case this replaces multiple checks interspersed with the operations with a single check at the beginning, at the origin of the integer data. On the other hand, implicit checks that an integer is not a NaN would have to be added after additions and the like.

The same is available now: in much of the code where integer overflows are possible, one can check the range of the input integer data at the beginning, verifying that there are no values that can result in an overflow, and avoid further checks at each operation.

Redefining INT_MIN to be a NaN could help by relying on implicit checks added by a compiler in a few cases, but most expressions contain additions, subtractions and multiplications, so the program must still rely either on the compiler adding implicit overflow trap checks after each operation, for which adding the integer NaN gives no improvement, or on the programmer writing explicit range checks that depend on the expressions being computed, and for the majority of expressions the explicit checks would still be needed.
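
A sketch of the "optional over a ranged integer" point: when the valid values are restricted to [Lo, Hi], any out-of-range bit pattern can serve as the missing-value sentinel, so the optional costs no extra storage (the bounds, sentinel choice and names here are illustrative, not from any particular language):

  #include <cassert>
  #include <climits>

  // Ada/Pascal-style range type plus an optional, in one int-sized object.
  template <int Lo, int Hi>
  struct OptionalRanged {
      static_assert(Lo <= Hi && Hi < INT_MAX, "need room for a sentinel");
      static constexpr int None = Hi + 1;      // any value outside [Lo, Hi] works
      int raw;

      static OptionalRanged none()      { return { None }; }
      static OptionalRanged some(int v) { assert(Lo <= v && v <= Hi); return { v }; }

      bool has_value() const { return raw != None; }
      int  value()     const { assert(has_value()); return raw; }
  };

  static_assert(sizeof(OptionalRanged<1, 12>) == sizeof(int),
                "e.g. an optional month number needs no extra storage");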
hoseja, over 1 year ago

None of the signed integer representations is very good; each has severe drawbacks.
fargle, over 1 year ago

This strikes me as a Chesterton's fence situation: "what kind of weird nonsense is this 2's complement where it's asymmetric (*yuck*) and you can't negate -0x8000?"

I seriously doubt the folks who think this is a good idea know enough about the math and computer architecture to understand why in fact it's a *terrible* idea.

- Programming languages are abstractions. But (old Joel) leaky ones sometimes. We don't like this, but you have to live with it.

- In this case, the choice might be between reality-leaks such as "integers have limits" and "the limits are asymmetric, leading to this negation pseudo-paradox", and a *much worse* leak such as "why does this spiffy safe integer type run 100,000 times slower than the weird C int?"

- Actually, math itself is a series of ever more complicated abstractions: first counting numbers, then ZERO, then NEGATIVE numbers. ACK! Now wait until you get to Fields and Rings and Galois Fields, which are very close to the concept of 2's complement integers. I believe that, ignoring the botched C standard that makes *signed* integer wraparound UB, "normal" 2's complement integers (and unsigned) form a finite ring.

- So what? Well, even these fancy and, frankly, impenetrable math abstractions are just abstractions. They leak too and aren't "perfect" if you take into account the real universe, which doesn't always match the perfect math. That doesn't mean they aren't very useful.

- These concepts can be used to reason about the behavior of physical things, like the behavior of an array of flip-flops representing a 64-bit number, and especially how you can create circuits and simplify them to do things that implement addition, multiplication, etc.

- The problem with a sorta-1's-complement-without-two-zeros idea is that it has none of these connections to a grounded theoretical basis. So how do you simplify an adder? Can you perform algebraic simplifications on it? What is the model for this thing?

- Symmetry is in the eye of the beholder. Draw a 4-bit signed 2's complement "clock diagram". It works. Now do it with 1's complement, or this suggestion, which is even worse with an odd, sometimes prime, number of nodes.

- Implementation: 2's complement is a *very* clever hardware technique that lets you interpret N bits as N digits of a binary number. Each digit has weight 2^n: 1, 2, 4, 8, etc., *except* the last digit, which has weight -2^n. This simple trick lets signed and unsigned integers use the same addition, subtraction, and (with minor processing) multiplication and division logic unchanged. It's very unlikely that you can implement the suggested scheme at even 1/10th the performance or cost. (A small worked example of this weight interpretation follows this comment.)

- And, yes, I think C really got it wrong calling "overflow" or wraparound UB. If it was defined to wrap, at least it would be theoretically analyzable. You'd still have overflow flaws, but that's provably impossible to avoid with finite-sized integers of any design.
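
A small worked example of the weight interpretation from the implementation bullet above, for 8 bits (just a demonstration of the arithmetic, not how hardware computes it):

  #include <cstdint>
  #include <cstdio>

  // Bit i contributes 2^i, except the top bit, which contributes -2^7.
  // This is exactly why signed and unsigned values can share the same adder.
  int decode_twos_complement_8(std::uint8_t bits) {
      int value = 0;
      for (int i = 0; i < 7; ++i)
          if (bits & (1u << i)) value += 1 << i;   // weights 1, 2, 4, ..., 64
      if (bits & 0x80) value -= 128;               // top bit has weight -2^7
      return value;
  }

  int main() {
      std::printf("%d\n", decode_twos_complement_8(0xFF));  // -1   (127 - 128)
      std::printf("%d\n", decode_twos_complement_8(0x80));  // -128 (sign bit only)
  }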