SQL nulls are weird

214 点作者 subomi4 个月前

35 条评论

SQL NULLs aren't weird, they're just based off of Kleene's TRUE-FALSE-UNKNOWN logic!If you want you can read NULL as UNKNOWN and suddenly a whole bunch of operations involving them become a lot more intuitive:1. TRUE OR UNKNOWN = TRUE, because you know you have at least one TRUE already.2. TRUE AND UNKNOWN = UNKNOWN, because you don't know whether you have two TRUEs or not. It's just out there.3. UNKNOWN XOR UNKNOWN = UNKNOWN, because it could darn near be anything: TRUE XOR TRUE, TRUE XOR FALSE, FALSE XOR FALSE, FALSE XOR TRUE... Internalizing this is where SQL's use of NULL / UNKNOWN really becomes intuitive.4. (TRUE AND FALSE) XOR (TRUE OR UNKNOWN) = (FALSE) XOR (TRUE) per #1 = TRUE. See, it's consistent, you just need to keep in mind that if you have a lot of known UNKNOWNs they're quite parasitic and your final outcome is likely to be, itself, an UNKNOWN. Just like in real life!

评论 #42651862 未加载

评论 #42654192 未加载

评论 #42653803 未加载

评论 #42648007 未加载

评论 #42652344 未加载

评论 #42656291 未加载

评论 #42654887 未加载

评论 #42657689 未加载

评论 #42651791 未加载

评论 #42662725 未加载

评论 #42652444 未加载

demurgos4 个月前

> select null = null; returns NULL, because each NULL is basically a placeholder representing any “unknown value”. Two unknown values are not necessarily the same value; we can’t say that they are equal, because we don’t know the value of either of them.Agreed with all of this, it would probably have been better if they were named `unknown` instead of reusing the `null` keyword.Note also that since Postgresql 15, you can use `NULLS NOT DISTINCT` when creating a unique index [0]. I'm less familiar with other databases.0: <a href="https://www.postgresql.org/docs/15/sql-createtable.html" rel="nofollow">https://www.postgresql.org/docs/15/sql-createtable.html</a>

评论 #42647944 未加载

评论 #42653602 未加载

评论 #42647845 未加载

评论 #42647483 未加载

评论 #42647573 未加载

bunderbunder4 个月前

> ... and this is even less obvious if you’re used to using ORMs.Which is why I continue to be such an ORM skeptic. I agree that they're convenient. But I do worry that we've now got an entire generation of engineers who regularly interact with relational databases, but have largely been spared the effort of learning how they actually work.As another commenter pointed out, if you've learned basic relational algebra then the way SQL nulls behave seems obvious and logically consistent. The logic is the same as the logic behind the comparison rules for NaN in IEEE floats. It's the behavior of C-style nulls that is, always and forever, a billion-dollar mistake.

评论 #42653770 未加载

评论 #42648345 未加载

评论 #42653318 未加载

评论 #42655719 未加载

irrational4 个月前

I expected the article to mention how in Oracle NULLs are equal to empty strings. Now that is weird.

评论 #42648978 未加载

评论 #42655389 未加载

评论 #42647517 未加载

datadrivenangel4 个月前

SQL NULLs are not weird once you consider how you want relational logic to work when they is a record with non-existent values.

评论 #42647548 未加载

评论 #42666812 未加载

评论 #42647843 未加载

iefbr144 个月前

When the null concept was introduced to me in the seventies, the only thing I could say was that it would be causing a lot of unnecessary confusion in the future. If you have missing values in your datarecord then that datarecord belongs in an exception-queue. And now some 45 years later people are still discussing it like we did then..

评论 #42655958 未加载

评论 #42662879 未加载

评论 #42656049 未加载

zokier4 个月前

SQL nulls in some ways behave in similar to floating point nans. Of course nans are also weird in their own way, but it is a bit comforting that its not so completely singularly weird.

评论 #42647833 未加载

评论 #42647728 未加载

galaxyLogic4 个月前

In Object Oriented Context "null" is useful to indicate that some object doesn't have value for that property.What's interesting is, do we mean that in our data that attribute has no value? Or do we mean the real-world object represented by the data does not have that attribute?Does null meana) We don't know the value of this attribute for this object, orb) We do know that there is no value for this attribute in the real-world object represented by our data.In JavaScript because there is both null and undefined it is easy to assume that undefined means we don't know the value and null means we do know it has no value.EXAMPLE: The attribute 'spouse'. Some people have a spouse some don't. So what does it mean if the value of the field 'spouse' is null? That we know there is no spouse, or that we don't know who the spouse is if any.In practical terms we can say null means "We don't know" which includes the case that there is no spouse.

评论 #42659245 未加载

评论 #42653125 未加载

评论 #42652759 未加载

评论 #42653563 未加载

评论 #42652815 未加载

kopirgan4 个月前

Recall this really funny dialogue in one of the Blackadder episodes.The princess eyes are as blue as the stone of GalvestonHave you seen the princess eyes?No!Have you seen the blue stone of Galveston?No!So you're comparing something you've never seen with something else you've never seen!That's NULL comparison

ludwik4 个月前

Even if we set three-value logic aside for a moment, this behavior of NULL still makes sense intuitively.The value of NULL in a particular table cell is simply a way to indicate 'no value'. If you want the values in a column to be unique, cases where there are no values shouldn't be considered.This plays out similarly in practice. For example, you may want to allow users to optionally reserve a username, and if they do, those usernames should be unique. It's hard to imagine a use case where by wanting a field to be both optional (nullable) and unique, you mean that the field should be optional for a single record (!) and required for all the rest. Of course, you mean that IF there is a value, THEN it should be unique.

Recursing4 个月前

Surprised that this doesn't mention "IS DISTINCT FROM" ( <a href="https://modern-sql.com/caniuse/is-distinct-from" rel="nofollow">https://modern-sql.com/caniuse/is-distinct-from</a> )(Although in rare cases that is even weirder: <a href="https://stackoverflow.com/a/58998043" rel="nofollow">https://stackoverflow.com/a/58998043</a> )

评论 #42651757 未加载

lolpanda4 个月前

I actually like how NULLs behave in SQL. They mean "I don't know" In the modern programming language we all care about Null safety. But no matter how you model your data, you will always run into the situations when you don't know everything. So I believe NOT NULL is not very practical. NULLs in SQL handle these case very well - when the input is unknown your output is unknown

评论 #42652118 未加载

评论 #42652098 未加载

评论 #42652241 未加载

branko_d4 个月前

NULLs are weird because they are basically two different types under the same name. The 3-value logic type is useful for representing "missing" foreign keys, but 2-value logic type is arguably more useful when searching/sorting/aggregating.I think we would have been better-off by treating FKs (and maybe outer JOINs) as a special case, and using 2-value logic everywhere else.

al2o3cr4 个月前

FWIW, you can explicitly change this behavior in Postgres as of version 15 - include "NULLS NOT DISTINCT" when creating the unique index.

kurtbuilds4 个月前

If you want equality testing with nulls, you want to use `is (not) distinct from` instead of `=` and `<>` / `!=`.`1 is not distinct from NULL` => false`NULL is not distinct from NULL` => true`0 is not distinct from 1` => false

评论 #42651574 未加载

getnormality4 个月前

I don't see why this is weird. Unique means no duplicates. Nulls can't be duplicates of each other because they're not equal to each other.If you don't like null semantics, you're free to use sentinel values. You can make all the sentinel values the same, or you can make them all different. Either way, you or someone who has to use your system will be back here tomorrow complaining about how weird it is.

otteromkram4 个月前

There's another good, technical write-up on NULL behavior in SQL at modern-sql.com<a href="https://modern-sql.com/concept/null" rel="nofollow">https://modern-sql.com/concept/null</a>(Note: I am not affiliated with that bloh/website in any way, shape, or form.)

gxt4 个月前

This has always made queries unpredictable in many scenarios and it should be a feature to turn nulls off entirely and swap them out with Option<T> instead.

评论 #42652540 未加载

hot_gril4 个月前

Weird as they seem at first, SQL null handling ends up being convenient the way it is. Part of this is because left/right join give you nulls.

tzury4 个月前

For Postgres specific approach, you may refer to<a href="https://blog.rustprooflabs.com/2022/07/postgres-15-unique-improvement-with-null" rel="nofollow">https://blog.rustprooflabs.com/2022/07/postgres-15-unique-im...</a>Practically speaking, I go with not null, and always set default value.

whartung4 个月前

This reminds me back in the day when I was writing a DSL for a project.Since the data we were getting was sourced from an RDBMS, I wanted NULL to be a first class concept in the DSL, with similar traits.Early on, I simply made any expression that involved a NULL result in NULL. Naively this was all well and good, but it failed spectacularly in condition statements.Instead of A = NULL == false, I had A = NULL == NULL. And, as you can imagine, a single NULL in the expression would just pollute the entire thing, and since NULL was considered as FALSE for conditionals, any NULL in an expression made the entire thing, eventually, FALSE.Naturally I went back and made the comparison operators always return booleans. But it was a fun little side effect at the time.

at_a_remove4 个月前

I have deep but vague thoughts around the concept. My first intuition is that we have put too many things under NULL and None and such.Partially, we use ... and I'll be very broad here ... "variables" as boxes we look in for answers. Answers to questions, answers to "Hey I put something in there for the time being to reference later." If I went into programming terms rather than just SQL, sometimes we get meta-answers.You haven't made the box yet (declared the variable).You haven't decided how the box is structured (picked a type or a length or something).Okay, you did those but the box is virgin (nothing has been placed in the box yet).That kind of thing. An empty set for "yes, you asked but nothing meets those criteria."

criloz24 个月前

It is not supposed that null is the bottom value in the universe of all the values that your program can recognize? Why people need to complicate it?, and yeah in that definition `null == null`, but a `null_pointer != null` because null pointer is at the bottom of all the possible pointer value, and null by itself is not a pointer. The same for (0,null), (false, null) and ("", null). null should only be equal to itself.And lastly undefined != null, because undefined is related with structures indicating that a field was not defined when the structure was created

jmyeet4 个月前

NULL is the absence of a value. If you try and treat it as a value, you're going to have a bad time. So an attempted UNIQUE(email_address, deleted_at) constraint is fundamentally flawed. If you treated NULL as a value that could be unique, you're going to break foreign keys.But let's continue the logic of deleted_at being NULL indicating an active account, which seems to the intent here. You end up doing things like:<pre><code> SELECT /* ... */ FROM accounts WHERE email_address = '...' AND deleted_at IS NOT NULL </code></pre> Depending on your database, that may or may not index well. More problematic, you may end up with privacy leaks if someone forgets the last conditional.If anything, you want to reverse this so someone has to go out of their way to explicitly select deleted accounts. There are multiple strategies for this eg using an active_accounts view or table.Lastly, there are lots of potential reasons for an account to be disabled or otherwise not visible/accessible. Takedowns, court orders, site safety, hacked accounts and so on.Overloading deleted_at to have a semantic meaning for an active account is just fundamentally bad design.

评论 #42648207 未加载

评论 #42647928 未加载

评论 #42647568 未加载

qwertydog4 个月前

SQL NULL is also coerced to different boolean values depending on context e.g. in a WHERE clause NULL is coerced to false, whereas in a CHECK constraint NULL is coerced to true<a href="https://dbfiddle.uk/C5JqMP8O" rel="nofollow">https://dbfiddle.uk/C5JqMP8O</a>

jakubmazanec4 个月前

That's why I like EdgeDB and EdgeQL: no null [1][1] <a href="https://www.edgedb.com/blog/we-can-do-better-than-sql" rel="nofollow">https://www.edgedb.com/blog/we-can-do-better-than-sql</a>

trollbridge4 个月前

And SQL null shares the feature with many other languages that any type can be a NULL (although a column can be set NOT NULL). Much like Java, it is no end of grief that a type that claims to be, say, a “NUMBER” is actually “NUMBER | NULLType”.

boxed4 个月前

All of this would be avoided if NULL in sql was just called "UNKNOWN". Which is what it is. Terrible name :/Imo, SQL should add "NOTHING", add "UNKNOWN" as a synonym for "NULL", and deprecate "NULL".

exabrial4 个月前

null != null is pretty bizarre at first, until you understand the reason the did it was to try to make sense of null-able indexed columns. Not sure why we couldnt have our cake and eat it, but instead we got IS NOT NULL is not the same as != NULL

xd4 个月前

To me, "Unknown" almost implies the possiblity of a value, whereas I've always thought of NULL as being an absence of a value.edit: an empty string, false, 0 are all values.

anonnon4 个月前

The simplest end-run around this is to avoid NULLs entirely, which normalization (even just the first normal form) requires.

osigurdson4 个月前

Weirder still are floating point numbers in SQL.

wruza4 个月前

I think (blasphemous hot take ahead) that the standards of implementation of relational models are wrong.NULLs still have their (rare) place, but the foremost issue with query results is that they are tabular rather than hierarchical. The main culprits being (1) outer joins that represent or induce nonsensical operations and (2) lack of non-null “zero” values for types like date. Of course hierarchies can make querying more complex, but mostly in cases where the relational logic goes crazy itself and you had to go tabular anyway.If you think of it, distinct, group by and windowing feel like workarounds in tabular mode but would be natural to hierarchies, because everything is naturally distinct and grouped-by by design and windows are basically subtables in these rows.Bonus points you could fetch “SELECT FROM a, b_rows LEFT JOIN b AS b_rows …” in a single query without duplicating `a`s and nullifying `b`s when N <> 1. And when you aggregate through a column in `b`, there’s no headache what to do with join-produced NULLs (unless `b` columns are nullable by your design, then it’s on you). And when it all arrives to a client, it’s already well-shaped for ui, processing, etc. No more:<pre><code> last_a_id = undefined for (row of rows) { if (row.id != last_a_id) { … last_a_id = row.id } … } </code></pre> I’m pretty sure you recognize this programming idiom immediately.Before you criticize, I’m not talking about hierarchical/OO tables. Only about ways of getting and handling query results. You still can reshape a relation like you want. The difference is that a database engine doesn’t have to put it all onto a (N x M x …) table and instead creates sort of a subset of relations which is efficient in space and natural to walk through. It already does that when walking through indexes, selects are naturally hierarchical. All it has to do is to track relations it went through rather than just dumping rows from a set of cursors that it knows the start and end points of, but loses this knowledge by writing into a plain table.

drzaiusx114 个月前

Ah yes, someone discovering the existence of three value logic in SQL and expecting 2VL behavior. Classic. We've all been there, right?Personally I wish more languages were like python or ruby and had chosen None or Nil over Null which alleviates the confusion a bit, as those names better indicates that it's NOT an "unknown" (1 unknown value != 1 other unknown, which intuitively makes sense.) In ruby or python it's more obvious that None and Nil are "nothing" types and therefore equivalence makes sense (nil == nil, None == None are both true)

ungut4 个月前

The NULLs in unique constraints quirk actually works differently in ORACLE databases, which is infuriating to say the least. Apparently this comes from some ambiguity in some sql standard, anyone know more about this?

评论 #42663683 未加载