SQL NULLs aren't <i>weird</i>, they're just based off of Kleene's TRUE-FALSE-UNKNOWN logic!<p>If you want you can read NULL as UNKNOWN and suddenly a whole bunch of operations involving them become a lot more intuitive:<p>1. TRUE <i>OR</i> UNKNOWN = TRUE, because you know you have at least one TRUE already.<p>2. TRUE <i>AND</i> UNKNOWN = UNKNOWN, because you don't know whether you have two TRUEs or not. It's just out there.<p>3. UNKNOWN <i>XOR</i> UNKNOWN = UNKNOWN, because it could darn near be anything: TRUE XOR TRUE, TRUE XOR FALSE, FALSE XOR FALSE, FALSE XOR TRUE... Internalizing this is where SQL's use of NULL / UNKNOWN really becomes intuitive.<p>4. (TRUE AND FALSE) XOR (TRUE OR UNKNOWN) = (FALSE) XOR (TRUE) per #1 = TRUE. See, it's consistent, you just need to keep in mind that if you have a lot of known UNKNOWNs they're quite parasitic and your final outcome is likely to be, itself, an UNKNOWN. Just like in real life!
> select null = null; returns NULL, because each NULL is basically a placeholder representing any “unknown value”. Two unknown values are not necessarily the same value; we can’t say that they are equal, because we don’t know the value of either of them.<p>Agreed with all of this, it would probably have been better if they were named `unknown` instead of reusing the `null` keyword.<p>Note also that since Postgresql 15, you can use `NULLS NOT DISTINCT` when creating a unique index [0]. I'm less familiar with other databases.<p>0: <a href="https://www.postgresql.org/docs/15/sql-createtable.html" rel="nofollow">https://www.postgresql.org/docs/15/sql-createtable.html</a>
> ... and this is even less obvious if you’re used to using ORMs.<p>Which is why I continue to be such an ORM skeptic. I agree that they're convenient. But I do worry that we've now got an entire generation of engineers who regularly interact with relational databases, but have largely been spared the effort of learning how they actually work.<p>As another commenter pointed out, if you've learned basic relational algebra then the way SQL nulls behave seems obvious and logically consistent. The logic is the same as the logic behind the comparison rules for NaN in IEEE floats. It's the behavior of C-style nulls that is, always and forever, a billion-dollar mistake.
When the null concept was introduced to me in the seventies, the only thing I could say was that it would be causing a lot of unnecessary confusion in the future.
If you have missing values in your datarecord then that datarecord belongs in an exception-queue.
And now some 45 years later people are still discussing it like we did then..
SQL nulls in some ways behave in similar to floating point nans. Of course nans are also weird in their own way, but it is a bit comforting that its not so completely singularly weird.
In Object Oriented Context "null" is useful to indicate that some object doesn't have value for that property.<p>What's interesting is, do we mean that in our data that attribute has no value? Or do we mean the real-world object represented by the data does not have that attribute?<p>Does null mean<p>a) We don't know the value of this attribute
for this object, or<p>b) We do know that there is no value for this
attribute in the real-world object
represented by our data.<p>In JavaScript because there is both null and undefined it is easy to assume that undefined means we don't know the value and null means we do know it has no value.<p>EXAMPLE:
The attribute 'spouse'. Some people have a spouse some don't. So what does it mean if the value of the field 'spouse' is null? That we know there is no spouse, or that we don't know who the spouse is if any.<p>In practical terms we can say null means "We don't know" which includes the case that there is no spouse.
Recall this really funny dialogue in one of the Blackadder episodes.<p>The princess eyes are as blue as the stone of Galveston<p>Have you seen the princess eyes?<p>No!<p>Have you seen the blue stone of Galveston?<p>No!<p>So you're comparing something you've never seen with something else you've never seen!<p>That's NULL comparison
Even if we set three-value logic aside for a moment, this behavior of NULL still makes sense intuitively.<p>The value of NULL in a particular table cell is simply a way to indicate 'no value'. If you want the values in a column to be unique, cases where there are no values shouldn't be considered.<p>This plays out similarly in practice. For example, you may want to allow users to optionally reserve a username, and if they do, those usernames should be unique. It's hard to imagine a use case where by wanting a field to be both optional (nullable) and unique, you mean that the field should be optional for a single record (!) and required for all the rest. Of course, you mean that IF there is a value, THEN it should be unique.
Surprised that this doesn't mention "IS DISTINCT FROM" ( <a href="https://modern-sql.com/caniuse/is-distinct-from" rel="nofollow">https://modern-sql.com/caniuse/is-distinct-from</a> )<p>(Although in rare cases that is even weirder: <a href="https://stackoverflow.com/a/58998043" rel="nofollow">https://stackoverflow.com/a/58998043</a> )
I actually like how NULLs behave in SQL. They mean "I don't know" In the modern programming language we all care about Null safety. But no matter how you model your data, you will always run into the situations when you don't know everything. So I believe NOT NULL is not very practical. NULLs in SQL handle these case very well - when the input is unknown your output is unknown
NULLs are weird because they are basically two different types under the same name. The 3-value logic type is useful for representing "missing" foreign keys, but 2-value logic type is arguably more useful when searching/sorting/aggregating.<p>I think we would have been better-off by treating FKs (and maybe outer JOINs) as a special case, and using 2-value logic everywhere else.
If you want equality testing with nulls, you want to use `is (not) distinct from` instead of `=` and `<>` / `!=`.<p>`1 is not distinct from NULL` => false<p>`NULL is not distinct from NULL` => true<p>`0 is not distinct from 1` => false
I don't see why this is weird. Unique means no duplicates. Nulls can't be duplicates of each other because they're not equal to each other.<p>If you don't like null semantics, you're free to use sentinel values. You can make all the sentinel values the same, or you can make them all different. Either way, you or someone who has to use your system will be back here tomorrow complaining about how weird it is.
There's another good, technical write-up on NULL behavior in SQL at modern-sql.com<p><a href="https://modern-sql.com/concept/null" rel="nofollow">https://modern-sql.com/concept/null</a><p>(Note: I am not affiliated with that bloh/website in any way, shape, or form.)
This has always made queries unpredictable in many scenarios and it should be a feature to turn nulls off entirely and swap them out with Option<T> instead.
For Postgres specific approach, you may refer to<p><a href="https://blog.rustprooflabs.com/2022/07/postgres-15-unique-improvement-with-null" rel="nofollow">https://blog.rustprooflabs.com/2022/07/postgres-15-unique-im...</a><p>Practically speaking, I go with not null, and always set default value.
This reminds me back in the day when I was writing a DSL for a project.<p>Since the data we were getting was sourced from an RDBMS, I wanted NULL to be a first class concept in the DSL, with similar traits.<p>Early on, I simply made any expression that involved a NULL result in NULL. Naively this was all well and good, but it failed spectacularly in condition statements.<p>Instead of A = NULL == false, I had A = NULL == NULL. And, as you can imagine, a single NULL in the expression would just pollute the entire thing, and since NULL was considered as FALSE for conditionals, any NULL in an expression made the entire thing, eventually, FALSE.<p>Naturally I went back and made the comparison operators always return booleans. But it was a fun little side effect at the time.
I have deep but vague thoughts around the concept. My first intuition is that we have put too many things under NULL and None and such.<p>Partially, we use ... and I'll be very broad here ... "variables" as boxes we look in for answers. Answers to questions, answers to "Hey I put something in there for the time being to reference later." If I went into programming terms rather than just SQL, sometimes we get meta-answers.<p>You haven't made the box yet (declared the variable).<p>You haven't decided how the box is structured (picked a type or a length or something).<p>Okay, you did those but the box is <i>virgin</i> (nothing has been placed in the box yet).<p>That kind of thing. An empty set for "yes, you asked but nothing meets those criteria."
It is not supposed that null is the bottom value in the universe of all the values that your program can recognize? Why people need to complicate it?, and yeah in that definition `null == null`, but a `null_pointer != null` because null pointer is at the bottom of all the possible pointer value, and null by itself is not a pointer. The same for (0,null), (false, null) and ("", null). null should only be equal to itself.<p>And lastly undefined != null, because undefined is related with structures indicating that a field was not defined when the structure was created
NULL is the absence of a value. If you try and treat it as a value, you're going to have a bad time. So an attempted UNIQUE(email_address, deleted_at) constraint is fundamentally flawed. If you treated NULL as a value that could be unique, you're going to break foreign keys.<p>But let's continue the logic of deleted_at being NULL indicating an active account, which seems to the intent here. You end up doing things like:<p><pre><code> SELECT /* ... */
FROM accounts
WHERE email_address = '...'
AND deleted_at IS NOT NULL
</code></pre>
Depending on your database, that may or may not index well. More problematic, you may end up with privacy leaks if someone forgets the last conditional.<p>If anything, you want to reverse this so someone has to go out of their way to explicitly select deleted accounts. There are multiple strategies for this eg using an active_accounts view or table.<p>Lastly, there are lots of potential reasons for an account to be disabled or otherwise not visible/accessible. Takedowns, court orders, site safety, hacked accounts and so on.<p>Overloading deleted_at to have a semantic meaning for an active account is just fundamentally bad design.
SQL NULL is also coerced to different boolean values depending on context e.g. in a WHERE clause NULL is coerced to false, whereas in a CHECK constraint NULL is coerced to true<p><a href="https://dbfiddle.uk/C5JqMP8O" rel="nofollow">https://dbfiddle.uk/C5JqMP8O</a>
That's why I like EdgeDB and EdgeQL: no null [1]<p>[1] <a href="https://www.edgedb.com/blog/we-can-do-better-than-sql" rel="nofollow">https://www.edgedb.com/blog/we-can-do-better-than-sql</a>
And SQL null shares the feature with many other languages that any type can be a NULL (although a column can be set NOT NULL). Much like Java, it is no end of grief that a type that claims to be, say, a “NUMBER” is actually “NUMBER | NULLType”.
All of this would be avoided if NULL in sql was just called "UNKNOWN". Which is what it is. Terrible name :/<p>Imo, SQL should add "NOTHING", add "UNKNOWN" as a synonym for "NULL", and deprecate "NULL".
null != null is pretty bizarre at first, until you understand the reason the did it was to try to make sense of null-able indexed columns. Not sure why we couldnt have our cake and eat it, but instead we got IS NOT NULL is not the same as != NULL
To me, "Unknown" almost implies the possiblity of a value, whereas I've always thought of NULL as being an absence of a value.<p>edit: an empty string, false, 0 are all values.
I think (blasphemous hot take ahead) that the standards of implementation of relational models are wrong.<p>NULLs still have their (rare) place, but the foremost issue with query results is that they are tabular rather than hierarchical. The main culprits being (1) outer joins that represent or induce nonsensical operations and (2) lack of non-null “zero” values for types like date. Of course hierarchies can make querying more complex, but mostly in cases where the relational logic goes crazy itself and you had to go tabular anyway.<p>If you think of it, distinct, group by and windowing feel like workarounds in tabular mode but would be natural to hierarchies, because everything is naturally distinct and grouped-by by design and windows are basically subtables in these rows.<p>Bonus points you could fetch “SELECT FROM a, b_rows LEFT JOIN b AS b_rows …” in a single query without duplicating `a`s and nullifying `b`s when N <> 1. And when you aggregate through a column in `b`, there’s no headache what to do with join-produced NULLs (unless `b` columns are nullable by your design, then it’s on you). And when it all arrives to a client, it’s already well-shaped for ui, processing, etc. No more:<p><pre><code> last_a_id = undefined
for (row of rows) {
if (row.id != last_a_id) {
…
last_a_id = row.id
}
…
}
</code></pre>
I’m pretty sure you recognize this programming idiom immediately.<p>Before you criticize, I’m not talking about hierarchical/OO tables. Only about ways of getting and handling query results. You still can reshape a relation like you want. The difference is that a database engine doesn’t have to put it all onto a (N x M x …) table and instead creates sort of a subset of relations which is efficient in space and natural to walk through. It already does that when walking through indexes, selects are naturally hierarchical. All it has to do is to track relations it went through rather than just dumping rows from a set of cursors that it <i>knows</i> the start and end points of, but loses this knowledge by writing into a plain table.
Ah yes, someone discovering the existence of three value logic in SQL and expecting 2VL behavior. Classic. We've all been there, right?<p>Personally I wish more languages were like python or ruby and had chosen None or Nil over Null which alleviates the confusion a bit, as those names better indicates that it's NOT an "unknown" (1 unknown value != 1 other unknown, which intuitively makes sense.) In ruby or python it's more obvious that None and Nil are "nothing" types and therefore equivalence makes sense (nil == nil, None == None are both true)
The NULLs in unique constraints quirk actually works differently in ORACLE databases, which is infuriating to say the least. Apparently this comes from some ambiguity in some sql standard, anyone know more about this?