I think that in some sense the <i>correct</i> solution to the reference type problem is the Scala solution, which depends on subtyping. This is pretty tricky to reason through, and adding subtyping makes type inference a lot more difficult, so it's hardly surprising that SML avoided this, but it provides a much more satisfying solution. (OCaml later embraced subtyping, but in the particular case of mutable references, it instead adopted the same value-restriction approach as SML.)<p>Here's the way I understand it; be warned that I'm just starting to understand this stuff, so I might have got something wrong. I'd welcome corrections.<p>S is a subtype of T (S <: T) iff you can use an object of type S wherever you need an object of type T. This gives us a partial order on types, in the sense that<p><pre><code> ∀T: T <: T
∀S:∀T: S <: T ∧ T <: S ⇒ S = T
∀S:∀T:∀U: S <: T ∧ T <: U ⇒ S <: U
</code></pre>
which all follow from the above informal definition. Sometimes we make it a lattice, for example by adding a ⊤ type ("top") that everything is a subtype of and a ⊥ type ("bottom") that is a subtype of everything.<p>In general in the presence of subtyping we can only infer type <i>bounds</i> on most things, not exact types. For example, consider that {3} <: ℤ <: ℝ <: ℂ; if we see a function being applied to the value 3, we can infer that it must be at least applicable to {3}, but a function that is applicable to ℤ or ℝ would also work.<p>Given this informal definition, surprisingly, α → α (the type of the identity function) is a "subtype" of bool → bool, not vice versa. That is, α → α <: bool → bool. Whenever we want a function of type bool → bool, we can use a function of type α → α, but not vice versa.<p>α → α is a weird type because it includes an implicit universal quantifier: ∀α: α → α. As it turns out, the judgment above that α → α <: bool → bool is not because α <: bool or bool <: α. It's because bool → bool is what you get when you instantiate the quantifier with a particular value, α = bool.<p>There's <i>also</i> a function-specific rule for subtyping, and it's a real mindbender: B → C <: A → D if A <: B and C <: D. We say functions' argument types are <i>contravariant</i> and their results are <i>covariant</i>. Considering the A <: B case, if we need a function from integers to booleans, we can use a function from real numbers to booleans—we just won't happen to invoke it with any non-integer real numbers, but it will still work, assuming there aren't incompatible binary representations at play (as in the case of the OCaml +. and + operators MJD mentions.). Considering the C <: D case, we can also use a function from integers to true: it will never return a value we cannot use as a boolean.<p>We can decompose the operation of taking a reference to x, ref x, into a step of creating a reference r and then applying the reseating operation r := x to it. This operation is valid iff the assignee is of a subtype of the referent type; that is, (r : T ref) := (x : S) is valid iff S <: T. That's because later when we read r we will be using its value as a T, so any subtype of T is fine. So for ref (fn x ⇒ x), we have that S = α → α, so we can derive the type bound that T must be some <i>supertype</i> of α → α. As mentioned above, this includes bool → bool, but it also includes ℤ → ℤ, (ℤ × ℂ) → (ℤ × ℂ), and the polymorphic type α → α itself. So reseating a reference is contravariant: we can write a value we know is an integer <i>or anything more specific</i> (a subtype such as a positive integer) into a reference we know to hold an integer <i>or anything more general</i> (a supertype such as a real number).<p>The dereferencing operation !r turns out to instead be covariant: !(r: S ref): T is valid iff S <: T. That is, we're going to use the result as type T, so the reference can hold any subtype of T. So if we see ((!m) 23) we have (!m) : {23} → ⊤, from which we can deduce a type bound (m : U) where U <: ({23} → ⊤) ref. That is, the type of the referent of m is a subtype of a "function that can take 23 and return something". For example, (ℤ → string) <: ({23} → ⊤), because {23} <: ℤ and functions are contravariant in their arguments, and string <: ⊤ (because every type is a subtype of ⊤), so if m has type (ℤ → string) ref, we're good. (bool → bool) ref would not work, because (bool → bool) ref is not a subtype of ({23} → ⊤) ref, because bool → bool is not a subtype of {23} → ⊤, because, even though bool <: ⊤, {23} is not a subtype of bool.<p>How does this resolve the problem? If m is of type (α → α) ref, ((!m) 23) <i>would</i> work; for example, if m happens to be a reference to the identity function, it evaluates to 23. (In many languages that's the only function of type α → α, in the interests of making the Curry–Howard correspondence meaningful.) But if we've previously seen (m := not), we have an incompatible type bound: (not : bool → bool), so we know that m is of type (bool → bool) ref or a ref to <i>some supertype of</i> bool → bool, U ref where bool → bool <: U. And, as I said above, it is <i>not</i> the case that bool → bool <: α → α; it's the other way around.<p>This seems to give us a nice, clean solution to the problem of types for references. Aside from being confusing as hell, the cost is that instead of inferring a <i>type</i> for every expression we can now only infer an <i>infinite set of possibilities</i> for its type. This sounds ridiculous but it is precisely what OCaml does for polymorphic variants and objects (though not mutable refs; details are in <a href="https://v2.ocaml.org/manual/polymorphism.html" rel="nofollow">https://v2.ocaml.org/manual/polymorphism.html</a>). With polymorphic variants:<p><pre><code> # let f a = match a with 1 -> `P (3, 4) | _ -> `Q (5) ;;
val f : int -> [> `P of int * int | `Q of int ] = <fun>
# let g b = match b with `P (c, d) -> c + d | `Q (c) -> c ;;
val g : [< `P of int * int | `Q of int ] -> int = <fun>
</code></pre>
The "<" here means that what's being given is an "upper bound": it's perfectly acceptable to apply g to an expression whose type is a <i>subtype</i> of the type given, like this:<p><pre><code> # `Q 2 ;;
- : [> `Q of int ] = `Q 2
# g (`Q 2) ;;
- : int = 2
</code></pre>
But the ">" on f's return type is a "lower bound": you can only use f in a context where its return type <i>is a subtype of</i> the expected type. Including the same type, so we can pass it to g:<p><pre><code> # g (f 1);;
- : int = 7
# g (f 2);;
- : int = 5
</code></pre>
Similarly we can write a function that calls a couple of methods on its arguments:<p><pre><code> # let mf o = (o#foo ; o#bar + 1) ;;
val mf : < bar : int; foo : 'a; .. > -> int = <fun>
</code></pre>
Here OCaml infers that the object needs to have a method "bar" returning int and a method "foo" of some type, but the .. makes this an upper-bound object type: it's okay for the object to be of some subtype that has more methods. And similarly if you construct an object with some methods, it can be used in a context that requires some supertype that doesn't have all of them. (There's a very close analogy between an object with a certain set of methods and a function that can be applied to terms with a certain finite set of polymorphic variant tags.)<p>The subtyping approach would seem to provide a logical solution to the problem of mutability, but it isn't the approach OCaml took, for reasons I don't understand. It's confusing and difficult to understand, but less so than OCaml's current set of rules. I suspect the answer is that OCaml had to be backward-compatible with earlier versions of Caml before subtyping was added, but possibly it's instead to keep the type inference problem decidable or something.