> The simple reason is that that illustration shows how we regularize models conceptually, with hard constraints, not how we actually implement regularization, with soft constraints!<p>Note that these are equivalent. In particular, assuming the two problems are (for some loss "l" and regularizer "r")<p><pre><code> minimize l(θ) + λr(θ),
</code></pre>
and<p><pre><code> minimize l(θ)
subject to r(θ) ≤ M,
</code></pre>
for every λ ≥ 0 there exists an M, and for every M there exists a λ ≥ 0, such that the resulting problems are equivalent, in the sense that a solution of one is a solution of the other. (Of course, this requires some fairly general regularity conditions, such as convexity of l and r, but these hold for all the given examples.) I agree that this is rarely stated in introductory texts, but the intuitive picture is the same.
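<p>A minimal sketch of both directions, assuming a ridge-type setup chosen only for illustration: l(θ) = ½‖Xθ − y‖² on a tiny made-up dataset and r(θ) = ‖θ‖². The penalized problem is solved with plain gradient descent and the constrained one with projected gradient descent (projecting onto the ball ‖θ‖ ≤ √M is just a rescaling). Going from λ to M is simply M = r(θ_λ); going from M to λ reads the multiplier off the KKT stationarity condition ∇l(θ) + 2λθ = 0. The data, function names and solver choices are all hypothetical.

```python
import math

# Hypothetical toy least-squares data, chosen only for illustration.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]


def grad_loss(theta):
    """Gradient of l(θ) = ½‖Xθ − y‖², i.e. Xᵀ(Xθ − y)."""
    residual = [sum(xj * tj for xj, tj in zip(row, theta)) - yi for row, yi in zip(X, y)]
    return [sum(X[i][j] * residual[i] for i in range(len(X))) for j in range(len(theta))]


def r(theta):
    """Regularizer r(θ) = ‖θ‖² (squared L2 norm, as in ridge regression)."""
    return sum(t * t for t in theta)


# Trace of XᵀX: an upper bound on its largest eigenvalue, used for safe step sizes.
L = sum(x * x for row in X for x in row)


def solve_penalized(lam, iters=5000):
    """Gradient descent on l(θ) + λ r(θ)."""
    step = 1.0 / (L + 2.0 * lam)
    theta = [0.0] * len(X[0])
    for _ in range(iters):
        g = grad_loss(theta)
        theta = [t - step * (gi + 2.0 * lam * t) for t, gi in zip(theta, g)]
    return theta


def solve_constrained(M, iters=5000):
    """Projected gradient descent on l(θ) subject to r(θ) ≤ M."""
    step = 1.0 / L
    radius = math.sqrt(M)
    theta = [0.0] * len(X[0])
    for _ in range(iters):
        g = grad_loss(theta)
        theta = [t - step * gi for t, gi in zip(theta, g)]
        norm = math.sqrt(r(theta))
        if norm > radius:  # project back onto the ball ‖θ‖ ≤ √M
            theta = [t * radius / norm for t in theta]
    return theta


def multiplier(theta):
    """Recover λ from the KKT condition ∇l(θ) + 2λθ = 0 (dot both sides with θ)."""
    g = grad_loss(theta)
    return -sum(gi * t for gi, t in zip(g, theta)) / (2.0 * r(theta))


# Direction 1: given λ, set M = r(θ_λ); the constrained problem has the same solution.
lam = 0.5
theta_pen = solve_penalized(lam)
M = r(theta_pen)
theta_con = solve_constrained(M)
print("lambda -> M:", M, theta_pen, theta_con)

# Direction 2: given M, read λ off the constrained solution; the penalized problem agrees.
M2 = 1.0
theta_con2 = solve_constrained(M2)
lam2 = multiplier(theta_con2)
theta_pen2 = solve_penalized(lam2)
print("M -> lambda:", lam2, theta_con2, theta_pen2)
```

In both directions the two solvers should land on the same θ up to floating-point error. For this toy data the unregularized solution is θ = (1, 2) with r(θ) = 5, so any M ≥ 5 leaves the constraint inactive; the matching λ is then 0, and `multiplier` returns roughly 0 because ∇l vanishes there.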