A very natural explanation of "wikipedia proof 2" for differentiable functions seems to be missing:<p>By linearity of expectation, both sides are linear in f, and for affine f (linear plus a constant) we have equality. So let's subtract from f the affine function whose graph is the tangent hyperplane to f at E(X). By the above, this changes both sides by the same amount, so it does not change the validity of the inequality. But now the left-hand side is 0, and the right-hand side is non-negative because, by convexity, f lies above its tangent hyperplane. So we are done.<p>It's also now clear what the difference of the two sides is: it's the expectation of the gap between f(X) and the value of the tangent hyperplane at X.<p>In general, replace the tangent hyperplane with the supporting hyperplane given by a subderivative (subgradient) of f at E(X) to recover what the wiki says.
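Here is a quick numerical check of that identity, written as a minimal Python sketch. The distributions, the two example functions (exp, and abs with a hand-picked subderivative 0.3 at 0), and the helper names `expect` and `jensen_gaps` are illustrative assumptions. Nothing here comes from the Wikipedia article. For a discrete X, it computes E f(X) - f(E X) and the expected gap between f(X) and the supporting line through (E X, f(E X)). The two numbers should agree.

```python
import math


def expect(g, values, probs):
    """E[g(X)] for a discrete X taking `values` with probabilities `probs`."""
    return sum(p * g(v) for v, p in zip(values, probs))


def jensen_gaps(f, slope, values, probs):
    """Return (E f(X) - f(E X), E[f(X) - L(X)]).

    L is the affine function through (E X, f(E X)) whose slope is slope(E X),
    i.e. a derivative or a subderivative of f at E X.
    """
    m = expect(lambda x: x, values, probs)   # E(X)
    s = slope(m)
    L = lambda x: f(m) + s * (x - m)         # supporting (tangent) line at E(X)
    jensen_gap = expect(f, values, probs) - f(m)
    tangent_gap = expect(lambda x: f(x) - L(x), values, probs)
    return jensen_gap, tangent_gap


def abs_subgradient(m):
    # Hypothetical choice: at 0, any value in [-1, 1] is a valid subderivative of |x|.
    if m > 0:
        return 1.0
    if m < 0:
        return -1.0
    return 0.3


# Differentiable case: f = exp, so the derivative is exp as well.
exp_gaps = jensen_gaps(math.exp, math.exp, [-1.0, 0.5, 2.0, 3.0], [0.1, 0.4, 0.3, 0.2])

# Non-differentiable case: f = abs with E(X) = 0, using a subderivative at the kink.
abs_gaps = jensen_gaps(abs, abs_subgradient, [-2.0, -1.0, 1.0, 2.0], [0.25] * 4)

print("exp:", exp_gaps)
print("abs:", abs_gaps)
```

In the abs case, the line's slope is only constrained to lie in [-1, 1]. Because L is affine, E[L(X)] = L(E X) = f(E X) for any valid choice, so the expected gap does not depend on the choice.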