Is there a reason to perform linear regression with an iterative optimizer such as SGD or Adam rather than solving the least-squares problem directly? For large matrices, does iterative optimization scale better than solving the linear system? Or, since optimizers are more expressive, is it mainly a matter of programmatic convenience and readability?
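
To make the comparison concrete, here is a minimal sketch of the two approaches being contrasted, assuming NumPy. The helper names `fit_lstsq` and `fit_sgd`, the synthetic data, and the hyperparameters (learning rate, batch size, epoch count) are illustrative assumptions, not taken from any particular library or tutorial.

```python
import numpy as np


def fit_lstsq(X, y):
    """Direct solve: minimize ||Xw - y||^2 with NumPy's least-squares routine."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def fit_sgd(X, y, lr=0.05, epochs=200, batch_size=32, seed=0):
    """Iterative solve: plain mini-batch SGD on the mean squared error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the rows every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ w - y[idx]
            # Gradient of mean((Xw - y)^2) over the mini-batch
            grad = 2.0 * X[idx].T @ residual / len(idx)
            w -= lr * grad
    return w


# Synthetic, well-conditioned problem (hypothetical data for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=1000)

w_direct = fit_lstsq(X, y)
w_iter = fit_sgd(X, y)
print("least squares:", w_direct)
print("mini-batch SGD:", w_iter)
```

On a small, well-conditioned problem like this one, both routes should land on nearly the same weights. The direct solve needs the whole matrix at once, while the SGD loop only touches one mini-batch per step, which is the trade-off the scalability part of the question is about.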