Your<p>"I'm more than willing to spend an hour per page and do all the exercises, but I would like good exposition."<p>is essentially necessary and sufficient.<p>If an exercise takes more than two hours, then swallow your pride and skip it (it may be <i>misplaced</i>, have an error, or just be way too difficult for effective education).<p>For linear algebra:<p>(1) Work through a few introductory texts.<p>(2) Work carefully through the long-unchallenged, world-class classic,<p>Halmos, <i>Finite Dimensional Vector Spaces</i>,<p>and there note, near the back, his cute ergodic convergence theorem.<p>The glory here is the polar decomposition.<p>(3) Get some contact with applications, including elementary multivariate statistics, numerical techniques, optimization, etc.<p>(4) Pick from<p>Horn and Johnson, <i>Matrix Analysis</i>,<p>and<p>Horn and Johnson, <i>Topics in Matrix Analysis</i>.<p>My first course was an "advanced course" taught by one of Horn and Johnson, and I knocked the socks off all the other students. How'd I do that? Brilliant? Worked hard? Learned a lot? Nope. 
Instead, the key was just my independent work with (1) -- (3).<p>So if you do (1) -- (4), then you will be fine.<p>For analysis, start with (Baby Rudin)<p>Walter Rudin, <i>Principles of Mathematical Analysis</i>.<p>Note in the back that a function is Riemann integrable if and only if it is continuous everywhere except on a set of Lebesgue measure 0.<p>Also know cold that a uniform limit of continuous functions is continuous.<p>Then move to<p>Royden, <i>Real Analysis</i>,<p>and the first, real, half of (Papa Rudin)<p>Rudin, <i>Real and Complex Analysis</i>.<p>Of course, emphasize the Radon-Nikodym theorem; I like the easy steps in Royden and Loeve (below), but see also the von Neumann proof in Papa Rudin.<p>For probability based on measure theory and the limit theorems,<p>Breiman, <i>Probability</i>.<p>Note his result on regular conditional probabilities.<p>Neveu, <i>Mathematical Foundations of the Calculus of Probability</i>.<p>If you can work all the Neveu exercises, then someone should buy you a <i>La Tache</i> 1961.<p>Loeve, <i>Probability Theory</i>.<p>Note the classic Sierpinski counterexample exercise on regular conditional probabilities (also in Halmos, <i>Measure Theory</i>).<p>Cover the Lindeberg-Feller version of the central limit theorem as well as simpler versions. Do the weak law of large numbers as an easy exercise. Cover the martingale convergence theorem (I like Breiman here) and use it to give the nicest proof of the strong law of large numbers. Cover the ergodic theorem (Garsia's proof) and its (astounding) application to Poincaré recurrence. Cover the law of the iterated logarithm and its (astounding) application to the growth of Brownian motion.<p>Of course, apply the Radon-Nikodym theorem and conditioning to sufficient statistics, and note that order statistics are always sufficient. Show that the sample mean and variance are sufficient for i.i.d. 
Gaussian samples and extend to the exponential family.<p>Give yourself an exercise: in Papa Rudin, just after the Radon-Nikodym theorem, note the Hahn decomposition and use it to give a quite general proof of the Neyman-Pearson lemma.<p>To appreciate the law of large numbers in statistics, read the classic Halmos paper on minimum-variance, unbiased estimation.<p>For tools for research in statistics, you might want to get going in stochastic processes. So, for elementary books, look for the authors Karlin, Taylor, and Cinlar, and touch on some applications, e.g., Wiener filtering and power spectral estimation. Note the axiomatic derivation of the Poisson process and the main convergence theorem for finite Markov chains (also a linear algebra result). Then, for more, note again the relevant sections of Breiman and Loeve, and then:<p>Karatzas and Shreve, <i>Brownian Motion and Stochastic Calculus</i>.
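<p>By the way, that convergence theorem for finite Markov chains really is a linear algebra result, and you can see it numerically in a few lines. A minimal NumPy sketch, with a made-up 3-state transition matrix (the numbers are only illustrative):<p><pre><code>
```python
import numpy as np

# Transition matrix of a small irreducible, aperiodic Markov chain;
# entries are made up for illustration (each row sums to 1).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# The convergence theorem: P^n tends to a rank-one matrix whose
# identical rows are the stationary distribution pi.
Pn = np.linalg.matrix_power(P, 50)
pi = Pn[0]

# pi is the fixed point pi P = pi, i.e. a left eigenvector of P
# for eigenvalue 1 -- exactly the linear algebra statement.
print(np.allclose(pi @ P, pi))    # True
print(np.isclose(pi.sum(), 1.0))  # True
print(np.allclose(Pn[0], Pn[1]))  # True: all rows have converged
```
</code></pre><p>The rate of convergence is governed by the second-largest eigenvalue modulus of P, which connects this back to the matrix material in Horn and Johnson.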