Even setting aside the difference in data structures, the two use wildly different algorithms. NumPy outsources all the matrix calculations to BLAS[1] routines written in a mix of C and assembly, just as the answers describe.<p>BLAS is not only written in more efficient code; its implementations also use different techniques altogether, such as cache blocking, SIMD vectorization and multithreading. The classical algorithm needs 2mnp floating-point operations to multiply an m×n matrix by an n×p matrix (2n^3 for square n×n matrices). Fast algorithms such as Strassen's can bring the FLOP count below that, though widely used BLAS libraries generally keep the classical operation count and get their speed from using the hardware far more efficiently.<p>[1]: <a href="http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms" rel="nofollow">http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprogram...</a>
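<p>A rough sketch, assuming NumPy is installed: the classical algorithm written as pure Python loops, with a counter showing it performs 2mnp FLOPs (`naive_matmul` is a hypothetical helper, not part of NumPy). `A @ B` computes the same product but is dispatched to whichever BLAS library NumPy was built against, which is where the speed difference comes from.

```python
import numpy as np

def naive_matmul(a, b):
    """Classical triple-loop matrix product in pure Python (hypothetical helper).

    Returns the product and the number of floating-point operations
    performed: one multiply and one add per inner-loop step.
    """
    m, n = len(a), len(a[0])
    n2, p = len(b), len(b[0])
    if n != n2:
        raise ValueError("inner dimensions must match")
    c = [[0.0] * p for _ in range(m)]
    flops = 0
    for i in range(m):
        for j in range(p):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]
                flops += 2  # one multiply, one add
            c[i][j] = s
    return c, flops

m, n, p = 4, 3, 5
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, p))

C_naive, flops = naive_matmul(A.tolist(), B.tolist())
C_blas = A @ B  # NumPy hands this off to its BLAS backend

print(flops)                          # 2*m*n*p = 120
print(np.allclose(C_naive, C_blas))   # True
```

Both paths do the same arithmetic; timing them on larger matrices (say 300×300) shows the gap comes from how the work is executed, not from skipping operations.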