The usual problem with matrix multiplication is that both operands are stored in the same order, either row-major or column-major. Computing the product walks along the rows of one matrix and down the columns of the other, so one of the two memory access patterns ends up strided rather than sequential, and that one is inefficient.<p>However, it would be easy to create a "matrix memory" in which the choice of memory port decides how linear addresses map onto the matrix. The hardware would have a row-access port and a column-access port, and consecutive addresses on either port would be sequential because of the hardware wiring. That way both row and column access could be efficient.<p>All that would be required is a set of "memory configuration registers" that specify the matrix shape.
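<p>A rough software model of the idea, just to pin down the address mapping. Everything here is hypothetical: the names `MatrixMemory`, `row_port`, `col_port` and `load` are made up for illustration, and the two attributes `rows` and `cols` stand in for the configuration registers. In software the column port still has to do index arithmetic on every access; the point of the proposal is that real hardware would do that remapping in wiring.

```python
class MatrixMemory:
    """Toy model of a hypothetical two-port matrix memory."""

    def __init__(self, rows, cols):
        # Stand-ins for the "memory configuration registers".
        self.rows = rows
        self.cols = cols
        # Physical storage; assumed row-major for this sketch.
        self.cells = [0] * (rows * cols)

    def write(self, row, col, value):
        self.cells[row * self.cols + col] = value

    def row_port(self, addr):
        # Consecutive addresses walk the matrix row by row.
        return self.cells[addr]

    def col_port(self, addr):
        # Consecutive addresses walk the matrix column by column.
        col, row = divmod(addr, self.rows)
        return self.cells[row * self.cols + col]


def load(values):
    """Fill a MatrixMemory from a list of equal-length rows."""
    mem = MatrixMemory(len(values), len(values[0]))
    for r, line in enumerate(values):
        for c, v in enumerate(line):
            mem.write(r, c, v)
    return mem


def matmul(a, b):
    """Multiply using only sequential addresses on each port."""
    assert a.cols == b.rows
    n = a.cols
    return [
        [
            # Row i of A starts at i*n on the row port;
            # column j of B starts at j*n on the column port.
            sum(a.row_port(i * n + k) * b.col_port(j * n + k) for k in range(n))
            for j in range(b.cols)
        ]
        for i in range(a.rows)
    ]
```

With this mapping the inner loop reads addresses `i*n, i*n+1, ...` from one port and `j*n, j*n+1, ...` from the other, so neither side of the multiply ever sees a stride.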