I can imagine kindergartens using Legos to teach children: "And this, children, is how multi-head attention works." The matrix algebra used in AI is a very good fit for geometric visualizations. But in the end, it doesn't explain why it works so well or seems so human-like. A valuable kindergarten lesson, though.