I was very impressed with Anthropic's paper on concept mapping.

Post: https://www.anthropic.com/news/mapping-mind-language-model

Paper: https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html

This seems like a very good starting point for alignment. One could almost see a pathway from here to something like the Laws of Robotics. There's a long way to go, but it's a good first step.