My focus has recently been shifting towards ML alignment, and in particular towards translating large transformer models into human-understandable circuits and algorithms. This problem may not be solvable in general, but some groups have had success with it after large amounts of effort.<p>To help address this, I've been developing Transpector, a tool that scales up the techniques these teams have had success with, techniques aimed at understanding the internal mechanics of a model, and lowers the barrier to entry for using them. Currently the tool focuses on model activations, but as free time allows I plan to expand it into the gradient and weight spaces as well.<p>If you have some free time of your own, I encourage you to give it a try. I've found it's not only a bit of fun, but it's also been a good way to help others build intuition for these models.