My focus has been shifting towards the ML alignment space recently, in particular the ability to translate large transformer models into human-understandable circuits and algorithms. This problem may not be fully solvable, but it is one that some groups have had success with after large amounts of effort.<p>In attempting to address this issue, I've been developing Transpector: a tool that scales up, and lowers the barrier to entry of, the techniques these teams have been having success with, techniques aimed at understanding the internal mechanics of the model. Currently the tool is focused on model activations, but free time permitting I'm planning to expand it into the gradient and weight spaces as well.<p>If you have some free time of your own, I encourage you to give it a try. I've found it's not only a bit of fun but also a good way to help others build intuition about these models.
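For readers curious what "focusing on model activations" looks like mechanically: the usual starting point is capturing intermediate outputs with forward hooks, then visualising or analysing them. The sketch below is a minimal, generic illustration of that technique in plain PyTorch on a toy model, not Transpector's actual API or internals.

```python
# Minimal sketch of activation capture via forward hooks -- the general
# technique interpretability tools build on. The toy model here is
# illustrative only, not Transpector's implementation.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 4),
)

activations = {}

def make_hook(name):
    # Store a detached copy of each module's output for later inspection
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

for name, module in model.named_modules():
    if isinstance(module, (nn.Linear, nn.ReLU)):
        module.register_forward_hook(make_hook(name))

x = torch.randn(2, 8)
model(x)  # hooks fire during the forward pass

for name, act in activations.items():
    print(name, tuple(act.shape))
```

The same pattern applies to real transformer blocks (attention scores, MLP outputs, residual-stream states); libraries like TransformerLens wrap it up with named hook points per layer.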
hey, this looks pretty cool. I was about to start researching the tools used to do things like find hyperparameters, debug the network, and so on. Karpathy's YT series alludes to the need for such things, but he hasn't yet dug into that rabbit hole. I hope I get some time to try this out. The visuals look great and make me think this would be worth trying as a learning (as in me learning!) tool.