OP here. A bit of context in case you're not doing ML day to day.<p>So everyone talks about AI, but the dirty secret of our space is that you need a lot of labeled data to train your AI on. Labeling data is a manual process, and active learning is a way to use AI to speed up that manual process.<p>The core idea is that you let your model choose what to label next based on how "valuable" the next piece of information is. "Valuable" can be defined in many different ways; the most common is to choose the data points that the model is least certain about.<p>ALMa is a utility that makes the engineering aspects of implementing active learning a little easier. When implementing active learning you need to keep track of what data has been labeled (what you train the model on) and what has not been labeled (what you still have to label). The common ML ecosystem is very array-based, so this becomes an exercise in tracking offsets that is tedious and error-prone. ALMa abstracts that away.<p>---- A bit about LightTag ----<p>We make tools to annotate text, entities, classification and relationships. We make it particularly easy to work with larger teams of annotators, through automated workforce management and analytics (IAA, adjudication, etc.).<p>We've traditionally been on the fence about active learning because you run the risk of biasing your data toward whatever model you're using. It's been requested often enough that we'll make it an optional feature, and ALMa is a component in that pipeline.
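<p>To make the offset-tracking point above concrete, here's a minimal sketch of least-confidence sampling over a pool with a labeled/unlabeled mask. To be clear, this is not ALMa's API: `Pool`, `least_confident`, `query` and `add_labels` are hypothetical names made up for illustration, and it assumes your model hands back an array of class probabilities with one row per unlabeled example.

```python
import numpy as np


def least_confident(probs: np.ndarray, k: int) -> np.ndarray:
    """Return the row positions of the k predictions with the lowest top-class probability."""
    confidence = probs.max(axis=1)
    # Stable sort so ties keep their original order.
    return np.argsort(confidence, kind="stable")[:k]


class Pool:
    """Hypothetical helper (not ALMa's API): tracks which rows of a dataset are labeled."""

    def __init__(self, n: int):
        self.labeled = np.zeros(n, dtype=bool)
        self.labels = np.full(n, -1, dtype=int)  # -1 means "no label yet"

    def unlabeled_indices(self) -> np.ndarray:
        return np.flatnonzero(~self.labeled)

    def query(self, probs_unlabeled: np.ndarray, k: int) -> np.ndarray:
        # Rows of probs_unlabeled line up with unlabeled_indices(), so the
        # positions picked by the sampler must be mapped back to pool offsets.
        idx = self.unlabeled_indices()
        return idx[least_confident(probs_unlabeled, k)]

    def add_labels(self, indices, labels) -> None:
        self.labeled[indices] = True
        self.labels[indices] = labels
```

<p>The loop is then: predict on `unlabeled_indices()`, call `query` to get pool-wide offsets to send to annotators, and call `add_labels` when the answers come back. The mapping inside `query` is the step that's easy to get wrong by hand.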
tiny aside: you have a typo in <a href="https://github.com/LightTag/ALMa" rel="nofollow">https://github.com/LightTag/ALMa</a> s/Leanring/Learning/