I am looking for software to classify documents into 10-20 categories.
The documents are about half a screen to a screen long.

There is some labeled data (about 50-80 labeled documents per category, not 500 per category), so few-shot learning might be an option.

Algorithms: it could be something like k-nearest neighbors or some ML/neural network (transformers? LLMs?).
It just needs to do the classification properly.

Some restrictions:
It should be a "ready to use" pipeline with documentation about training the model, parameter optimization etc.
If possible, there should be some way to use the framework/library without Python (I'm not a Python developer).
For example, [1] and [2] allow using a command-line interface for everything; it seems Python is optional for these frameworks.
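For illustration, the supervised workflow from [1] is entirely command-line driven. A minimal sketch (file names like docs.train/docs.valid are placeholders; the training file has one document per line, prefixed with __label__<category>):

    $ ./fasttext supervised -input docs.train -output doc_model -epoch 25 -wordNgrams 2
    $ ./fasttext test doc_model.bin docs.valid
    $ ./fasttext predict doc_model.bin docs.test

fastText can also tune hyperparameters against a validation file with -autotune-validation docs.valid, which covers the "parameter optimization" requirement without any Python.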
The SetFit framework (see [3] and [4]) looks quite promising (good results with 8 labeled samples per class!), but it requires doing everything in Python (a short sketch follows the references below).

[1] https://fasttext.cc/docs/en/supervised-tutorial.html

[2] https://neuml.github.io/txtai/pipeline/text/labels/

[3] https://github.com/huggingface/setfit

[4] https://www.philschmid.de/getting-started-setfit
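For reference, the few-shot training loop described in [3]/[4] is only a few lines, but it is all Python. A rough sketch (dataset contents and the base model are placeholders; newer setfit releases have reorganized this API around Trainer/TrainingArguments):

    from datasets import Dataset
    from setfit import SetFitModel, SetFitTrainer

    # A few labeled examples per class; integer ids stand in for the 10-20 categories
    train_ds = Dataset.from_dict({
        "text": ["first labeled document ...", "second labeled document ..."],
        "label": [0, 1],
    })

    model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
    trainer = SetFitTrainer(model=model, train_dataset=train_ds, num_iterations=20)
    trainer.train()

    print(model(["a new, unlabeled document"]))  # predicted class ids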
SetFit is a great framework for building a text classifier.

This is a pretty straightforward problem and a good fit for a standard text classifier as well.

Here is an example of fine-tuning a model with txtai: https://colab.research.google.com/github/neuml/txtai/blob/master/examples/16_Train_a_text_labeler.ipynb
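Roughly, the notebook boils down to something like the sketch below (not the exact notebook code; the base model and the text/label field names are assumptions, so check the notebook for the details):

    from txtai.pipeline import HFTrainer, Labels

    # Labeled records; 50-80 documents per category as described above
    train = [
        {"text": "first labeled document ...", "label": 0},
        {"text": "second labeled document ...", "label": 1},
    ]

    # Fine-tune a small Hugging Face model as a text classifier
    trainer = HFTrainer()
    model, tokenizer = trainer("google/bert_uncased_L-2_H-128_A-2", train)

    # dynamic=False runs the model as a standard (non zero-shot) classifier
    labels = Labels((model, tokenizer), dynamic=False)
    print(labels("a new, unlabeled document"))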