A simple R package that helps annotate single-cell experiments such as single-cell RNA-seq. Given the up- and down-regulated genes for each cell cluster, a local LLM guesses the cell type annotation and produces an extensive overall report.
Can anyone explain how an LLM is useful here? The clustering is done traditionally, right? Then the LLM is given the centroids and asked for a label? The assumption being that the LLM's training corpus already contained some mapping from gene up/down-regulation to clusters of differentiation?
How easy is it to check the resulting cell annotations for mistakes? Is it easy for a person to do, so that this saves them a bunch of time getting a baseline? Or could this lead to a bunch of mislabeled data?
I'm surprised that this is using plain llama3.1 rather than a fine-tune. Have you checked the accuracy of the results on the common benchmarks? Also, given that it produces its answers based only on the up/down lists (or did I miss something?), isn't that something that could be extracted into a more efficient lookup with only a 2D grid of weights? (Or 3D if there are group-of-genes effects.)
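To make the lookup idea concrete, here is a minimal sketch of what a 2D grid of weights could look like: one weight per (gene, cell type) pair, summed with a positive sign for up-regulated genes and a negative sign for down-regulated ones. This is not how the package works. The `WEIGHTS` table, its values, and the function names are all made up for illustration; only the gene-to-cell-type associations (CD3E for T cells, CD79A for B cells, LYZ for monocytes) are standard markers.

```python
# Hypothetical weight grid: WEIGHTS[gene][cell_type].
# The numeric values are invented for illustration, not fitted to any data.
WEIGHTS = {
    "CD3E":  {"T cell": 1.0,  "B cell": -0.5, "Monocyte": -0.5},
    "CD79A": {"T cell": -0.5, "B cell": 1.0,  "Monocyte": -0.5},
    "LYZ":   {"T cell": -0.5, "B cell": -0.5, "Monocyte": 1.0},
}


def score_cluster(up, down):
    """Sum the weights per cell type: +w for up-regulated genes, -w for down-regulated ones."""
    scores = {}
    signed_genes = [(gene, 1.0) for gene in up] + [(gene, -1.0) for gene in down]
    for gene, sign in signed_genes:
        # Genes missing from the grid contribute nothing.
        for cell_type, weight in WEIGHTS.get(gene, {}).items():
            scores[cell_type] = scores.get(cell_type, 0.0) + sign * weight
    return scores


def annotate(up, down):
    """Return the best-scoring cell type, or "unknown" if no listed gene is in the grid."""
    scores = score_cluster(up, down)
    if not scores:
        return "unknown"
    return max(scores, key=scores.get)


print(annotate(up=["CD3E"], down=["LYZ"]))
print(annotate(up=["CD79A"], down=["CD3E"]))
```

A grid like this cannot capture interactions between genes, which is where the 3D variant (or an LLM) would have to earn its keep.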
This is really useful, thanks for sharing. My students and I tend to waste a lot of time annotating clusters and have not found a reasonable solution yet. This will be fun to try.