I believe this wouldn't be meaningful, since an LLM of any size can be trained on any amount of data.

You could measure how well it memorizes via prediction accuracy on the training set, but that wouldn't indicate whether it generalizes well.
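For what it's worth, here's a minimal sketch of that train-vs-held-out comparison, assuming a Hugging Face causal LM; the model name and the two sample texts are placeholders, not anything specific:

    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def perplexity(model, tokenizer, text):
        # exp of the mean next-token cross-entropy the model assigns to `text`
        enc = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            loss = model(**enc, labels=enc["input_ids"]).loss
        return math.exp(loss.item())

    name = "meta-llama/Llama-3.1-8B"  # placeholder; any causal LM works
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    train_text = "..."    # a passage known to be in the training set
    heldout_text = "..."  # a comparable passage the model has never seen

    # Much lower perplexity on the training passage than on comparable
    # held-out text points to memorization rather than generalization.
    print(perplexity(model, tok, train_text), perplexity(model, tok, heldout_text))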
Llama 3.1 was pre-trained on 15 trillion tokens, plus a few million more for fine-tuning. That's roughly 60 terabytes of text.

https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md

The most heavily quantised Llama 3.1 8B is about 3.4 GB.

So a compression ratio of roughly 0.006%, if you don't mind the intelligence of a heavily quantised 8B model.
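Back-of-the-envelope version of that arithmetic (the 4 bytes per token is just a rough assumption for English text; the model and corpus sizes are the ones above):

    tokens = 15e12            # Llama 3.1 pre-training tokens
    bytes_per_token = 4       # assumed average, ~4 characters of UTF-8 text
    corpus_bytes = tokens * bytes_per_token   # ~6e13 bytes, i.e. ~60 TB
    model_bytes = 3.4e9       # heavily quantised 8B checkpoint

    ratio = model_bytes / corpus_bytes
    print(f"corpus ~ {corpus_bytes / 1e12:.0f} TB, ratio ~ {ratio:.4%}")  # ~0.0057%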
OpenAI’s GPT-3 (175B parameters) has a checkpoint size of about 350 GB at fp16, and was trained on roughly 300 billion tokens filtered from about 45 TB of raw Common Crawl text, which relative to the raw corpus is still a high compression ratio (under 1%).