Training in int8 is notable (to me). I've been out of date with ML research for a bit now, but last I recall, people were mostly training at full precision, quantizing after training, and then fine-tuning the quantized model a bit afterwards.
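
For anyone who wants the concrete picture, here's a minimal sketch of that older recipe in PyTorch eager-mode quantization-aware training: train in fp32, then insert fake-quant observers and fine-tune briefly before converting to int8 for inference. The model, data, and hyperparameters are placeholders, not anyone's actual setup:

```python
import torch
from torch import nn

class TinyNet(nn.Module):
    """Hypothetical stand-in for the real network."""
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # fp32 -> int8 at the input
        self.fc1 = nn.Linear(784, 256)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(256, 10)
        self.dequant = torch.quantization.DeQuantStub()  # int8 -> fp32 at the output

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return self.dequant(x)

model = TinyNet()
# ... ordinary full-precision (fp32) training would happen here ...

# attach a QAT config and insert fake-quantization observers
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
model_prepared = torch.quantization.prepare_qat(model.train())

# brief fine-tuning so the weights adapt to the quantized grid
optimizer = torch.optim.SGD(model_prepared.parameters(), lr=1e-4)
for _ in range(100):  # placeholder loop; a few epochs on real data in practice
    x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
    loss = nn.functional.cross_entropy(model_prepared(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# convert the fine-tuned model into an actual int8 model for inference
model_int8 = torch.quantization.convert(model_prepared.eval())
```

The key difference from native int8 training is that the forward/backward passes above still run in fp32; the observers only simulate quantization so the weights land somewhere friendly to the int8 grid.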