Human supervision in LLM pre-training would be a Herculean task.<p>For example: the LLM produces an essay from a prompt, and a human has to read it and judge how close it is to the output the prompt called for.<p>For an LLM with 200 billion parameters, this process would (presumably) have to be repeated quadrillions of times.<p>How are they achieving such scale?