We successfully scaled from a 7B run to a 70B run on the first try, with minimal training instability and no loss spikes. We also predicted the performance of the 70B model from experiments on much smaller models.

We accomplished this using our hyperparameter optimizer, CARBS. We’re open-sourcing CARBS today so that other small teams working on novel model architectures can tune at small scale and trust performance at large scale.
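To give a feel for how an optimizer like this fits into a training workflow, here is a minimal sketch of the suggest/observe loop a cost-aware tuner runs over small-scale experiments. The names below (`Optimizer`, `suggest`, `observe`, `train_small_model`) are placeholders for illustration, not the CARBS API; see the released repository for actual usage.

```python
# Hypothetical sketch of a cost-aware suggest/observe tuning loop at small scale.
# All names here are placeholders, not the CARBS API.
import random
import time
from dataclasses import dataclass


@dataclass
class Observation:
    params: dict   # hyperparameters that were tried
    output: float  # metric to optimize (e.g. validation loss)
    cost: float    # resources spent (e.g. wall-clock seconds)


class Optimizer:
    """Placeholder for a cost-aware hyperparameter optimizer."""

    def __init__(self, search_center: dict):
        self.search_center = search_center
        self.history: list[Observation] = []

    def suggest(self) -> dict:
        # A real optimizer fits a surrogate model to past observations and their
        # costs; here we simply perturb the search center so the loop runs end to end.
        return {k: v * random.uniform(0.5, 2.0) for k, v in self.search_center.items()}

    def observe(self, obs: Observation) -> None:
        self.history.append(obs)


def train_small_model(params: dict) -> float:
    """Placeholder: train a small proxy model and return its validation loss."""
    return 1e-4 / params["learning_rate"] + 1e-4 * params["batch_size"]


optimizer = Optimizer(search_center={"learning_rate": 1e-4, "batch_size": 256})

for _ in range(20):  # small-scale sweep, cheap to run
    params = optimizer.suggest()
    start = time.time()
    val_loss = train_small_model(params)
    optimizer.observe(Observation(params=params, output=val_loss, cost=time.time() - start))

# Carry the best small-scale configuration forward to the large run.
best = min(optimizer.history, key=lambda o: o.output)
print("best small-scale config:", best.params)
```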