Weird title; note that the tweet said "so yes, GPT4 is *technically* 10x the size of GPT3, and all the small circle big circle memes from January were actually... in the ballpark?"

It's really 8 models of ~220B each, which is not the same thing as one dense model with 1.7T params (see the rough arithmetic at the bottom of this comment). There have been 1T+ models via mixtures of experts for a while now.

Note also the follow-up tweet: "since MoE is So Hot Right Now, GLaM might be the paper to pay attention to. Google already has a 1.2T model with 64 experts, while Microsoft Bing’s modes are different mixes accordingly"

There is also this linked tweet (https://twitter.com/LiamFedus/status/1536791574612303872):
"They are all related to Switch-Transformers and MoE. Of the 3 people on Twitter, 2 joined OpenAI. Could be related, could be unrelated"<p>Which links to this tweet:
"Today we're releasing all Switch Transformer models in T5X/JAX, including the 1.6T param Switch-C and the 395B param Switch-XXL models. Pleased to have these open-sourced!"<p>Anyway... remember not to just read the headlines, they can be misleading.