I think people are forgetting that transformer architectures are a broader field than GPT and predate GPT-3 by 3+ years. Referring to transformer architectures by a branded commercial moniker (GPT) is just going to help cement OpenAI's brand exposure and, soon, regulatory capture.

For comparison, this would be like referring to convnets as Inception architectures back during the CV boom (or VGGNets before that).
Nice! The README mentions `LayerNorm` is implemented here, and it shows up in the equivalence tests against PyTorch, but I don't see it anywhere in the implementation itself.
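For reference, the kind of thing the equivalence test would be checking is roughly this (a minimal sketch against PyTorch's built-in, not the repo's actual code):

```python
import torch

def layer_norm(x, weight, bias, eps=1e-5):
    # Normalize over the last dimension: subtract the mean and divide by
    # the (biased) standard deviation, then apply the learned affine params.
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    return (x - mean) / torch.sqrt(var + eps) * weight + bias

# Equivalence check against torch.nn.LayerNorm
ln = torch.nn.LayerNorm(8)
x = torch.randn(2, 4, 8)
assert torch.allclose(layer_norm(x, ln.weight, ln.bias), ln(x), atol=1e-6)
```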