Hi all,<p>I've created a Jupyter notebook with everything you need to convert GPT-J from JAX to run with the new HuggingFace PR for GPT-J. I've also got the model running in our production environment, which you can play around with/use in production here:<p>https://hub.getneuro.ai/model/nlp/gpt-j-6B-text-generation<p>Average inference speed is surprisingly fast on our T4s: around 5s for 50 tokens. I'll be trying a V100 and a Quadro 8000 (full-precision model) tomorrow. To fit the model on GPUs with less than ~24GB of VRAM, the model in the demo and notebook is half precision in torch. This was kinda painful to get working, so hopefully you find it useful.<p>https://github.com/paulcjh/gpt-j-6b/blob/main/gpt-j-t4.ipynb<p>Cheers
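<p>Edit: for anyone wondering why half precision matters here — GPT-J has ~6B parameters, so at float32 (4 bytes/param) the weights alone are ~24GB, while float16 (2 bytes/param) brings that down to ~12GB, which fits on a T4 (16GB). A toy sketch of the conversion (a small stand-in torch module, not GPT-J itself — the actual notebook does this on the converted checkpoint):

```python
import torch

# Small stand-in module (NOT GPT-J) to show the effect of .half():
# converting parameters to float16 halves the memory footprint,
# which is what lets the ~6B-param model fit on a sub-24GB GPU.
model = torch.nn.Linear(1024, 1024)

fp32_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
model.half()  # convert all parameters to float16 in place
fp16_bytes = sum(p.numel() * p.element_size() for p in model.parameters())

print(fp32_bytes, fp16_bytes)  # float16 uses exactly half the bytes
```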