The MACAW model is quite impressive: it significantly outperforms GPT-3 on Q&A/reasoning tasks (by ~10%) while using about 10x fewer parameters. It is based on T5, a well-known model from Google. The innovation here is the training paradigm.<p>Also, did I mention it's OSS:
<a href="https://macaw.apps.allenai.org/" rel="nofollow">https://macaw.apps.allenai.org/</a><p>Here is the paper:
<a href="https://arxiv.org/abs/2109.02593" rel="nofollow">https://arxiv.org/abs/2109.02593</a>