That's Gemma-7Bx8<p>He goes on about the journey, the future, and the collaboration happening somewhere, but says nothing about the actual result: is it any good, or unexpectedly great at anything?<p>I suppose it was more an exercise in readying the tooling to make fine-tuning possible and cheap, and it's now left to the community to tune it and push the benchmarks. I also wonder whether it could benefit from today's paper drop from Meta on dirt-cheap MoE creation. <a href="https://huggingface.co/papers/2403.07816" rel="nofollow">https://huggingface.co/papers/2403.07816</a>