Zephyr 141B is a Mixtral 8x22B fine-tune. Here are some interesting details:<p>- Base model: Mixtral 8x22B, a mixture-of-experts model with 8 experts, 141B total params, 35B active params per token<p>- Fine-tuned with ORPO (Odds Ratio Preference Optimization), a new alignment algorithm with no separate SFT step (hence much faster than DPO/PPO)<p>- Trained on 7K open data instances: high-quality, synthetic, multi-turn<p>- Apache 2.0 license<p>Everything is open:<p>- Final model: <a href="https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1" rel="nofollow">https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v...</a><p>- Base model: <a href="https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1" rel="nofollow">https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1</a><p>- Fine-tune data: <a href="https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized" rel="nofollow">https://huggingface.co/datasets/argilla/distilabel-capybara-...</a><p>- Recipe/code to train the model: <a href="https://github.com/huggingface/alignment-handbook" rel="nofollow">https://github.com/huggingface/alignment-handbook</a><p>- Open-source inference engine: <a href="https://github.com/huggingface/text-generation-inference">https://github.com/huggingface/text-generation-inference</a><p>- Open-source UI code: <a href="https://github.com/huggingface/chat-ui">https://github.com/huggingface/chat-ui</a><p>Have fun!
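For intuition on how ORPO skips the SFT stage: it adds an odds-ratio preference penalty directly to the supervised loss on the chosen answer. A minimal numeric sketch of that single-pair objective (my own illustration from the ORPO paper's formulation, not the actual training code; the `lam` weight is an assumed hyperparameter):

```python
# Sketch of the ORPO objective for one (chosen, rejected) pair:
#   loss = NLL(chosen) + lam * (-log sigmoid(log-odds(chosen) - log-odds(rejected)))
# where odds(p) = p / (1 - p) and p is the average per-token probability.
import math

def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) for p = exp(avg_logp), the average per-token log-prob."""
    p = math.exp(avg_logp)
    return math.log(p) - math.log(1.0 - p)

def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """Supervised (SFT-style) term on the chosen answer plus the odds-ratio penalty."""
    nll = -chosen_avg_logp  # standard negative log-likelihood on the chosen answer
    ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    penalty = -math.log(1.0 / (1.0 + math.exp(-ratio)))  # -log sigmoid(ratio)
    return nll + lam * penalty

# Assigning more probability to the chosen answer than the rejected one
# yields a lower loss than the reverse:
print(orpo_loss(-0.5, -2.0) < orpo_loss(-1.5, -1.0))
```

Because the preference signal piggybacks on the same forward pass as the NLL term, there is no reference model and no separate SFT checkpoint, which is where the speedup over DPO/PPO comes from.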
My current favorite “LLM breaker” is below. GPT-4, Claude, and this model all fail it.<p>--<p>Apples are better than bananas. Cherries are worse than apples. Are cherries better than bananas?
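The premises genuinely underdetermine the answer, which a brute-force check makes obvious. A small sketch that enumerates every strict ordering of the three fruits consistent with "apples > bananas" and "cherries < apples":

```python
# Enumerate all strict orderings of the three fruits and keep the ones
# consistent with the premises: apples > bananas, cherries < apples.
from itertools import permutations

consistent = [
    order for order in permutations(["apples", "bananas", "cherries"])
    # order[0] is best; a smaller index means "better"
    if order.index("apples") < order.index("bananas")
    and order.index("apples") < order.index("cherries")
]

# Is "cherries better than bananas" true in each surviving ordering?
answers = {order.index("cherries") < order.index("bananas") for order in consistent}
print(answers)  # {True, False}: both "yes" and "no" fit, so it's undetermined
```

So the only correct response is "it can't be determined", which is exactly the hedge the models tend to skip.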