> we are seeing improvement on that front thanks to the absence of web data<p>Bingo. This (not the parameter count) is the amazing thing to me.<p>Garbage in, garbage out, and there is a <i>ton</i> of garbage in the Falcon/Llama (and OpenAI?) datasets. It feels like such a waste of compute and parameter space.
Textbooks Are All You Need II: phi-1.5 technical report<p>"Perhaps achieving ChatGPT’s level of capability at the one billion parameters scale is actually achievable?"
Llama generally achieves much higher accuracy on a lot of tasks with a very small amount of fine-tuning (on a dataset of similar quality to the one in this paper), so the understanding needed for that higher accuracy is already present in Llama. E.g. HellaSwag, ARC and MMLU for the 7B model are 0.8, 0.57 and 0.52 respectively[0], while phi-1 is at 0.48, 0.45 and 0.38.<p>I don't think fine-tuning phi-1 on good-quality synthetic data will increase its accuracy, since that is all it was trained on in the first place.<p>[0]: <a href="https://huggingface.co/pankajmathur/orca_mini_v3_7b" rel="nofollow noreferrer">https://huggingface.co/pankajmathur/orca_mini_v3_7b</a>
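For reference, here is a minimal sketch of how benchmark numbers like these are typically produced, assuming EleutherAI's lm-evaluation-harness. The model id, task names and the shape of the results dict depend on the harness version, and this is not necessarily how the figures above were generated.<p><pre><code># Minimal sketch: score a Hugging Face model on HellaSwag, ARC and MMLU
# with EleutherAI's lm-evaluation-harness (assumes a recent v0.4+ release;
# task names and result keys may differ across versions).
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",
    model_args="pretrained=microsoft/phi-1_5,trust_remote_code=True",
    tasks=["hellaswag", "arc_challenge", "mmlu"],
    num_fewshot=0,
)

# Print the per-task metric dictionaries reported by the harness.
for task, metrics in results["results"].items():
    print(task, metrics)
</code></pre>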
The model can be downloaded here: <a href="https://huggingface.co/microsoft/phi-1_5" rel="nofollow noreferrer">https://huggingface.co/microsoft/phi-1_5</a>
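As a quick check, a minimal sketch of loading and sampling from it with Hugging Face transformers follows. This is illustrative usage only: whether trust_remote_code is required depends on your transformers version, and the prompt and generation settings are arbitrary.<p><pre><code># Minimal sketch: load microsoft/phi-1_5 and generate a completion.
# trust_remote_code is assumed to be needed for the hosted model code;
# newer transformers releases may not require it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float32, trust_remote_code=True
)

prompt = 'def print_prime(n):\n    """Print all primes up to n."""\n'
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
</code></pre>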