> we are seeing improvement on that front thanks to the absence of web data<p>Bingo. This (not the parameter count) is the amazing thing to me.<p>Garbage in, garbage out, and there is a <i>ton</i> of garbage in the Falcon/Llama (and OpenAI?) datasets. It feels like such a waste of compute and parameter space.
Textbooks Are All You Need II: phi-1.5 technical report<p>"Perhaps achieving ChatGPT’s level of capability at the one billion parameters scale is actually achievable?"
Llama generally achieves much higher accuracy on a lot of tasks with a very small amount of fine-tuning (on a dataset of similar quality to the one in this paper), so the understanding needed for that higher accuracy is already present in Llama. E.g. HellaSwag, ARC and MMLU for the 7B model are 0.8, 0.57 and 0.52 respectively[0], while phi-1 is at 0.48, 0.45 and 0.38.<p>I don't think fine-tuning phi-1 on good-quality synthetic data will increase its accuracy, since that is all it was trained on in the first place.<p>[0]: <a href="https://huggingface.co/pankajmathur/orca_mini_v3_7b" rel="nofollow noreferrer">https://huggingface.co/pankajmathur/orca_mini_v3_7b</a>
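For reference, here is a minimal sketch of how benchmark numbers like these are typically produced, assuming EleutherAI's lm-evaluation-harness. The model id, task names and the shape of the results dict depend on the harness version, and this is not necessarily how the figures above were generated.<p><pre><code># Minimal sketch: score a Hugging Face model on HellaSwag, ARC and MMLU
# with EleutherAI's lm-evaluation-harness (assumes a recent v0.4+ release;
# task names and result keys may differ across versions).
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",
    model_args="pretrained=microsoft/phi-1_5,trust_remote_code=True",
    tasks=["hellaswag", "arc_challenge", "mmlu"],
    num_fewshot=0,
)

# Print the per-task metric dictionaries reported by the harness.
for task, metrics in results["results"].items():
    print(task, metrics)
</code></pre>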
The model can be downloaded here: <a href="https://huggingface.co/microsoft/phi-1_5" rel="nofollow noreferrer">https://huggingface.co/microsoft/phi-1_5</a>
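As a quick check, a minimal sketch of loading and sampling from it with Hugging Face transformers follows. This is illustrative usage only: whether trust_remote_code is required depends on your transformers version, and the prompt and generation settings are arbitrary.<p><pre><code># Minimal sketch: load microsoft/phi-1_5 and generate a completion.
# trust_remote_code is assumed to be needed for the hosted model code;
# newer transformers releases may not require it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float32, trust_remote_code=True
)

prompt = 'def print_prime(n):\n    """Print all primes up to n."""\n'
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
</code></pre>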