3 points by allmaker about 1 year ago

1 comment

unraveller about 1 year ago

The stats show an overall quality drop of a few points when going from 4k context to 128k context on the mini 3.8B. So it looks like the extra context helps in some cases and hinders in others. I wonder if that means the drop would be even bigger on the hypothetical "large" 70B model, making it ill-suited for what you'd want to use it for.

After Llama3 here we have Phi-3: Small Language Models beating LLMs
