The article concludes that the overall translation score of Llama 4 is below that of Llama 3.3.
However, the article's own table shows Llama 4 scoring higher on every subcategory in the test: coherence, idiomaticity, and accuracy.

Something does not add up. If the overall score were a weighted average of those subcategories, it could not come out lower. Yet the conclusion simply states "...downgrade from LLama 3.3 in every respect" without further explanation.