The headline reads very oddly; at first I wondered how GPT-4 got significantly less intelligent. The actual point is that the cost of GPT-4-level intelligence has dropped 1000x in 18 months.
LMSYS is not a measure of intelligence; it's a measure of human preference. People prefer correct answers (assuming they are qualified to identify them), but they also prefer answers that are, for example, formatted nicely for reading, which has nothing to do with "intelligence". That is why "reasoning" models, which often do better on benchmarks, do not necessarily do correspondingly well on LMSYS.