Intuitively, a regex or JSON grammar has a much lower "semantic dimension" than what today's LLMs allow. Maybe the observed performance gains result from that lower dimensionality.
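To make that intuition a bit more concrete, here is a toy sketch of how a constraint shrinks the set of tokens a model may emit at a given step. Everything in it is an assumption made for illustration: `VOCAB` is a made-up vocabulary, `allowed_tokens` is a hypothetical helper, and the "grammar" is just the regex `[0-9]+`. It is not how any particular structured generation library works internally.

```python
import re

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ["0", "1", "42", "7", "cat", "dog", " the", "{", "}", '"', ":", "true", "9a"]

# Constraint: the whole output must consist of one or more digits.
DIGITS = re.compile(r"[0-9]+")


def allowed_tokens(prefix: str, vocab: list[str]) -> list[str]:
    """Return the tokens that keep prefix + token consistent with the constraint.

    fullmatch is a valid prefix check here only because every non-empty prefix
    of a digit string is itself a digit string. A general regex or grammar
    would need an automaton that tracks partial matches.
    """
    return [tok for tok in vocab if DIGITS.fullmatch(prefix + tok)]


constrained = allowed_tokens("", VOCAB)
print(f"{len(constrained)} of {len(VOCAB)} tokens allowed: {constrained}")
# prints: 4 of 13 tokens allowed: ['0', '1', '42', '7']
```

Even in this tiny example the constraint removes most of the vocabulary at each step, which is one way to read "lower dimensionality". Whether that is what actually drives the benchmark gains remains a hypothesis.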
This whole line of work on structured generation looks promising. I hope someone else picks this up and runs evaluations on other benchmarks. Curious to see whether the results carry over!