An example with different fine-tuned models (especially smaller/cheaper ones) would probably be more interesting than running against a bunch of similar foundation models. For example, you could throw in some code-generation models and demonstrate that it picks those for coding problems.