Compare image models side-by-side by running the same prompt set and collecting outputs, latency, and success rates.
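
A minimal sketch of what such a comparison loop might look like, assuming each model can be wrapped as a plain Python callable that takes a prompt and returns image bytes. The names `run_comparison`, `summarize`, `Result`, and the two stub models are hypothetical and used only for illustration; they are not part of any real library.

```python
import time
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Optional


@dataclass
class Result:
    model: str
    prompt: str
    output: Optional[bytes]  # image bytes, or None if the call failed
    latency_s: float         # wall-clock time of the call, success or failure
    error: Optional[str]     # error description, or None on success


def run_comparison(
    models: Dict[str, Callable[[str], bytes]], prompts: List[str]
) -> List[Result]:
    """Run every prompt through every model, recording output, latency, and errors."""
    results = []
    for prompt in prompts:
        for name, generate in models.items():
            start = time.perf_counter()
            try:
                output, error = generate(prompt), None
            except Exception as exc:  # a failure counts against the success rate
                output, error = None, f"{type(exc).__name__}: {exc}"
            elapsed = time.perf_counter() - start
            results.append(Result(name, prompt, output, elapsed, error))
    return results


def summarize(results: List[Result]) -> Dict[str, dict]:
    """Aggregate per-model run count, success rate, and mean latency of successful runs."""
    summary = {}
    for name in sorted({r.model for r in results}):
        rows = [r for r in results if r.model == name]
        ok = [r for r in rows if r.error is None]
        summary[name] = {
            "runs": len(rows),
            "success_rate": len(ok) / len(rows),
            "mean_latency_s": mean(r.latency_s for r in ok) if ok else None,
        }
    return summary


# Hypothetical stand-ins for real image models (assumption: prompt in, bytes out).
def stub_model_a(prompt: str) -> bytes:
    return f"A:{prompt}".encode()


def stub_model_b(prompt: str) -> bytes:
    if "cat" in prompt:
        raise RuntimeError("simulated failure")
    return f"B:{prompt}".encode()


if __name__ == "__main__":
    runs = run_comparison(
        {"model-a": stub_model_a, "model-b": stub_model_b},
        ["a red cat", "a blue house"],
    )
    for model, stats in summarize(runs).items():
        print(model, stats)
```

Keeping the prompt loop on the outside means each prompt is sent to all models back to back, which makes the collected outputs easy to lay out side by side. Mean latency here covers successful calls only; whether failed calls should count is a design choice that depends on how the comparison is meant to be read.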