We’ve just completed the world’s first benchmark of LLM creativity in advertising. Our industry-sourced benchmark tested 16 leading models across 11,012 head-to-head idea comparisons. The findings show that there’s no single “best” model and that human instinct and variation still matter most.
- No clear model winner. Model rankings shift by brand, culture, and task.
- LLMs aren’t good judges. LLM judges showed a strong bias toward reasoning models and were 5–10× more confident than humans.
- Variance matters. Some models deliver a broader creative spread than others.
- Best practice. Use models for volume and humans for selection.

Who participated? 678 advertising industry professionals
How many matches? 11,012 idea comparisons
Which tools? 16 AI models
This is the first part of a three-part series on LLM creativity featuring the Creativity Benchmark.
"For those of us still grappling with our professional identity in an AI era, it’s reassuring to know that ultimately creativity is still, in large part, a human pursuit. While LLMs can help us get creative results faster, we are still the directors in this human-AI collaboration process."
Our GM, Americas, Carolyn Murphy, joined Jeremy Lockhorn, SVP, Creative Technologies & Innovation at the 4As, for a discussion of how leading AI models stack up against human creativity and what that means for the future of creative work.