Creativity Benchmark

Breaking the Benchmark: Findings From the Industry’s First LLM Creativity Study

We’ve just completed the world’s first benchmark of LLM creativity in advertising. Our industry-sourced benchmark tested 16 leading models across thousands of head-to-head comparisons. The findings show there’s no single “best” model and that human instinct and variation still matter most.

  • No clear model winner.

    Model rankings shift by brand, culture, and task.

  • LLMs aren’t good judges.

    Machines showed a strong bias toward reasoning models and were 5–10× more confident than humans.

  • Variance matters.

    Some models deliver a broader creative spread than others.

  • Best practice.

    Use models for volume, humans for selection.


Who Participated?

678

Advertising Industry Professionals

How Many Matches?

11,012

Idea Comparisons

Which Tools?

16

AI Models

Read the full white paper here


Download the one-pager here

WARC

First part of a three-part series on LLM creativity featuring Creativity Benchmark.

"For those of us still grappling with our professional identity in an AI era, it’s reassuring to know that ultimately creativity is still, in large part, a human pursuit. While LLMs can help us get creative results faster, we are still the directors in this human-AI collaboration process."

Lieu Thi Pham, Informa TechTarget

Our GM, Americas, Carolyn Murphy, joined Jeremy Lockhorn, SVP, Creative Technologies & Innovation at the 4As, for a discussion exploring how leading AI models stack up against human creativity and what that means for the future of creative work.