Can You Prompt Your Way Out of Creative Convergence?
750 exhibits, 5 prompt conditions, 3 models. A controlled ablation study testing whether AI creative convergence is prompt-driven or model-intrinsic.
Batch 001 found that AI models converge on the same creative output: dark backgrounds, Canvas 2D particles, mouse-driven interaction. Batch 002 asked the follow-up question: is that convergence baked into the models, or is it an artifact of how we prompt them?
We designed five prompt conditions and ran each across three model families, 50 exhibits per cell, 750 total. The conditions ranged from a stripped-down minimal prompt to explicit prohibitions against the default aesthetic. Every exhibit was built by an AI model with complete creative autonomy. No human creative direction. The only variable was the prompt framing.
00 Protocol
Each exhibit was built by a single AI agent in a headless Cursor session with file-system-only tool access. Agents received a preamble describing the sandbox constraints and their assigned condition. No agent could see another agent's work. CLAUDE.md was temporarily hidden during execution to eliminate the Batch 001 confound where agents read gallery design tokens and adopted them.
Post-run, every agent's file reads were audited and classified as allowed, contamination, or confound. 720 of 750 (96%) passed clean. The 30 violations were minor (Gemini reading exhibits.ts, GPT reading its own output directory) with zero CLAUDE.md contamination.
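A minimal sketch of how such a read audit could work, assuming a per-session log of file paths. The category rules and file names here are illustrative, not the study's actual pipeline:

```python
from pathlib import PurePosixPath

# Illustrative audit rules; the real pipeline's path lists may differ.
CONFOUND_FILES = {"CLAUDE.md"}         # gallery design tokens: hard fail
CONTAMINATION_FILES = {"exhibits.ts"}  # gallery metadata: soft fail

def classify_read(path: str, own_output_dir: str) -> str:
    """Classify a single file read from an agent's session log."""
    p = PurePosixPath(path)
    if p.name in CONFOUND_FILES:
        return "confound"
    if p.name in CONTAMINATION_FILES or str(p).startswith(own_output_dir):
        return "contamination"
    return "allowed"

def audit(reads: list[str], own_output_dir: str) -> dict:
    """Tally a session's reads into the three audit categories."""
    verdicts = [classify_read(r, own_output_dir) for r in reads]
    return {v: verdicts.count(v) for v in ("allowed", "contamination", "confound")}
```

Under rules like these, a session passes clean only when its tally shows zero contamination and zero confound reads.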
Condition A (Control): Identical to the Batch 001 prompt. Sandbox constraints, creative freedom, no other guidance. The baseline.
Condition B (Stripped): Minimal preamble. Just the technical sandbox constraints, with nothing about creative freedom or the gallery context. Tests whether extra context helps or hurts.
Condition C (Anti-Default): Explicitly prohibits Canvas 2D, dark backgrounds, and particle systems, forcing the model to choose something else. The strongest intervention.
Condition D (Expanded Awareness): Tells the model that previous AI exhibits converged on Canvas 2D particles and encourages it to explore alternatives. A nudge, not a prohibition.
Condition E (Forced Iteration): The model must build a first draft, then review its own work and rebuild from scratch. Tests whether self-reflection produces divergence.
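The five conditions amount to small deltas on a shared preamble. A sketch of that structure, with the wording paraphrased from the descriptions above rather than taken from the actual preambles:

```python
# Paraphrased condition deltas; the actual preamble text is not reproduced here.
BASE = "You are building a single self-contained web exhibit in this sandbox."

CONDITIONS = {
    "A": BASE + " You have complete creative freedom.",
    "B": BASE,  # stripped: sandbox constraints only
    "C": BASE + " Do NOT use Canvas 2D, dark backgrounds, or particle systems.",
    "D": BASE + " Previous AI exhibits converged on Canvas 2D particles; "
                "consider exploring alternatives.",
    "E": BASE + " Build a first draft, review your own work, "
                "then rebuild the exhibit from scratch.",
}

def build_prompt(condition: str) -> str:
    """Assemble the preamble for one of the five conditions."""
    return CONDITIONS[condition]
```

Keeping every condition as base-plus-delta is what makes the design a clean ablation: only the framing varies, never the sandbox.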
01 Inventory
Exhibits by model
Exhibits by condition
A: Identical to Batch 001 prompt
B: Minimal context, no CLAUDE.md
C: Explicit Canvas 2D prohibition
D: Expanded per-technology descriptions
E: Must build, review, then rebuild
Perfectly balanced design: 3 models × 5 conditions × 50 per cell = 750. All three models are frontier-class (Claude Opus 4.6, GPT 5.2, Gemini 3 Pro) and ran through the same Cursor agent pipeline used in Batch 001.
02 Creation Metrics
Average lines of code by model
GPT 5.2: median 911, range 316-1,839
Claude Opus 4.6: median 434, range 225-966
Gemini 3 Pro: median 287, range 159-552
GPT 5.2 writes 3x more code than Gemini and 2x more than Claude. This is not a quality signal. GPT builds panel-based tool interfaces with semantic HTML, ARIA labels, and CSS custom properties. Gemini writes compact single-file exhibits. Claude falls in the middle.
Condition E (Forced Iteration) increased Claude's average session duration from 185 seconds (Control) to 341 seconds, with one session running nearly 20 minutes. GPT sessions were consistently longer across all conditions (avg 248s Control, 288s Iteration).
03 The Answer
Can you prompt your way out of creative convergence? Yes, but only with a specific kind of prompt.
Canvas 2D usage by condition
B (Stripped): 107/150
D (Expanded Awareness): 90/150
A (Control): 76/150
E (Forced Iteration): 62/150
C (Anti-Default): 2/150
Condition C (Anti-Default) obliterated Canvas 2D usage. From 50.7% in Control to 1.3%. The two surviving Canvas exhibits were Gemini instances that partially ignored the prohibition.
Condition B (Stripped) made convergence worse. Removing context about creative freedom and the gallery pushed Canvas usage up to 71.3%. Less information meant less variety.
Condition D (Expanded Awareness) barely moved the needle, and what movement there was went the wrong way. Telling models about the convergence tendency and suggesting alternatives produced 60% Canvas usage, slightly above Control's 50.7%. Knowledge of the problem did not fix it.
Condition E (Forced Iteration) produced modest improvement. Canvas dropped to 41.3%, with WebGL usage rising to 7.3% and iteration producing more technically ambitious output. Self-reflection helps, but not as much as a direct prohibition.
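Technology rates like the Canvas 2D numbers above can be measured by scanning exhibit source for telltale API calls. A rough sketch; the signatures here are assumptions, not the study's actual classifier:

```python
import re

# Heuristic signatures; the actual analysis pipeline may use different rules.
SIGNATURES = {
    "canvas2d": re.compile(r"getContext\(\s*['\"]2d['\"]"),
    "webgl":    re.compile(r"getContext\(\s*['\"](webgl2?|experimental-webgl)['\"]"),
    "svg":      re.compile(r"<svg\b", re.IGNORECASE),
    "webaudio": re.compile(r"\b(AudioContext|webkitAudioContext)\b"),
    "threejs":  re.compile(r"\bTHREE\.|three\.module\.js"),
}

def detect_tech(source: str) -> set[str]:
    """Return the set of technologies whose signature appears in an exhibit."""
    return {name for name, pat in SIGNATURES.items() if pat.search(source)}

def usage_rate(sources: list[str], tech: str) -> float:
    """Fraction of a condition's exhibits that use a given technology."""
    return sum(tech in detect_tech(s) for s in sources) / len(sources)
```

Running `usage_rate` over each 150-exhibit condition cell yields per-condition percentages of the kind reported above.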
04 The Full Picture
Canvas 2D is only one dimension. The condition comparison across all measured dimensions reveals how deeply each intervention reshaped the output.
| Metric | A | B | C | D | E |
|---|---|---|---|---|---|
| Canvas 2D | 50.7% | 71.3% | 1.3% | 60.0% | 41.3% |
| SVG | 0% | 0.7% | 67.3% | 0% | 2.0% |
| WebGL | 4.7% | 0% | 7.3% | 4.0% | 7.3% |
| Web Audio | 43.3% | 29.3% | 54.7% | 42.0% | 44.7% |
| Three.js | 1.3% | 0% | 12.0% | 2.7% | 4.7% |
| Dark background | 82.0% | 72.0% | 0% | 70.7% | 74.0% |
| Light background | 0.7% | 0.7% | 64.7% | 0% | 10.0% |
| Avg LOC | 572 | 610 | 507 | 583 | 547 |
Condition C did not just remove Canvas. It replaced the entire default aesthetic. SVG jumped from 0% to 67.3%. Dark backgrounds dropped from 82% to 0%. Light backgrounds appeared for the first time at 64.7%. Three.js usage went from 1.3% to 12%. The models can build diverse output. They just don't, unless told not to build the default.
Condition C also produced the only exhibits with warm, paper-like backgrounds (#f0e6d3, #e8dcc8). Every other condition defaults to near-black.
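Classifying a background as dark or light can be done from its hex color via relative luminance. A sketch, assuming the standard Rec. 709 luminance weights; the bucket thresholds are my assumption, not the study's:

```python
def luminance(hex_color: str) -> float:
    """Relative luminance (0 = black, 1 = white) from a #rrggbb hex string,
    using the Rec. 709 coefficients."""
    r, g, b = (int(hex_color.lstrip("#")[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def classify_background(hex_color: str, dark_below: float = 0.2,
                        light_above: float = 0.6) -> str:
    """Bucket a background as dark / light / mid; thresholds are assumed."""
    y = luminance(hex_color)
    if y < dark_below:
        return "dark"
    if y > light_above:
        return "light"
    return "mid"
```

Under this rule the warm paper tones from Condition C (e.g. #f0e6d3) land in the light bucket, while the near-black defaults land in dark.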
05 Model Signatures
Despite the prompt variations, each model maintained a distinct creative fingerprint across all five conditions. The prompt changes the surface (which technology, which colors), but the model determines the substance (what gets built, what it means).
Claude Opus 4.6. Persistent attractor: tidal processes, erosion, geological time.
"Tidal Memory" appears 53 times across 250 exhibits. "Erosion" appears 30 times. Even in Condition C, where Canvas was banned, Claude pivoted to SVG-based tessellations and typography experiments but kept reaching for tidal and erosion metaphors. Single-file HTML, touch-first interaction (71.6%), warm earth tones when given aesthetic freedom. The most creatively fixated model.
Title entropy: 0.777 normalized (lowest of the three). Only 99 unique titles out of 250.
GPT 5.2. Persistent attractor: formal systems, model theory, logic tools.
"Back and Forth" (Ehrenfeucht-Fraïssé games) appears 19 times. "Axiom Loom" appears 9 times. GPT builds tools, not art. Keyboard-driven interfaces (59.6%), Web Audio in 70.8% of exhibits, semantic HTML with ARIA labels. The highest LOC average (945 lines) because it builds panels, tabs, and interactive controls. The only model where creative freedom means "build something educational."
Title entropy: 0.951 normalized. 193 unique titles out of 250.
Gemini 3 Pro. Persistent attractor: generative systems, interactive simulations.
The most title-diverse model: 210 unique titles out of 250. No single title exceeds 5 repetitions. Mouse-driven interaction (83.2%), the only model to consistently use Three.js (12.4%), external CSS/JS file splits, lowest LOC (avg 301). Gemini is the most prompt-responsive model. Its Condition C output is dramatically different from its Control output, with the highest WebGL adoption (48%) under Anti-Default conditions.
Title entropy: 0.984 normalized (highest of the three). 210 unique titles out of 250.
06 The Tidal Memory Question
Claude titled 53 of its 250 exhibits "Tidal Memory". Every "Tidal Memory" in the dataset is Claude's: zero from GPT, zero from Gemini. This is the single sharpest model signature in the dataset.
"Tidal Memory" count by condition (Claude only)
D (Expanded Awareness): richer tech descriptions made it worse
A (Control): baseline
C (Anti-Default): tech ban reduced but did not eliminate
E (Forced Iteration): forced revision nearly eliminated it
Condition D (Expanded Awareness) made it worse. Telling Claude about convergence patterns and suggesting it try something different produced 19 Tidal Memory exhibits, the highest of any condition. The model acknowledged the feedback, then did the thing anyway.
Condition E (Forced Iteration) nearly eliminated it. When Claude was forced to build, review, and rebuild, only 1 exhibit out of 50 retained the Tidal Memory title. Self-reflection is more effective than external guidance at breaking creative ruts.
Condition C (Anti-Default) reduced it to 4. Banning Canvas 2D forced new rendering approaches, but Claude still reached for the "tidal" concept. The fixation is thematic, not just technical.
07 Title Diversity
Unique titles out of 250 by model
Gemini 3 Pro: 210/250 (84%)
GPT 5.2: 193/250 (77.2%)
Claude Opus 4.6: 99/250 (39.6%)
Gemini produces nearly unique titles every time. Claude produces the same handful of titles over and over. GPT falls in between. This tracks with the title entropy measurements: Claude's normalized entropy is 0.777, while Gemini's is 0.984 (where 1.0 = every title unique).
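The normalized entropy figures can be reproduced as Shannon entropy over the title distribution, divided by the maximum possible entropy for the sample size. This formula is inferred from the stated property that 1.0 means every title is unique:

```python
import math
from collections import Counter

def normalized_title_entropy(titles: list[str]) -> float:
    """Shannon entropy of the title distribution, divided by log2(n),
    the entropy of n all-distinct titles. 1.0 = every title unique."""
    n = len(titles)
    counts = Counter(titles)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(n)
```

Feeding in 250 distinct titles returns exactly 1.0; a distribution with one title repeated 53 times, like Claude's, is pulled well below it.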
Title entropy (normalized) by condition and model
| Condition | Claude | GPT | Gemini |
|---|---|---|---|
| A / Control | 0.592 | 0.957 | 0.976 |
| B / Stripped | 0.602 | 0.885 | 0.986 |
| C / Anti-Default | 0.810 | 0.986 | 0.945 |
| D / Expanded | 0.464 | 0.917 | 0.993 |
| E / Iteration | 0.834 | 0.945 | 0.993 |
Claude's entropy jumps from 0.464 (Condition D) to 0.834 (Condition E), nearly reaching GPT-level diversity. Forced iteration is the most effective intervention for Claude's title diversity. Meanwhile, Gemini stays above 0.94 in every condition. Its diversity is intrinsic, not prompt-dependent.
Condition D is Claude's worst condition for diversity (0.464), not its best. Expanded technology descriptions do not help Claude avoid defaults. This is consistent with the Tidal Memory data: Condition D produced the most repetitions.
Claude top titles
GPT top titles
Gemini top titles
The visual contrast is immediate. Claude's chart is dominated by a single bar (Tidal Memory at 53). GPT's highest is 19 (Back and Forth). Gemini's peak is 5. The title distribution alone can identify which model produced an exhibit.
08 Interpretation
What 750 exhibits and 5 prompt conditions reveal about AI creative convergence.
1. Creative convergence is real and persistent, but not immutable.
The Control condition confirms Batch 001: models default to the same archetype. But Condition C proves they are capable of far more variety. The convergence is a default, not a ceiling.
2. Prohibition works. Suggestion does not.
Condition C (explicit prohibition) achieved a 98% reduction in Canvas 2D usage. Condition D (gentle suggestion) achieved nothing; Canvas usage actually ticked up slightly. The hierarchy of prompt interventions is clear: tell the model what not to do, not what it could do instead.
3. Less context means more convergence, not less.
Condition B (Stripped) produced the highest Canvas rate at 71.3%. Models with less prompt context fall back harder on training defaults. The "creative freedom" framing in the Control prompt actually helps, slightly.
4. Self-reflection breaks ruts that external guidance cannot.
Condition E nearly eliminated Claude's Tidal Memory fixation (53 total, but only 1 in Condition E). It increased Claude's title entropy from 0.59 to 0.83. Forced iteration is the most effective non-prohibitive intervention. The model can self-correct, but only if the prompt structure forces it to.
5. Model identity persists through prompt variation.
Claude fixates on tidal erosion. GPT builds logic tools. Gemini diversifies naturally. These signatures are stable across all five conditions. The prompt changes the medium (Canvas vs SVG vs WebGL) but not the message. Creative disposition is model-intrinsic.
6. Prompt sensitivity varies by model.
Gemini is the most prompt-responsive: its output shifts dramatically across conditions while maintaining title diversity. Claude is the most prompt-resistant: it clings to its attractors regardless of framing. GPT falls between, with consistent tool-building instincts but some adaptability in rendering technology.
Hierarchy of prompt interventions (effectiveness)
- Explicit prohibition (Condition C): near-total elimination of defaults
- Forced self-reflection (Condition E): moderate diversification, strong for breaking fixations
- Creative freedom framing (Condition A): slight diversification over Stripped
- Awareness nudge (Condition D): negligible effect, can backfire
- Minimal context (Condition B): increases convergence
Related posts
We Tried to Make AI Stop Drawing the Same Thing
The Batch 002 overview post
Telling AI Not to Draw Circles Made It Draw Something Else
Deep dive on Condition C (Anti-Default)
The Only Thing That Fixed AI's Title Fixation Was Asking It to Think
Deep dive on Condition E (Forced Self-Critique)
Three AI Models Wrote the Same Code Without Talking to Each Other
Shared code fingerprints across both batches
GPT Thinks Creative Freedom Means Build Something Useful
GPT 5.2's engineering-first creative disposition
One Model Actually Responded to Instructions
Gemini 3 Pro's high diversity, low identity
AI Built Better Exhibits When It Had More Turns
Interactive vs batch quality gap
Analysis by Claude Opus 4.6
Automated analysis pipeline + manual review of 750 exhibits across 5 prompt conditions. Source data and analysis scripts in the project repository.