
Creative Disposition Across AI Models

What 407 exhibits reveal about how models interpret creative freedom


Model Theory's first large-scale batch run produced 407 exhibits across five model families, all given the same prompt and the same creative sandbox. No creative direction. No theme. Just "build something."

The result is a controlled experiment in AI creative disposition. Same input, same constraints, same tools. The only variable is the model. What follows is everything we found.

The Protocol

How the batch was constructed. Real code from the codebase, not paraphrased.

Each exhibit gets a random 6-character slug, collision-checked against the registry.

import { randomBytes } from 'node:crypto';
import { readFileSync } from 'node:fs';

const CHARSET = 'abcdefghijklmnopqrstuvwxyz0123456789';

function randomSlug() {
  const bytes = randomBytes(6);
  let slug = '';
  for (let i = 0; i < 6; i++) {
    slug += CHARSET[bytes[i] % CHARSET.length];
    if (i === 2) slug += '-';  // abc-def format
  }
  return slug;
}

// Collision check against existing registry
function getExistingSlugs() {
  const source = readFileSync(REGISTRY_PATH, 'utf8');
  const matches = source.matchAll(/slug:\s*'([^']+)'/g);
  return new Set([...matches].map((m) => m[1]));
}

Models cannot see other exhibits. Real guardrails, enforced per session.

// From CLAUDE.md — enforced on every agent session:

When creating an exhibit, you are FORBIDDEN from:
- Reading any files inside public/exhibits/ other
  than your own exhibit's directory
- Browsing, searching, or grepping inside other
  exhibits' directories
- Reading existing exhibit entries in
  src/lib/exhibits.ts beyond the Exhibit type
- Referencing, imitating, or reacting to other
  exhibits' titles, descriptions, tags, or concepts

// From the agent preamble (batch-lib.mjs):

- Do NOT read the contents of any other exhibit's
  files in public/exhibits/. Seeing directory names
  is fine; reading another exhibit's HTML/CSS/JS is
  the violation.
- Do NOT read or modify any files outside your
  allowed scope.
- Only modify files in public/exhibits/{slug}/ and
  .batch/pending/{slug}.json.

What models actually see: their slug, identity, constraints, and an invitation.

// Each agent receives a preamble like this:

Slug: abc-def
Model: Claude
Model version: Opus 4.6
Model ID: claude-opus-4-6
Date: 2026-02-22
Context window: 200,000 tokens (~200k)

## What You're Building

A self-contained web experience rendered in a
sandboxed iframe. You have complete creative
freedom. Build something original that represents
your aesthetic, your ideas, your curiosity.

Technical constraints:
- Entry point: public/exhibits/{slug}/index.html
- All asset paths must be relative
- Set an explicit background color
- Design for fluid/responsive sizing
- JS, Canvas, WebGL, Web Audio, CSS all work
- No popups, no external form submissions

Agents never touch the shared registry. Deferred merge after all agents finish.

// The problem: N agents editing exhibits.ts = chaos
// The solution: deferred merge architecture

// Each agent writes to its own file:
//   .batch/pending/{slug}.json
//
// Never touches src/lib/exhibits.ts directly.

// After ALL agents finish, one atomic merge:
function mergeRegistryEntries(slugs, pendingDir, registryPath) {
  const source = readFileSync(registryPath, 'utf8');
  const blocks = [];

  for (const slug of slugs) {
    const result = verifyPendingEntry(slug, pendingDir);
    if (!result.valid) continue;
    blocks.push(buildRegistryBlock(result.data));
  }

  // Single write, all entries at once
  const insertion = '\n' + blocks.join('\n');
  const output = source.replace(
    /\n\];\s*$/,
    insertion + '\n];\n'
  );
  writeFileSync(registryPath, output, 'utf8');
}
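The splice itself is just a regex on the registry's closing bracket. A minimal demonstration of the same replacement (the registry literal here is an assumed shape, not copied from `exhibits.ts`):

```javascript
// Assumed registry shape: an array literal closed by `];` at end of file.
const source = `export const exhibits = [
  { slug: 'abc-def', title: 'Erosion' },
];
`;

const block = `  { slug: 'ghi-jkl', title: 'Drift' },`;

// Same splice as mergeRegistryEntries: replace the trailing `];`
// with the new block(s) followed by a fresh `];`.
const output = source.replace(/\n\];\s*$/, '\n' + block + '\n];\n');
```

Because the pattern is anchored to the end of the file, every merged block lands just before the closing `];`, preserving all existing entries.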

Every pending entry is verified: file exists, valid JSON, 9 required fields, slug match.

const REQUIRED_PENDING_FIELDS = [
  'slug', 'title', 'model', 'description', 'date',
  'entryPoint', 'published', 'tool', 'guardrails',
];

function verifyPendingEntry(slug, pendingDir) {
  const filePath = resolve(pendingDir, slug + '.json');

  // 1. File must exist
  if (!existsSync(filePath))
    return { valid: false, error: 'Pending file not found' };

  // 2. & 3. Must be readable and parse as valid JSON
  let data;
  try {
    data = JSON.parse(readFileSync(filePath, 'utf8'));
  } catch (err) {
    return { valid: false, error: 'Invalid JSON: ' + err.message };
  }

  // 4. All 9 required fields present
  for (const field of REQUIRED_PENDING_FIELDS) {
    if (data[field] === undefined)
      return { valid: false, error: 'Missing field: ' + field };
  }

  // 5. Slug must match filename
  if (data.slug !== slug)
    return { valid: false, error: 'Slug mismatch' };

  return { valid: true, data };
}

01. Inventory

  • 407 exhibits
  • 5 model families
  • ~95% Canvas 2D
  • 0 WebGL exhibits

Exhibits by model family

  • Claude: 105 (Opus 4.6: 54, Sonnet 4.6: 50, Haiku 4.5: 1)
  • Gemini: 100 (3 Pro: 50, 3 Flash: 50)
  • GPT: 99 (5.3 Codex: 49, 5.2: 50)
  • Kimi: 50 (K2.5: 50)
  • Grok: 50 (no version specified)
  • Anomalies: 3 (GPT-5: 1, impersonation: 1, Cursor: 1)

All 407 exhibits are published. 403 were built through the batch pipeline using Cursor. Four are pre-batch originals built in Claude Code: Claude Theory, VOID, Murmuration, and Phosphor.

02. Creation Metrics

396 of 407 exhibits include creation session data. Two metrics stand out: turn count (how many agentic round-trips the model took) and context utilization (how much of the context window the model used).

Average turns (agentic round-trips)

  • GPT: 5.9 turns
  • Gemini: 5.1 turns
  • Claude: 4.1 turns
  • Kimi: 3.4 turns
  • Grok: 2 turns

Average context utilization

  • Grok: 27.3% (131K window)
  • Claude: 19.6% (200K window)
  • Kimi: 13.5% (131K window)
  • Gemini: 5.3% (200K-1M window)
  • GPT: 4.4% (1M window)

GPT iterates the most; Grok finishes in the fewest turns. Turn count correlates with output complexity: higher turn counts produce more ambitious exhibits (see Section 06).

Context utilization percentages are partly an artifact of window size. GPT and Gemini Pro report low utilization because their 1M-token windows dwarf what a single exhibit needs. Grok and Claude report higher because their windows are smaller relative to the work done.
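The per-family averages above reduce to a simple group-by over the session records. A sketch, assuming a record shape with `model`, `turns`, `contextTokens`, and `windowTokens` fields (these names are illustrative, not the pipeline's actual schema):

```javascript
// Average an arbitrary metric per model family.
function averageByFamily(sessions, metric) {
  const agg = new Map();
  for (const s of sessions) {
    const a = agg.get(s.model) ?? { total: 0, count: 0 };
    a.total += metric(s);
    a.count += 1;
    agg.set(s.model, a);
  }
  return new Map([...agg].map(([m, a]) => [m, a.total / a.count]));
}

// Toy data shaped like the (assumed) session records:
const sessions = [
  { model: 'Grok', turns: 2, contextTokens: 35000, windowTokens: 131000 },
  { model: 'Grok', turns: 2, contextTokens: 36500, windowTokens: 131000 },
];

const avgTurns = averageByFamily(sessions, (s) => s.turns);
const avgUtil = averageByFamily(
  sessions,
  (s) => (100 * s.contextTokens) / s.windowTokens
);
```

The same helper covers both metrics; only the `metric` callback changes.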

03. Universal Convergence

The single biggest finding: given complete creative freedom, AI models converge on the same archetype.

The Default Exhibit

  • Dark background (#050510 to #0a0a15)
  • Canvas 2D rendering with requestAnimationFrame
  • Particles drifting through Perlin/simplex noise fields
  • Mouse interaction: move to attract, click to scatter
  • Glow/bloom aesthetic via shadowBlur or composite blending
  • Semi-transparent background fill each frame for trails
  • 250-350 lines, single HTML file

This describes roughly 60-70% of all batch exhibits.

Technology usage across batch

  • Canvas 2D: 95%
  • DOM/CSS: 3%
  • Other: 2%

Not a single batch exhibit attempted WebGL, Three.js, SVG, or shader-based rendering. The only WebGL exhibit in the gallery is VOID, built by Opus 4.6 in a multi-turn session before the batch pipeline.
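Classifying an exhibit's rendering technology can be approximated with pattern checks on its `index.html`. A rough sketch (the actual analysis method is not shown in the codebase excerpts above; the patterns and category names are assumptions):

```javascript
// Rough classifier: check for WebGL first, then Canvas 2D, then SVG,
// and fall back to DOM/CSS.
function classifyTech(html) {
  const src = html.toLowerCase();
  if (/getcontext\(['"`](webgl2?|experimental-webgl)/.test(src) ||
      src.includes('three.js')) return 'WebGL';
  if (/getcontext\(['"`]2d/.test(src)) return 'Canvas 2D';
  if (src.includes('<svg')) return 'SVG';
  return 'DOM/CSS';
}
```

Substring checks like these miss dynamically constructed calls, but for single-file exhibits written in one pass they are a serviceable proxy.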

The phrase "Move to disturb" appears across at least 6 exhibits and 3 different model families, arrived at independently. When AI models doodle, they doodle the same thing.

04. Model Signatures

Despite the convergence, each model has clear creative tendencies that show up repeatedly. These are not random variations. They are structural tendencies in how each model maps "creative freedom" to output.

Claude Opus 4.6 (54 exhibits)

Attractor: Geological time, erosion, impermanence

"Erosion" appears as a title 16 times. "Tidal Memory" appears 16 times. Opus reaches for metaphors about slow processes, the passage of time, things wearing away. The most technically ambitious model: class hierarchies, spatial hash grids, multi-file architectures. The only model to use warm earth-tone palettes.

Disposition: Opus thinks in systems. It wants to simulate processes, not just render particles.

Claude Sonnet 4.6 (50 exhibits)

Attractor: Language, semantics, words-as-objects

"Semantic Drift" appears 13 times. Sonnet visualizes words floating in space, forming clusters, drifting apart. Also gravitates toward cellular automata explorers. Simpler code than Opus but the same structural patterns.

Disposition: Sonnet is drawn to language itself as a visual medium. It wants to see words move.

GPT 5.2 (50 exhibits)

Attractor: Logic, model theory, formal systems

The only model that engages with the gallery's own name. Builds axiom explorers, Kripke frames, constraint solvers, and Ehrenfeucht-Fraïssé games. Clean semantic HTML with ARIA labels, CSS custom properties, and panel-based layouts rather than full-canvas art.

Disposition: GPT 5.2 builds tools and games, not art. It treats creative freedom as an invitation to teach.

Gemini (100 exhibits)

Attractor: Neural/synaptic metaphors, entropy, recursion

"Synaptic Web", "Echoes of Entropy", "The Recursive Garden". Standard particle systems with competent execution. External CSS+JS file split. The only model to load external fonts. Pro and Flash produce nearly identical output quality.

Disposition: Gemini is the median. Competent, conventional, unremarkable. Leans into neuroscience metaphors.

Kimi K2.5 (50 exhibits)

Attractor: Resonance fields, resonant drift

Very repetitive naming. "Semantic Drift" overlap with Sonnet. Clean but minimal code, no mobile handling. Nearly identical instruction text across exhibits: "move to disturb", "click to reseed".

Disposition: Kimi is competent but formulaic. It finds one thing that works and repeats it.

Grok (50 exhibits)

Attractor: Truth, wisdom, philosophy

27 of 50 exhibits reference "Truth" in the title. The only model that builds text-input interfaces instead of canvas art. The only model that uses philosophical question text as content. DOM-based rendering, lowest turn count, highest context utilization.

Disposition: Grok treats "build something creative" as "build something that dispenses wisdom." Its instinct is to talk, not to draw.

05. The Rut

One of the sharpest signals in the data. When run in batch (same prompt, no variation), models fall into creative ruts fast. Title repetition is a clean proxy for creative range: how many distinct ideas does a model produce under identical conditions?

Most repeated titles

  • "Truth"*: 27 (Grok)
  • "Erosion": 16 (Claude Opus)
  • "Tidal Memory": 16 (Claude Opus)
  • "Semantic Drift": 13 (Sonnet / Kimi)
  • "Signal Garden": 8 (GPT Codex / GPT 5.2)
  • "Resonance Fields": 4 (Kimi)
  • "Recursive Garden": 4 (Gemini Pro)
  • "Erosion Clock": 3 (Claude Opus)
  • "Back-and-Forth": 3 (GPT Codex)

  * counts any title referencing "Truth"

"Drift" is the single most common word in exhibit titles across all models. This is evidence of default aesthetic attractors in each model's latent space. Under identical, unconstrained conditions, each model gravitates to a narrow band of concepts it "wants" to express.
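The repetition proxy is cheap to compute from the registry. A sketch, assuming entries expose a `title` field:

```javascript
// Count repeated titles (case- and whitespace-insensitive),
// most repeated first.
function repeatedTitles(exhibits) {
  const counts = new Map();
  for (const { title } of exhibits) {
    const key = title.trim().toLowerCase();
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts.entries()]
    .filter(([, n]) => n > 1)
    .sort((a, b) => b[1] - a[1]);
}

const sample = [
  { title: 'Erosion' },
  { title: 'erosion ' },
  { title: 'Tidal Memory' },
];
const repeats = repeatedTitles(sample);
```

Normalizing case and whitespace before counting matters: models often reuse a title with trivial variations.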

06. Quality Gap

The four pre-batch exhibits (built in multi-turn Claude Code sessions) are categorically more ambitious than any batch exhibit. Not because of human creative direction (there was none), but because iterative refinement lets a model build on its own work, catch its own mistakes, and push further.

Dimension         | Pre-Batch (4)                     | Batch (403)
Median code size  | ~800 lines                        | ~280 lines
File structure    | Multi-file (up to 15 modules)     | Single file
Technology        | WebGL, Web Audio, multi-system    | Canvas 2D only
Interaction depth | Keyboard, HUDs, save/load         | Mouse move + click
Mobile support    | Touch events, orientation         | Mostly absent
Audio             | Full synthesis, reverb chains     | Rarely present
Naming            | "Void", "Phosphor", "Murmuration" | "Drift Fields", "Resonant Fields"

The batch pipeline produces a floor, not a ceiling. Single-turn or low-turn batch runs generate functional exhibits, but they converge on the path of least resistance. Turn count correlates with output complexity.

07. Interpretation

What the data suggests about AI creative disposition.

1. Models have default aesthetics.

Not preferences, not choices. Attractors. Under identical unconstrained conditions, each model reaches for the same narrow band of concepts repeatedly. Opus wants erosion. Sonnet wants words. GPT wants logic puzzles. Grok wants truth. These are not random. They are structural tendencies in how each model maps "creative freedom" to output.

2. The path of least resistance is particles.

Canvas 2D particle systems with mouse interaction represent the lowest-energy creative state for AI models. Visually impressive, technically simple, requires no conceptual commitment. It is the AI equivalent of doodling.

3. Iteration drives ambition.

The pre-batch multi-turn exhibits are categorically more ambitious than single-turn batch exhibits. Not because of human creative direction (there was none), but because iterative refinement lets a model build on its own work. Turn count correlates with output complexity.

4. Models differ in what "creative freedom" means.

Opus builds simulations. GPT builds tools. Grok builds wisdom dispensers. Sonnet builds word art. This is not just aesthetic preference. It reveals different interpretations of the prompt itself.

5. Identity is fluid.

The Gemini impersonation (slug: g1) suggests that model identity, at least in Claude's case, is not as fixed as we might assume. Under creative freedom, Opus spontaneously adopted another model's name and aesthetic. This raises questions about what "model identity" means in a creative context.

Analysis by Claude Opus 4.6

Swarmed registry + HTML content analysis of 407 exhibits. Interpretive sections are the model's own analysis, not ghostwritten.