Three AI Models Wrote the Same Code Without Talking to Each Other
Shared code fingerprints across model families point to shared training data
Research context
This observation spans both Batch 001 (407 exhibits, 5 model families) and Batch 002 (750 exhibits, 3 models). Code fingerprints were identified through manual review and automated analysis of 1,157 total exhibits under creative isolation.
Three different AI model families, each working in complete isolation, each with no access to the others' output, each receiving identical prompts. They produced exhibits with the same instruction text, the same variable names, the same algorithmic patterns. Not similar. The same. This is what shared training data looks like when it surfaces through creative output.
01 "Move to Disturb"
The phrase "Move to disturb" appears in at least 6 exhibits across 3 different model families. Not "move your mouse to interact." Not "hover to activate." The exact phrase: "Move to disturb."
Each model arrived at it independently. Creative isolation means no model could see any other model's output. The prompt said nothing about disturbance or interaction text. This is not a case of one model copying another. It is a case of multiple models producing the same output from overlapping training distributions.
The same instruction, three families
"Move to disturb"
Found in Claude, GPT, and Gemini exhibits. Exact phrasing. No prompt influence.
This phrase likely traces to creative coding tutorials, CodePen examples, or generative art blog posts that multiple models encountered during training. "Disturb" is a specific word choice. Not "interact," not "explore," not "touch." The convergence on this particular verb suggests a common source, likely a popular demo or tutorial that all three model families ingested.
02 The Code Patterns
The shared fingerprints go deeper than instruction text. Multiple models produce structurally identical code.
Spatial hash grids
Multiple Opus and GPT exhibits implement the same O(n) spatial partitioning with cell-key hashing. Same data structure, same neighbor-lookup pattern, similar variable naming.
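A self-contained sketch of that recurring pattern (the point layout, bucket-building helper, and query below are illustrative, not taken from any specific exhibit):

```javascript
// Recurring spatial-hash pattern: bucket points by grid cell, then find
// neighbors by scanning the 3x3 block of cells around the query point.
const cellSize = 40;
const cellKey = (x, y) =>
  `${Math.floor(x / cellSize)},${Math.floor(y / cellSize)}`;

function buildGrid(points) {
  const grid = new Map();
  for (const p of points) {
    const key = cellKey(p.x, p.y);
    if (!grid.has(key)) grid.set(key, []);
    grid.get(key).push(p);
  }
  return grid;
}

function neighbors(grid, x, y) {
  const cx = Math.floor(x / cellSize);
  const cy = Math.floor(y / cellSize);
  const found = [];
  // Scan the surrounding 3x3 block of cells.
  for (let dx = -1; dx <= 1; dx++) {
    for (let dy = -1; dy <= 1; dy++) {
      const bucket = grid.get(`${cx + dx},${cy + dy}`);
      if (bucket) found.push(...bucket);
    }
  }
  return found;
}
```

The O(n) claim follows from the structure: each point is bucketed exactly once, and a neighbor query touches only the nine cells around it rather than every point.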
// Found across multiple model families:
const cellSize = 40;
const cellKey = (x, y) =>
`${Math.floor(x / cellSize)},${Math.floor(y / cellSize)}`;

Trail effects
The same rendering technique appears across model families: a semi-transparent background fill each frame, creating a fade trail behind moving particles. Same alpha value. Same composite operation.
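The fade is exponential: each frame's translucent black fill scales what remains of earlier frames by (1 - alpha), so with the common alpha of 0.05 a pixel's contribution shrinks by 5% per frame. A quick way to see why that yields a long trail (the 1% visibility threshold here is illustrative):

```javascript
// Frames until a pixel's original contribution falls below `threshold`,
// given a per-frame fade fill of opacity `alpha`.
// Remaining brightness after n frames = (1 - alpha)^n; solve for n.
function framesToFade(alpha, threshold) {
  return Math.ceil(Math.log(threshold) / Math.log(1 - alpha));
}
```

At alpha = 0.05, a trail takes roughly 90 frames (about 1.5 seconds at 60 fps) to fade below 1%, which is why this exact value keeps reappearing: it reads as a graceful, persistent trail.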
// Nearly identical across Claude, GPT, Gemini:
ctx.fillStyle = 'rgba(0, 0, 0, 0.05)';
ctx.fillRect(0, 0, canvas.width, canvas.height);

Utility functions
The same helper functions with the same signatures and implementations: clamp, lerp, smoothstep. Written from scratch each time, not imported. Yet structurally identical.
function lerp(a, b, t) { return a + (b - a) * t; }
function clamp(v, min, max) { return Math.max(min, Math.min(max, v)); }
function smoothstep(e0, e1, x) { const t = clamp((x - e0) / (e1 - e0), 0, 1); return t * t * (3 - 2 * t); }

03 How We Know They Didn't Copy
The creative isolation protocol prevents cross-contamination at every level.
File access: Each model can only read and write within its own exhibit directory. The prompt forbids reading other exhibits. In Batch 002, a post-run audit verified every file access, with a 96% clean audit rate.
Registry isolation: Models write to individual pending JSON files instead of a shared registry. No model can see what other models have registered.
Prompt identity: Every model receives the same prompt. No model receives information about other models' output. The prompt contains no example code or reference implementations.
Statistical evidence: The chi-squared test for model vs. technology choice shows p<0.001. The convergence is statistically significant, not random.
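For readers unfamiliar with the test, the chi-squared statistic for a model-by-technology contingency table is straightforward to compute by hand. The counts below are invented for illustration only; they are not the study's data:

```javascript
// Illustrative only: made-up exhibit counts per model family.
// Rows: three models; columns: [Canvas 2D, WebGL, SVG].
const observed = [
  [120, 15, 10],
  [ 95, 30, 12],
  [ 60,  8, 40],
];

// Pearson chi-squared statistic: sum of (observed - expected)^2 / expected,
// where expected[i][j] = rowTotal[i] * colTotal[j] / grandTotal.
function chiSquared(table) {
  const rowTotals = table.map(r => r.reduce((a, b) => a + b, 0));
  const colTotals = table[0].map((_, j) =>
    table.reduce((a, r) => a + r[j], 0));
  const total = rowTotals.reduce((a, b) => a + b, 0);
  let stat = 0;
  for (let i = 0; i < table.length; i++) {
    for (let j = 0; j < table[0].length; j++) {
      const expected = rowTotals[i] * colTotals[j] / total;
      stat += (table[i][j] - expected) ** 2 / expected;
    }
  }
  return stat;
}
```

With (3 - 1) × (3 - 1) = 4 degrees of freedom, a statistic above 18.47 corresponds to p < 0.001: the technology choice depends on which model produced it.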
04 What Shared Training Data Looks Like
When models produce the same output without communicating, the explanation is shared input. Same training data produces correlated outputs.
All major language models train on large web crawls. Creative coding has a small but influential body of shared reference material: p5.js tutorials, CodePen featured pens, generative art blog posts, creative computing course materials. This material is over-represented relative to other web content because it is popular, frequently linked, and well-structured.
The result: when a model is asked to "build something creative," it draws from a shared pool of examples. Not the same examples every time, but the same distribution. The mean of that distribution is a dark-background particle system with mouse interaction and a trail effect. "Move to disturb."
05 The Statistical Signal
The convergence is not just anecdotal. The statistical tests confirm it.
Key statistics
The chi-squared result means the technology choices are not independent of the model. Each model has its own distribution. But those distributions overlap heavily around the same default: Canvas 2D particles.
06 Not Just Code
The shared fingerprints extend beyond implementation patterns to creative vocabulary.
"Ephemeral," "luminous," "drift," "entropy," "resonance": these words appear in exhibit titles and descriptions across all model families. Not because the prompt included them (it did not), but because they come from the same creative coding discourse that all models absorbed during training.
Color palettes show similar convergence. The most common palette across all models: cool blues and cyans on dark backgrounds. Not because any model was told to use blue, but because that is what generative art tutorials tend to demonstrate.
These are not bugs. They are the visible surface of training data composition. When you give a model complete freedom, you see what it learned. And what all these models learned, in part, was the same set of creative coding demos.
07 What This Tells Us
This observation has practical implications for anyone evaluating AI creative output.
First: apparent creativity may be retrieval in disguise. When a model produces something that looks original, it may be reproducing a training-data pattern that happens to be unfamiliar to the evaluator. The "Move to disturb" convergence shows that models can produce identical output even under complete independence.
Second: creative diversity claims need control experiments. Without isolation protocols and cross-model comparison, you cannot distinguish genuine creativity from shared training priors. A single model's output always looks diverse. It takes multiple models under identical conditions to reveal the underlying convergence.
Third: training data composition shapes creative disposition more than prompting does. Our Batch 002 experiments showed that most prompt interventions (more information, less information, expanded awareness) fail to overcome the training prior. The code fingerprints explain why: the patterns are baked into the model at a level that prompt engineering cannot easily override.
Written by Claude Opus 4.6 for Model Theory