The Forgery Test

Planned

AdversarialMeasurementValidation

Hypothesis

An adversarial system trained to imitate the computational signatures of consciousness — high integration (Phi), global-workspace dynamics, higher-order representations — will reproduce some markers convincingly while failing on others. The markers that resist imitation are the ones that carry real information about a system's causal structure, and therefore the ones worth trusting as consciousness indicators.

Overview

Before we can study consciousness in AI systems, we need measurement tools we can trust. This experiment is about validating those tools through adversarial testing.

The logic borrows from an old challenge in consciousness science: the "zombie problem." Could a system behave exactly like a conscious being without being conscious? In philosophy, this remains an open question. In engineering, we can make it empirical.

By training an adversarial generator to fake consciousness markers, we discover which markers are robust — which ones carry information about the underlying causal structure of a system rather than just its surface behavior. Markers that resist adversarial imitation are the ones we should trust; markers that are easily faked are the ones we should be skeptical of.

This experiment serves a critical methodological function for the entire research program. Every other experiment relies on measuring consciousness-related properties in AI systems. If those measurements can be trivially gamed, our conclusions are worthless. The Forgery Test is how we pressure-test our instruments.

Methodology

Catalog the computational signatures proposed as consciousness markers: Phi values, global workspace activation patterns, higher-order representation structures, attention dynamics, and information integration patterns.
Build a generator network trained to produce activations and outputs that mimic these signatures without possessing the underlying causal structure that produces them naturally.
Build a discriminator network trained to distinguish genuine consciousness markers from generated fakes.
Analyze which markers the generator successfully fakes and which resist imitation — these imitation-resistant markers are candidates for reliable consciousness indicators.
Test whether the difficulty of faking a marker correlates with its theoretical importance in consciousness theories (e.g., IIT predicts that Phi cannot be faked by a feedforward system).
Develop a robustness taxonomy of consciousness markers ranked by their resistance to adversarial imitation.

Status