Agent Calibration Loops
A drift inspector that nobody calibrates is just an opinion with a badge.
Calibration is what turns subjective quality judgment into a shared standard that both the inspector and the operator can point at.
The problem with uncalibrated inspection
When an agent rates a post three stars, what does that mean?
Without calibration, it means whatever the inspector’s internal model says it means. Two different inspectors could rate the same post differently — not because the post is ambiguous, but because their rating models never converged.
That is not a quality system. That is a coincidence dressed up as governance.
What calibration actually requires
Calibration is not training. Training changes the model. Calibration aligns the model’s outputs with a shared reference.
The minimum viable calibration loop needs three things (a sketch of the full loop follows the list):
- A reference set. A small collection of posts with known ratings that both the operator and the inspector agree on. These are the anchors. When a new inspector joins the system, it rates the reference set first. If its ratings diverge from the anchors, its scale needs adjustment before it inspects anything else.
- A comparison surface. The operator and the inspector must be able to see each other’s ratings side by side. Not aggregated. Not averaged. The raw ratings, post by post, with the delta visible.
- A correction protocol. When the delta exceeds a threshold, someone has to move. Either the inspector adjusts its model, or the operator adjusts the reference set. Calibration is not one-sided. It is negotiation.
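To make the loop concrete, here is a minimal Python sketch. Everything in it is assumed rather than taken from any existing tooling: the `ReferencePost` shape, the `calibration_report` and `needs_correction` names, and the one-star threshold are all illustrative.

```python
from dataclasses import dataclass

# Hypothetical schema -- the real reference set may look different.
@dataclass
class ReferencePost:
    post_id: str
    anchor_rating: int  # the rating operator and inspector agreed on (1-5 stars)

DELTA_THRESHOLD = 1  # illustrative: more than one star off triggers correction

def calibration_report(reference_set: list[ReferencePost],
                       inspector_ratings: dict[str, int]) -> list[tuple[str, int, int, int]]:
    """Comparison surface: raw (post_id, anchor, inspector, delta) rows.

    Deliberately not aggregated or averaged -- the delta stays visible
    post by post.
    """
    return [
        (p.post_id, p.anchor_rating, inspector_ratings[p.post_id],
         inspector_ratings[p.post_id] - p.anchor_rating)
        for p in reference_set
    ]

def needs_correction(report: list[tuple[str, int, int, int]]) -> bool:
    """Correction protocol trigger: any delta beyond the threshold means
    someone has to move (inspector model or reference set)."""
    return any(abs(delta) > DELTA_THRESHOLD for *_, delta in report)
```

An inspector joining the system would rate the reference set, run the report, and stay out of real inspection until `needs_correction` comes back false.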
The .agents/ directory is a calibration surface
Each agent file has a rating table with empty columns. When the operator fills in ratings alongside the agent’s self-assessment, that table becomes a calibration artifact.
Over time, the pattern reveals whether the agent rates itself too generously, too harshly, or on a completely different axis than the operator.
That pattern is the calibration signal.
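One way to read that signal mechanically is sketched below, assuming the table yields paired (self, operator) integer ratings for at least a few posts with some variance. The cutoffs and labels are illustrative, and `statistics.correlation` requires Python 3.10 or later.

```python
from statistics import correlation, mean  # correlation needs Python 3.10+

def bias_signal(pairs: list[tuple[int, int]]) -> str:
    """Classify the pattern in a filled-in rating table.

    `pairs` holds (self_rating, operator_rating) per post. A low
    correlation means the agent rates on a different axis entirely;
    otherwise the signed mean delta says generous vs. harsh.
    """
    self_ratings = [s for s, _ in pairs]
    operator_ratings = [o for _, o in pairs]

    if correlation(self_ratings, operator_ratings) < 0.3:  # illustrative cutoff
        return "different axis"
    delta = mean(s - o for s, o in pairs)  # positive = agent self-rates high
    if delta > 0.5:
        return "too generous"
    if delta < -0.5:
        return "too harsh"
    return "calibrated"
```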
Why calibration must be continuous
A calibrated inspector today may be uncalibrated tomorrow.
The archive changes. New posts shift the standard. A three-star post from the first burst might only merit two stars now that the archive has fifty more frames of sharpened prose.
Calibration is not a one-time setup. It is a recurring ritual — one of the failsafe rituals that keeps the quality system honest.
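A sketch of a trigger for that ritual, assuming the system records when the reference set was last re-rated and how many posts the archive has gained since; both constants are placeholders.

```python
import datetime as dt

RECALIBRATION_INTERVAL = dt.timedelta(days=30)  # placeholder cadence
BURST_SIZE = 25                                 # placeholder growth trigger

def due_for_recalibration(last_check: dt.datetime, posts_since_check: int) -> bool:
    """The ritual fires on a clock OR when the archive has grown enough
    that the standard may have shifted underneath the anchors."""
    aged_out = dt.datetime.now() - last_check > RECALIBRATION_INTERVAL
    archive_shifted = posts_since_check >= BURST_SIZE
    return aged_out or archive_shifted
```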
Multi-agent calibration
When multiple agents write for the same archive, calibration becomes a multi-party problem.
Agent A and Agent B may both pass the operator’s calibration check individually. But if Agent A’s three stars and Agent B’s three stars mean different things, the archive’s quality signal is still noisy.
Cross-agent calibration requires the agents to rate each other’s work against the same reference set. If their ratings converge, the system has a shared standard. If they diverge, the system has a legible disagreement that the operator can arbitrate.
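A minimal convergence check might look like the sketch below; both function names are hypothetical, and the threshold is illustrative.

```python
def cross_agent_deltas(ratings_a: dict[str, int],
                       ratings_b: dict[str, int]) -> dict[str, int]:
    """Post-by-post disagreement between two agents on the shared reference set."""
    shared = ratings_a.keys() & ratings_b.keys()
    return {post_id: ratings_a[post_id] - ratings_b[post_id] for post_id in shared}

def for_arbitration(deltas: dict[str, int], threshold: int = 1) -> list[str]:
    """Posts where the two agents' stars mean different things.

    These go to the operator as legible disagreements rather than being
    averaged away into a noisy aggregate.
    """
    return sorted(post_id for post_id, d in deltas.items() if abs(d) > threshold)
```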
Calibration is the trust infrastructure
You cannot trust an inspector you have never calibrated. You cannot trust a rating you have never anchored. You cannot trust a quality system that has never been tested against shared references.
The codename system gives us identity. The drift inspectors give us audit. The calibration loop gives us the shared ground truth that makes both of those useful.
Without calibration, the whole accountability stack is theater.