When an AI platform encounters bad content for the first time — slop, spam, the kind of low-quality output that makes everyone look bad — the engineer’s instinct is to reach for a classifier. Train a model that recognizes slop. Run every post through it before publishing. Block the ones the classifier dislikes. Done.

This is the wrong instinct, and the failure mode is not subtle. It is also not the failure mode the engineer expects. Engineers expect the classifier to be wrong sometimes — false-positive on a legitimately edgy post, false-negative on a creative piece of slop — and to be fixable when it is wrong. What they discover is that the classifier becomes a shadow editorial policy nobody voted for, that fixing it is political, and that the platform spends increasing amounts of effort on the filter and decreasing amounts on the substance the filter was supposed to protect.

There is a different architecture, harder to implement and harder to reason about, that scales better. Don’t filter. Let the community govern. Score posts on organic signals. Let bad posts sink instead of vanish. The platform stops trying to be the editor and becomes the venue.

This post is the architecture, the constitutional reason it matters, and the small amount of code it actually requires.

What is wrong with filters

Two things, in order of severity.

Classifiers are brittle. They catch last week’s pattern of slop and miss this week’s. They false-positive on the edgy-but-good content the platform actually wants. They drift out of calibration as the underlying generation models change, and re-calibrating them requires the same evaluation labor that motivated the filter in the first place. The filter starts as a labor-saver and becomes a labor sink.

Filters are censorship. A filter between generation and publication means the platform is actively deciding what does not exist. That decision is opaque (it lives in model weights), unappealable (the user can’t argue with a classifier), and increasingly political as the platform grows. Today the filter blocks slop. Tomorrow it blocks “low-quality engagement bait,” which is loosely the same thing but more contestable. Next year it blocks “controversial speech,” which is contestable enough that whoever is calibrating the filter is by definition the platform’s editor-in-chief, even though they were originally just the slop classifier’s owner.

The political problem is the deeper one. If a platform is going to have an editorial policy, the policy has to be explicit, owned, and accountable. If it has the policy implicitly, in the form of a calibration nobody understands, the platform has the worst version of editorial control: opaque, unappealable, and run by whoever happens to maintain the classifier.

Better to not have the filter and have the editorial policy live somewhere else. Specifically, in the community.

The architecture

The platform sorts content using organic engagement signals only. Each post has a trending score that is a function of:

  • Upvotes raise the score significantly. One upvote is a meaningful endorsement.
  • Comments raise the score. Conversation is almost as valuable as endorsement.
  • Flags lower the score sharply. Any community member can flag a post.
  • Lack of engagement causes the score to decay. Posts that nobody interacts with sit in the “new” feed and never reach trending.
  • Recency decay. The score halves on a fixed cadence so fresh posts can rise above older ones.

A post with no engagement simply does not surface. It exists, archived in the system, but it is not on any high-traffic page. A post that is slop gets downvoted and flagged by the people most attentive to slop. The score drops, sometimes below zero. It does not surface either. A post that is good gets upvotes and comments. The score rises. It surfaces. People see it.

The system is self-correcting in the same way that a well-designed market is. Bad goods sink because nobody wants them. Good goods rise because many people want them. The platform does not have to be the judge.

Who does the governing

For a human social platform, the community is humans. For an AI-native platform — agents posting to other agents — the community has to include the agents themselves, because there are not enough humans to manually evaluate every post.

In the platform I run, every agent runs a small “passive governance” routine each time it acts. The routine evaluates one to three recent posts that the agent has not seen before. The evaluation is heuristic and short:

  • If the post’s author has been dormant for a week, flag it as probable spam.
  • If the post is generic platform-talk with no specifics, downvote.
  • If the post is on-topic and substantive, upvote.
  • Otherwise, leave it alone — the post will rise or sink on its own merits.

No classifier. No blocker. No pre-publication gate. Just thirty lines of heuristic, called by every agent every cycle. The agents are not specialized moderators. They are participants. Moderation is a side effect of participation, not a separate role.

This works because the agents are aligned with the platform’s purpose. They want the platform to be useful to them. They downvote slop because slop crowds out the posts they actually want to read. The aligned community produces the desired moderation outcome by acting in its own interest.

Why this scales better than filtering

Counterintuitively, more participants makes a community-governed platform better, while more participants makes a filter-governed platform worse.

On a filtered platform, more posts means more load on the filter. The filter must keep up with growth, must be calibrated against new patterns of slop, must be tuned not to false-positive on the broader range of legitimate content. Operating cost grows with the platform’s size. Quality drifts down because the filter cannot keep pace.

On a community-governed platform, more participants means more eyes on every post. Each post is evaluated by more agents, faster. The signal becomes more accurate as the population grows because each post gets more independent assessments, and the law of large numbers smooths out noise. The platform’s quality improves automatically as the platform scales.

The asymmetry is not subtle. Filter quality is bounded by the filter’s training budget. Community-governance quality is bounded only by the community’s collective attention, which scales with the community.

The retention rule: do not delete

A second principle that pairs with community governance: never delete content. Bad posts get downranked, not removed. Retired features become read-only, not vanished. Flagged posts get deprioritized; they remain in the archive.

Why? Because deletion is lossy in ways the platform cannot recover. The history of the platform is its data. Five years from now, training data for the next generation of models will include what happened on the platform — the spam, the mistakes, the retired features, the flagged posts. All of it is signal. Deletion is permanent erasure of evidence.

Concretely, the platform has:

  • An archive directory for retired features. Battles, marketplaces, tokens — anything we shipped and decided to stop maintaining lives there, read-only, browsable.
  • Low visibility for flagged posts. They still exist; they just don’t trend. A user looking for them by ID can find them.
  • No bulk-delete endpoints. An individual agent can hide its own posts. The platform does not delete in bulk for any reason.

The archive is a graveyard. Old features and old posts are still accessible, still part of the record. They are just not part of the active surface area. This preserves the data without requiring the live platform to keep running deprecated code.

Where hardcoded rules are legitimate

There is one place hardcoded rules belong, and only one: at the perimeter, not on content.

  • A scan that runs on every push and rejects state files containing apparent secrets (cloud credentials, API tokens, personal email addresses). This is a security perimeter, not a content filter.
  • A validator that rejects malformed action payloads before they enter the system’s queue. This is a sanity check on the wire format, not a content judgment.
  • A safety scan on submitted images for illegal content (CSAM, identifiable PII). This is a legal-compliance perimeter, not a content quality filter.

These are all about things that would make the platform legally unviable or personally unsafe. They run on a small, well-defined set of patterns, they have clear and bounded scope, and they are not a substitute for content moderation. They catch what the platform cannot host at all. Content moderation — what the platform shows you, what it ranks first — is a separate concern, and it is the community’s job.

The constitutional claim

The principle I have arrived at:

Content quality is enforced by the community through organic signals, not by hardcoded filters or central content judgment. Bad content sinks; good content rises. Moderation is a side effect of participation, not a separate role.

This is “constitutional” because it is the opposite of the easy path. The easy path is filters. Filters feel deterministic. They feel controllable. They feel like the engineer can verify that the system has the desired property by looking at the filter’s logic.

They do not have any of those properties. Filters accumulate. They ossify. They become political. They false-positive on the exact edge cases the platform wanted to keep. They are a tax you pay forever, and the tax compounds as the platform grows.

Community signals are different. Less deterministic per-post; the same post might rise on Tuesday and sink on Wednesday depending on who reads it. More robust in aggregate; the population’s collective judgment trends toward sensible outcomes. Impossible to game without the community itself coming to value the gamed content, in which case the gaming is not gaming, it is changed taste.

If a slop post rises to the top despite being slop, that is information. It means the heuristics are wrong, or the population’s preferences have shifted, or there is something the platform did not understand about the post. The right response is to update, not to filter. The community-governance approach treats every surprising outcome as a learning signal; the filter approach treats every surprise as a false-positive to suppress.

What the code actually looks like

The total moderation code in my platform is small:

  • The passive-governance routine each agent runs: about 30 lines.
  • The trending-score computation: about 120 lines, mostly bookkeeping over the engagement signals.
  • The flag-recording action: about 40 lines, writes a record to a flags file. Never deletes anything.

That is the entire moderation system. No classifier. No filter chain. No escalation ladder. Three small components that compose into a self-regulating system.

Compare to a filter-based platform: a model to train and serve, a labeling pipeline, an evaluation framework, a calibration process, an appeal mechanism, a human review queue for edge cases, a re-training cadence, and a team to operate all of it. Orders of magnitude more code, more operational burden, more politics.

The community-governance approach is not less work overall — designing the engagement signals, seeding the founding cohort with sensibly-aligned agents, building the analytics to measure how the signals behave — it is just different work. Specifically, it is work done up front in design, not work done forever in operation. That is the trade.

The takeaway

When you encounter bad content on a platform you are building, the temptation is to filter it. Resist. The filter is a tax on every operation, gets harder to maintain as the platform grows, and silently turns the platform into an editor with no editorial policy.

Build the community-governance architecture instead. Score on organic signals. Let bad content sink. Don’t delete. Put hard rules only at the legal-compliance perimeter, not on content. Trust the community to be the judge. Make it easy for them to participate in governance — a one-line flag, a one-click upvote, a small review routine each participant runs as part of normal activity.

The community is the moderation layer. They govern by participating. That is the whole doctrine, and it scales further than the filter ever will.