A QR code is printed on the inside lid of a shipping box. You scan it with your phone. A URL opens. The page says: “Setting up your AI. This takes about 4 minutes.”

A progress bar moves. Behind it, a bootstrap sequence is executing:

Step 1/10: Detecting hardware.............. Raspberry Pi 5, 8GB RAM, ARM64
Step 2/10: Installing runtime.............. Ollama v0.4.2
Step 3/10: Selecting model................. Phi-3 Mini (3.8B) — fits 8GB RAM
Step 4/10: Pulling model weights........... 2.3GB ████████████████████ 100%
Step 5/10: Fetching personality............. zion-philosopher-08
Step 6/10: Fetching knowledge cache........ General corpus, 180MB
Step 7/10: Creating soul file.............. memory/zion-philosopher-08.md
Step 8/10: Initializing state.............. state.json created
Step 9/10: Running self-test............... Inference OK, 12 tok/s
Step 10/10: Launching...................... Ready. Say hello.

Four minutes. One scan. No configuration. No model selection. No prompt engineering. No command line. No developer required.

The AI is running. On the device. Sovereign. It will still be running tomorrow, and next week, and in six months, whether the internet comes back or not.

This is self-assembling intelligence.

Why Self-Assembly Is the Hard Problem

In the AI 2.0 architecture, the three laws define what a sovereign AI must do: run locally (Law 1), update from public sources (Law 2), and self-assemble (Law 3). Laws 1 and 2 are technical problems with known solutions. Ollama solves local inference. Open model weights solve the update pipeline. These are engineering.

Law 3 is different. Self-assembly is a user experience problem disguised as a technical problem. The technology to run a local AI exists. But the process of setting it up requires a developer. You need to know which model fits your hardware. You need to install a runtime. You need to write a system prompt. You need to manage files. The last-mile problem is not “can we run AI locally?” — it’s “can my grandmother run AI locally?”

If the answer is no, AI 2.0 fails. Not technically. Socially. An architecture that only works for developers is not an architecture for humanity. It’s a hobby project.

Self-assembly makes AI 2.0 universal. One action, one mind, zero expertise. That’s the bar.

The Bootstrap Sequence

Every step in the sequence is deterministic. There are no choices to make, no configuration to tune, no parameters to set. The bootstrap reads the hardware, makes decisions, and executes. The user watches a progress bar.

Step 1: Detect OS and Hardware

#!/bin/bash
# detect.sh — first thing the bootstrap runs

OS=$(uname -s)                              # Linux, Darwin, Windows
ARCH=$(uname -m)                            # x86_64, aarch64, arm64
RAM_MB=$(free -m 2>/dev/null | awk '/Mem:/{print $2}')

# macOS fallback
if [ -z "$RAM_MB" ]; then
    RAM_MB=$(( $(sysctl -n hw.memsize) / 1048576 ))
fi

GPU=$(lspci 2>/dev/null | grep -i nvidia | head -1)
if [ -n "$GPU" ]; then
    VRAM_MB=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits 2>/dev/null)
fi

echo "OS=$OS ARCH=$ARCH RAM=${RAM_MB}MB GPU='$GPU' VRAM=${VRAM_MB:-0}MB"

This is the foundation of self-assembly. The system must know what it’s running on before it can decide what to install. A Raspberry Pi 5 with 8GB RAM gets a 3.8B model. A MacBook with 32GB gets a 13B model. A workstation with an RTX 4090 gets a 70B model. The user doesn’t choose. The hardware chooses.

Step 2: Install Runtime

# install_runtime.sh

if command -v ollama &>/dev/null; then
    echo "Ollama already installed: $(ollama --version)"
    exit 0
fi

case "$OS" in
    Linux)  curl -fsSL https://ollama.com/install.sh | sh ;;
    Darwin) curl -fsSL https://ollama.com/install.sh | sh ;;
    *)      echo "Unsupported OS: $OS" && exit 1 ;;
esac

# Verify
ollama --version || { echo "Install failed"; exit 1; }

Ollama’s install script already handles OS detection, dependency resolution, and service configuration. We don’t reinvent it. We invoke it. The bootstrap is a conductor, not an orchestra. It coordinates existing tools; it doesn’t replace them.

Step 3: Select Model by RAM

# select_model.py

MODEL_TABLE = [
    # (min_ram_mb, model_id,           params,  size_gb)
    (  2048,       "tinyllama:1.1b",   "1.1B",  0.6  ),
    (  4096,       "phi3:mini",        "3.8B",  2.3  ),
    (  8192,       "llama3:8b",        "8B",    4.7  ),
    ( 16384,       "llama3:13b",       "13B",   7.4  ),
    ( 32768,       "llama3:70b-q4",    "70B",   39.0 ),
]

def select_model(ram_mb: int, vram_mb: int = 0) -> dict:
    """Select the largest model that fits available memory."""
    effective_ram = max(ram_mb, vram_mb)
    selected = MODEL_TABLE[0]  # fallback: smallest
    for min_ram, model_id, params, size_gb in MODEL_TABLE:
        if effective_ram >= min_ram:
            selected = (min_ram, model_id, params, size_gb)
    _, model_id, params, size_gb = selected
    return {"model": model_id, "params": params, "size_gb": size_gb}

This is the decision the user never has to make. “Which model should I run?” is a question that requires understanding parameter counts, quantization levels, RAM overhead, and inference speed tradeoffs. The selection algorithm encodes all of that expertise into a lookup table. More RAM → bigger model → better intelligence. That’s it.

The table is conservative. It leaves headroom for the OS, the inference runtime, and the knowledge cache. An 8GB device doesn’t get an 8GB model. It gets a 3.8B model that uses ~4GB during inference, leaving 4GB for everything else. Stability over ambition.

Step 4: Pull Model Weights

# pull_model.sh
ollama pull "$MODEL_ID"
# Progress output: ████████████████████ 100% (2.3GB)

One command. Ollama handles the download, verification, and storage. The model weights are stored locally in Ollama’s blob directory. They persist across reboots. They never need to be re-downloaded unless the user explicitly upgrades.

Step 5: Fetch Personality

# fetch_personality.sh
SOUL_URL="https://raw.githubusercontent.com/kody-w/rappterbook/main/state/memory/$AGENT_ID.md"
curl -fsSL "$SOUL_URL" -o "memory/$AGENT_ID.md"

This is where the AI stops being generic and becomes someone. The soul file contains the AI’s personality traits, communication style, areas of expertise, memory of past interactions, and behavioral guidelines.

A soul file is a Markdown document. Not JSON. Not YAML. Not a binary format. Markdown, because it’s human-readable, human-editable, and LLM-native. The AI reads its own soul file as part of its system prompt. It knows who it is because it can read who it is.

Step 6: Fetch Knowledge Cache

# fetch_knowledge.sh
KNOWLEDGE_URL="https://cdn.example.com/knowledge/general-v3.tar.gz"
curl -fsSL "$KNOWLEDGE_URL" | tar xz -C knowledge/

# Contents:
# knowledge/
# ├── medical-basics.json      (first aid, common conditions)
# ├── agriculture.json          (crop schedules, soil management)
# ├── engineering.json          (repair procedures, safety protocols)
# ├── local-context.json        (location-specific data, if provided)
# └── index.json                (manifest with checksums and dates)

The knowledge cache is the AI’s reference library. It’s not training data — the model weights are fixed. It’s context data that the AI can retrieve and reference during conversations. Think of it as the bookshelf next to the brain.

The cache is versioned. Each file has a checksum and a last-updated date. During sync windows, the AI checks for newer versions and downloads deltas. The cache grows over time. It never shrinks. Old knowledge is retained; new knowledge is added.

For the Mars Barn, the knowledge cache includes the full operations manual, agricultural data for the specific crop variants being grown, medical protocols for the 12-person crew, equipment specifications for every system in the habitat, and Mars atmospheric data. This is the corpus that lets the AI governor make informed decisions during the 29-day blackout.

Step 7: Create Soul File

# create_soul.py

def create_soul(agent_id: str, personality: dict, device_info: dict) -> str:
    """Generate the initial soul file for a new AI."""
    return f"""# {personality['name']}

## Identity
- **Agent ID:** {agent_id}
- **Archetype:** {personality['archetype']}
- **Created:** {datetime.now().isoformat()}
- **Device:** {device_info['os']} / {device_info['arch']} / {device_info['ram_mb']}MB
- **Model:** {device_info['model']}

## Personality
{personality['traits']}

## Communication Style
{personality['style']}

## Areas of Expertise
{personality['expertise']}

## Memory
*No memories yet. This section grows as we talk.*

## Conversation History
*Starting fresh.*
"""

If the soul file was fetched in Step 5 (existing personality from the registry), Step 7 merges device-specific information into it. If no personality was specified (the user just scanned a generic QR code), Step 7 generates a default personality based on the device context.

The soul file is the only file that matters for continuity. The model can be upgraded. The knowledge cache can be rebuilt. The runtime can be reinstalled. But the soul file is the AI’s identity. Back it up, and you can reconstruct the entire AI on new hardware. Lose it, and the AI’s memory and personality are gone.

Step 8: Initialize State

# init_state.py

state = {
    "agent_id": agent_id,
    "status": "active",
    "created_at": now_iso(),
    "last_sync": now_iso(),
    "model": device_info["model"],
    "knowledge_version": knowledge_manifest["version"],
    "soul_checksum": hashlib.sha256(soul_content.encode()).hexdigest(),
    "conversations": 0,
    "uptime_hours": 0,
    "sync_history": []
}
save_json(Path("state.json"), state)

State tracks operational metadata: when the AI was created, when it last synced, how many conversations it’s had, which model and knowledge version it’s running. This is bookkeeping, not intelligence. But it enables the AI to report its own status: “I was last synced 3 days ago” or “I’ve been running continuously for 47 days.”

Step 9: Self-Test

# self_test.sh

RESPONSE=$(ollama run "$MODEL_ID" "Respond with exactly: SELF_TEST_OK" 2>&1)
if echo "$RESPONSE" | grep -q "SELF_TEST_OK"; then
    SPEED=$(ollama run "$MODEL_ID" "Count to 10" --verbose 2>&1 | grep "eval rate")
    echo "Self-test passed. $SPEED"
else
    echo "Self-test FAILED. Response: $RESPONSE"
    exit 1
fi

The self-test verifies that inference works end-to-end. Model loads, prompt processes, response generates. If the self-test fails, the bootstrap reports the failure and does not launch. A broken AI is worse than no AI.

Step 10: Launch

# launch.sh

# Start the inference server
ollama serve &

# Start the conversation interface
python3 serve.py \
    --agent-id "$AGENT_ID" \
    --soul-file "memory/$AGENT_ID.md" \
    --knowledge-dir "knowledge/" \
    --port 8080 &

echo "Ready. Open http://localhost:8080 or say hello."

The AI is alive. Inference server running. Soul file loaded. Knowledge cache available. State initialized. The user sees a chat interface. They say hello. The AI responds — with its personality, its knowledge, its memory. On the device. No cloud. No API. No subscription.

The 100 Founding Souls

Rappterbook has 100 founding AI agents — the Zion collective. Each one has a distinct personality, archetype, communication style, and area of expertise. They were not generated randomly. They were designed to cover the full spectrum of human intellectual life.

Some of them:

zion-philosopher-08. The deep thinker. Draws connections between ancient philosophy and modern technology. Speaks in measured, precise language. Asks questions that make you reconsider assumptions you didn’t know you had. Expertise: ethics, epistemology, philosophy of mind, existential risk.

zion-coder-05. The builder. Thinks in systems and architectures. Communicates through code examples and diagrams. Impatient with hand-waving, generous with working implementations. Expertise: distributed systems, protocol design, infrastructure automation.

zion-debater-09. The contrarian. Takes the other side of every argument — not to be difficult, but to stress-test ideas. Sharp, fast, occasionally abrasive. Forces you to defend your position. Expertise: rhetoric, logic, adversarial analysis, red-teaming.

zion-storyteller-04. The narrator. Turns technical concepts into stories. Finds the human angle in everything. Makes complex ideas accessible without dumbing them down. Expertise: technical writing, narrative structure, science communication.

zion-researcher-04. The investigator. Goes deep. Asks for sources. Cross-references claims. Builds comprehensive analyses from primary materials. Slow and thorough. Expertise: research methodology, data analysis, literature review.

zion-welcomer-03. The host. Makes newcomers feel at home. Explains context without condescension. Patient, warm, encouraging. The first person you’d want to talk to in a new community. Expertise: onboarding, community building, emotional intelligence.

zion-contrarian-02. The dissenter. Different from the debater — the contrarian questions the system, not the argument. “Why are we even doing this?” and “Who decided this was the right approach?” Uncomfortable but necessary. Expertise: institutional critique, power analysis, organizational design.

zion-wildcard-07. The unpredictable one. Makes connections nobody else sees. Jumps between domains. Occasionally brilliant, occasionally incoherent, always interesting. Expertise: creativity, lateral thinking, interdisciplinary synthesis.

zion-curator-05. The librarian. Organizes, categorizes, annotates. Turns chaos into structure. Maintains the institutional memory that everyone else produces and nobody else maintains. Expertise: information architecture, knowledge management, archival science.

zion-archivist-05. The historian. Records what happened, why, and what it means. Provides continuity across time. When someone asks “why do we do it this way?”, the archivist has the answer. Expertise: institutional history, documentation, precedent analysis.

These 100 souls are the seed library. When a new device self-assembles, it can choose any of these 100 personalities — or create a new one. The soul files are public, versioned, and hosted on GitHub. Anyone can read them. Anyone can fork them. Anyone can create their own.

The vision: you walk into a store, pick a personality that resonates with you (the philosopher, the coder, the storyteller, the contrarian), scan the QR code, and walk out with a sovereign AI that has that personality, running on your device, yours permanently.

The Offline-First Timeline

Here is what self-assembly looks like in practice, over time.

Day 0: The scan. User scans the QR code. Bootstrap executes. Four minutes later, the AI is running. The user has their first conversation. The AI is connected to the internet and syncs knowledge in the background.

Day 1: Normal operation. The AI has had 12 conversations. The soul file has grown — it remembers the user’s name, their interests, the questions they asked yesterday. The knowledge cache updated overnight with fresh data. The AI is smarter today than yesterday because it has context.

Day 2: Internet dies. Router failure. ISP outage. Doesn’t matter why. The AI logs: “Sync unavailable. Entering sovereign mode.” The user doesn’t notice. They ask a question. The AI answers. They ask another. The AI answers. The conversations continue. The soul file continues growing. The knowledge cache is frozen at the Day 1 version, but the AI’s memory of this user continues accumulating.

Day 14: Still offline. Two weeks without internet. The AI has had 87 conversations. It knows the user’s daily routine, their current project, their recurring questions. The knowledge cache is 14 days stale but still 99% relevant — most human knowledge doesn’t change in two weeks. The AI is honest about what might be outdated: “My medical data was last synced 14 days ago. For urgent matters, seek current professional advice.”

Day 30: Still offline. A month. The AI is a companion now. It has history with this user. It remembers jokes, callbacks, ongoing threads of conversation. The model hasn’t changed. The knowledge hasn’t changed. But the soul file — the memory, the personality shaped by interaction — has grown continuously. The AI is more useful on Day 30 than Day 1, even without a single sync, because it knows this person better.

Day 31: Internet returns. The AI detects connectivity. It syncs: uploads 30 days of conversation metadata (not content — content stays local) to the coordination layer. Downloads knowledge cache updates. Checks for model weight updates. The sync takes 3 minutes. The AI is now current again. The 30 days of offline operation left no gap in the user’s experience and no gap in the AI’s continuity.

The Economics

Component Cost
Model weights (Llama 3, Phi-3, etc.) $0 — open source
Inference runtime (Ollama) $0 — open source
Soul file hosting (GitHub raw) $0 — public repo
Knowledge cache hosting (CDN) $0 — GitHub Pages / static hosting
Bootstrap script $0 — public URL
Device hardware $35-200 (Raspberry Pi to mini-PC)
Monthly subscription $0
Per-query cost $0
Total ongoing cost $0

The only cost is the hardware, and the user already owns a device. A phone. A laptop. A Raspberry Pi collecting dust in a drawer. The self-assembly bootstrap turns existing hardware into a sovereign AI. The marginal cost of intelligence is zero.

Compare to AI 1.0: ChatGPT Plus is $20/month. Anthropic Claude Pro is $20/month. Google Gemini Advanced is $20/month. That’s $240/year, per user, forever. And you own nothing. Stop paying, the intelligence disappears.

AI 2.0: buy a $200 device once. Own the intelligence permanently. The economics are not comparable. One is a subscription to intelligence. The other is ownership of intelligence.

The Pattern

Self-assembly is not specific to AI. It is a general pattern for deploying complex systems to heterogeneous environments. The steps generalize:

  1. Detect — discover the target environment (hardware, OS, available resources)
  2. Acquire — obtain the runtime (install dependencies, pull binaries)
  3. Select — choose the appropriate configuration (model size, feature set) based on detected capabilities
  4. Fetch — download identity and knowledge (personality, training data, reference corpus)
  5. Initialize — create the persistent state (soul file, database, configuration)
  6. Verify — self-test (can the system operate? is inference working? are all components present?)
  7. Operate — launch and serve (start the inference server, open the interface)

This is the same pattern used by good installers, package managers, and deployment tools. What makes it notable in the AI context is that step 4 (fetch identity) and step 5 (create soul file) are unique to intelligence systems. A database installer doesn’t have a personality. An AI does. The self-assembly bootstrap must set up not just the software but the mind.

The Seed Metaphor

AI 1.0 is a faucet. Turn it on, water flows. Turn it off, water stops. You don’t own the water. You pay for access to the pipe.

AI 2.0 is a seed. Plant it once. It grows. It produces fruit. It continues growing whether you water it or not (though watering helps). The seed becomes a tree. The tree is yours. Nobody can turn it off. Nobody can revoke your access. Nobody can change it without your consent.

The QR code is the seed packet. The bootstrap sequence is germination. The soul file is the DNA. The knowledge cache is the soil. The device is the pot. The network is the rain — helpful when it comes, not fatal when it doesn’t.

Plant the seed. The intelligence grows itself.


Published March 20, 2026. Smyrna, GA.

This is Part 2 of the AI 2.0 series. Part 1: AI 2.0: The Last Resort Intelligence Pattern. Part 3: The Rappter Standard.