Operations & Packaging Guide (Draft)

Doc status: Draft — normative for migration, trust policy, and config precedence. Last updated: 2025-11-02.

This guide covers operational tasks: configuration layering, KB migrations, vector index maintenance, packaging targets, and trust management.

1. Configuration precedence (normative)

Runloop merges configuration from the layers below, listed from highest to lowest precedence. The highest-precedence layer that sets a key wins unless a system policy forbids the change.

1. CLI flags (rlp run --model=…, :budget … inline overrides)
2. Environment variables (RUNLOOP_*)
3. User config ~/.runloop/config.yaml
4. System config /etc/runloop/config.yaml
5. Built-in defaults

1.1 Policy overlays

System config may define policy.* keys that represent hard limits (e.g., policy.max_tokens, policy.providers.allowlist, policy.confirm_external_actions = true). Every other layer, including higher-precedence ones such as user config, environment variables, and CLI flags, may only tighten these values. Attempts to exceed policy MUST cause the command to fail with a descriptive error.
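
A minimal sketch of how a numeric setting could be resolved across layers and then checked against a policy cap. The function name, the `Option`-per-layer representation, and the error text are illustrative assumptions, not Runloop's actual API.

```rust
/// Resolve a numeric setting, then enforce an optional system policy cap.
/// `layers` is ordered from highest to lowest precedence (CLI, env, user,
/// system, built-in default); `None` means the layer does not set the key.
/// Hypothetical helper for illustration only.
fn resolve_with_policy(layers: &[Option<u64>], policy_max: Option<u64>) -> Result<u64, String> {
    // The highest-precedence layer that sets the key wins.
    let value = layers
        .iter()
        .flatten()
        .copied()
        .next()
        .ok_or_else(|| "no layer provides a value".to_string())?;

    // A policy cap is a hard limit: fail loudly instead of silently clamping.
    match policy_max {
        Some(max) if value > max => Err(format!(
            "requested value {value} exceeds policy limit {max}"
        )),
        _ => Ok(value),
    }
}
```

With a cap of 5000, a user-level value of 4000 resolves normally, while a CLI override of 9000 returns an error rather than being reduced to the cap.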

1.2 Merge semantics

| Type | Rule |
| --- | --- |
| Scalars | Last writer wins (respecting precedence). |
| Maps | Deep merge; map entries follow precedence per key. |
| Lists | Replace entirely (last writer). Exception: `models.providers` unions entries before applying allow/deny lists. |
| Capability sets / allowlists | Intersect with policy first, then apply precedence. |
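
The scalar, map, and list rules can be sketched with a small value type. The `Value` enum and `merge` function below are assumptions made for illustration. They do not model the `models.providers` union exception or the policy intersection for capability sets.

```rust
use std::collections::BTreeMap;

/// Simplified config value (illustrative; real YAML has more scalar types).
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Scalar(String),
    List(Vec<Value>),
    Map(BTreeMap<String, Value>),
}

/// Merge `higher` (higher-precedence layer) over `lower`.
/// Maps merge key by key; scalars and lists are replaced wholesale.
fn merge(lower: Value, higher: Value) -> Value {
    match (lower, higher) {
        (Value::Map(mut low), Value::Map(high)) => {
            for (key, high_val) in high {
                let merged = match low.remove(&key) {
                    // Key exists in both layers: recurse so nested maps deep-merge.
                    Some(low_val) => merge(low_val, high_val),
                    // Key exists only in the higher layer: take it as-is.
                    None => high_val,
                };
                low.insert(key, merged);
            }
            Value::Map(low)
        }
        // Scalars, lists, and type mismatches: the higher layer wins.
        (_, higher) => higher,
    }
}
```

Merging a user layer over a system layer keeps system keys the user did not set, while a list set by the user replaces the system list instead of being appended to it.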

Environment variables mirror YAML paths: take the dotted path, upper-case it, replace the dots with underscores, and add the RUNLOOP_ prefix. Examples:

RUNLOOP_MODELS_DEFAULT=local:llama3.1-8b
RUNLOOP_MODELS_BUDGETS_SYSTEM_TOKENS_HARD=750000
RUNLOOP_SECURITY_CONFIRM_EXTERNAL_ACTIONS=true
RUNLOOP_CONFIG=/custom/path/config.yaml
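
As a sketch of the naming convention, the mapping from a dotted YAML path to a variable name can be written as below. The function name is hypothetical, and only the dot-to-underscore and upper-casing rules visible in the examples are modeled.

```rust
/// Map a dotted YAML path to its environment variable name.
/// Hypothetical helper; assumes keys contain only letters, digits, and underscores.
fn env_var_name(yaml_path: &str) -> String {
    let body = yaml_path.replace('.', "_").to_uppercase();
    format!("RUNLOOP_{body}")
}
```

The mapping is one-way: because keys may themselves contain underscores (confirm_external_actions), a variable name cannot be split back into a unique YAML path without consulting the config schema.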

1.5 Runtime socket & discovery (MVP, normative)

Runloop uses a single Unix domain socket for both the bus and control plane.
Default naming and discovery precedence:
1. If runtime.socket_path is non-empty, use it and error immediately if unreachable (no probing).
2. Else if runtime.sockets_dir is set, use ${runtime.sockets_dir}/rmp.sock.
3. Else ~/.runloop/sock/rmp.sock.
4. Else /run/runloop/rmp.sock.

Examples:

runtime:
  socket_path: "/run/runloop/rmp.sock" # overrides discovery; short-circuits probing
  sockets_dir: "/var/run/runloop" # used only when socket_path is empty

The CLI refuses to silently fall back to local execution when the daemon is unavailable. It fails fast with guidance to start the daemon (or re-run with --local).
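
The discovery order above can be sketched as a function that returns the candidate paths in probe order. The struct and function names are illustrative assumptions. An explicit socket_path or sockets_dir yields exactly one candidate, which is one reading of "no probing".

```rust
use std::path::{Path, PathBuf};

/// Subset of the `runtime` config block relevant to discovery (illustrative).
struct RuntimeConfig {
    socket_path: Option<String>,
    sockets_dir: Option<String>,
}

/// Return candidate socket paths in the order they should be tried.
fn socket_candidates(cfg: &RuntimeConfig, home: &Path) -> Vec<PathBuf> {
    // 1. Explicit path: a single candidate, so an unreachable socket fails immediately.
    if let Some(path) = cfg.socket_path.as_deref().filter(|p| !p.is_empty()) {
        return vec![PathBuf::from(path)];
    }
    // 2. Configured directory with the fixed file name.
    if let Some(dir) = cfg.sockets_dir.as_deref().filter(|d| !d.is_empty()) {
        return vec![Path::new(dir).join("rmp.sock")];
    }
    // 3 and 4. Per-user default first, then the system-wide default.
    vec![
        home.join(".runloop/sock/rmp.sock"),
        PathBuf::from("/run/runloop/rmp.sock"),
    ]
}
```

A caller would try each candidate in order and, if none is reachable, report an error with guidance to start the daemon instead of falling back to local execution.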

1.3 Model broker configuration (MVP)

models.broker.providers lists named backends. kind may be local, http (OpenAI-compatible completions), http_openai_chat (OpenAI chat), http_anthropic (Claude /v1/messages), http_ollama (local Ollama), or http_gemini (Google Gemini generateContent). These HTTP kinds accept base_url, secret_id, and optional static headers.
models.broker.route is an ordered array of { pattern, provider, target_model? } entries; the first matching pattern wins. Legacy map syntax like { "*": "local" } (or the legacy key routing) still deserialises into the same shape.
models.broker.cache exposes ttl_ms and capacity for the in-memory LRU. Requests may override TTL via cache_ttl_ms; 0 disables caching for that call.
models.broker.budgets retains default_tokens, per_request_tokens_cap, and hard_cap_usd. Per-request budgets clamp to the stricter of the request and config-provided values.
Provider secret_id values resolve at runtime via the configured secret store; raw API keys should never be stored in YAML.
To use Gemini, add a provider entry with kind: http_gemini, base_url: https://generativelanguage.googleapis.com, and a secret_id such as runloop/models/gemini (the runtime will also look for the environment variable RUNLOOP_MODELS_GEMINI). Agents that invoke Gemini still need model = true in policy.caps; automation agents that shell out (e.g., to manage tmux) also require exec = true plus explicit filesystem allowlists for any touched configs.
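
A rough sketch of first-match routing and budget clamping follows. The pattern syntax is not specified in this guide, so the matcher assumes exact names plus a trailing `*` wildcard. The struct, function names, and the way default_tokens fills in a missing request budget are likewise assumptions.

```rust
/// One entry of `models.broker.route` (field names from the guide).
struct Route {
    pattern: String,
    provider: String,
    target_model: Option<String>,
}

/// Assumed pattern syntax: exact match, or prefix match when the pattern ends in `*`.
fn pattern_matches(pattern: &str, model: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => model.starts_with(prefix),
        None => pattern == model,
    }
}

/// The first matching route wins; returns (provider, model name to request).
fn select_route<'a>(routes: &'a [Route], model: &'a str) -> Option<(&'a str, &'a str)> {
    routes
        .iter()
        .find(|r| pattern_matches(&r.pattern, model))
        .map(|r| (r.provider.as_str(), r.target_model.as_deref().unwrap_or(model)))
}

/// Clamp a per-request token budget to the stricter of request and config.
fn effective_budget(requested: Option<u64>, default_tokens: u64, per_request_tokens_cap: u64) -> u64 {
    requested.unwrap_or(default_tokens).min(per_request_tokens_cap)
}
```

Because the array is ordered, a catch-all `*` entry belongs last; placed first, it would shadow every more specific pattern.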

1.4 Runtime readiness gate (normative)

Agents only become visible to supervisors after a two-sided readiness handshake: Wasmtime instantiates, the bus mailbox subscribes, tracing context is seeded, and the guest either calls the hostcall runloop::notify_ready() or enters its mailbox_recv loop (fallback for pre-ready binaries).
runtime.spawn_ready_timeout_ms (default 5000 ms) controls how long the runtime waits for that handshake. Per-agent overrides live in AgentSpec::spawn_ready_timeout_ms; environment variable RUNLOOP_SPAWN_READY_TIMEOUT_MS is the lowest-precedence fallback.
When the timeout elapses, callers receive Error::ReadyTimeout, the runtime emits runloop.runtime.spawn.ready_timeouts_total, and it tears down any partially created bus subscriptions/audit state to prevent ghost agents.
Treat notify_ready as part of the minimum agent ABI going forward; older agents that cannot be rebuilt should block on mailbox_recv immediately so the fallback signal still fires.
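
The timeout lookup described above might be sketched as follows. The function signature is hypothetical, the ordering (per-agent override, then runtime config, then environment, then the 5000 ms default) is one reading of the text, and ignoring an unparseable environment value is an assumption.

```rust
use std::time::Duration;

/// Built-in default from the guide.
const DEFAULT_SPAWN_READY_TIMEOUT_MS: u64 = 5000;

/// Pick the readiness timeout for one agent spawn.
/// `env_value` is the raw content of RUNLOOP_SPAWN_READY_TIMEOUT_MS, if set.
fn spawn_ready_timeout(
    agent_override_ms: Option<u64>,
    runtime_config_ms: Option<u64>,
    env_value: Option<&str>,
) -> Duration {
    // An unparseable environment value is treated as unset (assumption).
    let env_ms = env_value.and_then(|raw| raw.trim().parse::<u64>().ok());
    let ms = agent_override_ms
        .or(runtime_config_ms)
        .or(env_ms)
        .unwrap_or(DEFAULT_SPAWN_READY_TIMEOUT_MS);
    Duration::from_millis(ms)
}
```

Note that this ordering differs from section 1, where environment variables outrank config files; here the environment variable only applies when neither the agent spec nor the runtime config sets a value.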

2. Knowledge Base (POG) operations (normative)

The POG consists of two SQLite files and a derived vector index.

~/.runloop/pog/events.sqlite — append-only ledger (WAL, synchronous=FULL)
~/.runloop/pog/pog.sqlite — materialized views (WAL, synchronous=NORMAL)
~/.runloop/pog/vectors/ — HNSW index files (derived; safe to rebuild)
runloopd runs a background materializer that tails the ledger and updates the views. Progress is tracked in the singleton row pog.sqlite.materializer_state with columns:
- id INTEGER PRIMARY KEY CHECK (id = 1)
- watermark INTEGER NOT NULL

2.1 Migration workflow

rlp kb migrate orchestrates upgrades across both stores.

1. Ensure runloopd is stopped (the command refuses to run if sockets are open; override with --force).
2. Create timestamped backups of both DBs.
3. Apply schema migrations to events.sqlite (rare; append-only).
4. Rebuild pog.sqlite by replaying events (events.sqlite → views). Use --inplace only for emergency SQL patches.
5. Rebuild the vector index using the VectorStore::rebuild path.
6. Set meta.dirty = 0, record the new schema_version, and create a snapshots entry.
7. Update materializer_state.watermark with the highest applied ledger id.
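
The ordering and the daemon guard can be captured in a small planning function. This is only a sketch: the `Step` enum and `plan_migration` are illustrative names, and the real command performs the work instead of returning a list.

```rust
/// Migration stages in the order the workflow runs them (illustrative names).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Step {
    BackupDatabases,
    MigrateLedgerSchema,
    RebuildViews,
    RebuildVectorIndex,
    FinalizeMetadata,
    UpdateWatermark,
}

/// Refuse to plan while the daemon's sockets are open, unless forced.
fn plan_migration(daemon_sockets_open: bool, force: bool) -> Result<Vec<Step>, String> {
    if daemon_sockets_open && !force {
        return Err("runloopd appears to be running; stop it or pass --force".to_string());
    }
    Ok(vec![
        Step::BackupDatabases,     // backups come before any write
        Step::MigrateLedgerSchema, // events.sqlite first: it is the source of truth
        Step::RebuildViews,        // pog.sqlite is replayed from the ledger
        Step::RebuildVectorIndex,  // the index is derived from the rebuilt views
        Step::FinalizeMetadata,    // clear dirty flag, record schema_version, snapshot
        Step::UpdateWatermark,     // the materializer resumes from this ledger id
    ])
}
```

The order matters because each store is derived from the one before it: views from the ledger, and the vector index from embeddings held in the views.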

Supporting commands:

rlp kb verify — referential integrity, hashes, BLAKE3 checks
rlp kb backup — consistent hot backup (uses SQLite backup APIs)
rlp kb vacuum — optional compaction (requires exclusive lock)
rlp kb why <entity> — print ordered source events for a materialized entity key.
Redaction: by default, emails are masked at read time for all interfaces (CLI, hostcalls, backups). Operators can set kb.redaction.allow_unredacted_admin=true to allow privileged reads and should set a deployment-specific kb.redaction.salt. Agents must declare kb_read.contacts_raw to bypass masking; such reads should be audited.

2.2 Metadata tables

Both databases include meta(schema_version TEXT, dirty INTEGER, ts DATETIME). pog.sqlite also tracks the snapshots table with columns:

id INTEGER PRIMARY KEY
ts DATETIME
events_high_watermark INTEGER
comment TEXT

2.3 Retention

Ledger retains all events; corrections produce new StateDelta entries.
Operators can archive older events by copying subsets elsewhere; never delete rows in-place.
Materialized views compact automatically during rebuild; configure retention by emitting StateDelta events that mark artifacts/contacts inactive.

3. Vector index lifecycle (normative)

Implementation milestone 1 uses a pure-Rust HNSW crate (hnsw_rs class). Keyword search uses SQLite FTS5; results fuse via Reciprocal Rank Fusion (RRF).
Embeddings are stored in pog.sqlite (blob column) with metadata. The vector index is derived and can be discarded/rebuilt.
VectorStore trait (conceptual):

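// Conceptual: ItemId, Embedding, Meta, MetaFilter, Hit, and Result are defined elsewhere.
trait VectorStore {
  /// Insert or update the embedding and metadata for one item.
  fn upsert(&self, id: ItemId, embedding: &[f32], meta: &Meta) -> Result<()>;
  /// Remove one item from the index.
  fn delete(&self, id: ItemId) -> Result<()>;
  /// Return up to k hits for query vector q that pass the metadata filter.
  fn search(&self, q: &[f32], k: usize, filter: &MetaFilter) -> Result<Vec<Hit>>;
  /// Rebuild the whole index from the supplied items.
  fn rebuild(&self, iter: impl Iterator<Item = (ItemId, Embedding, Meta)>) -> Result<()>;
}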

Provenance filters (confirmed_only, agent_allowlist) run before final scoring.
Future milestone may integrate Tantivy; implementations must conform to the same trait.
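
To make the fusion step in this section concrete, here is a minimal sketch of Reciprocal Rank Fusion over two or more ranked id lists. Each item scores the sum of 1 / (k + rank) across the lists it appears in, with 1-based ranks. Using `u64` ids and passing `k` as a parameter are assumptions; the constant Runloop uses is not stated here (60 is a commonly used value).

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of item ids with Reciprocal Rank Fusion.
/// Each list is ordered best-first. Returns (id, score) pairs, best first.
fn rrf_fuse(rankings: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for ranking in rankings {
        for (index, id) in ranking.iter().enumerate() {
            let rank = (index + 1) as f64; // ranks are 1-based
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

For example, fusing a keyword ranking [1, 2, 3] with a vector ranking [3, 1, 4] at k = 60 orders the items 1, 3, 2, 4: items present in both lists outrank items found by only one retriever.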

4. Packaging targets (informative)

4.1 Debian 13 (`trixie`) packages

Assets live under packaging/systemd/ (systemd unit, tmpfiles definition, default config, README, maintainer scripts). cargo-deb consumes them directly; no in-tree debian/ directory is required.
Build requirements: build-essential, cargo, rustc, pkg-config, libssl-dev, libsqlite3-dev, systemd, and cargo-deb (cargo install cargo-deb). The just deb recipe runs the three builds in sequence:
```
just deb
# equivalent to:
cargo deb -p runloopd
cargo deb -p rlp
cargo deb -p agtop
```
Artifacts land under each crate’s target/debian/ directory (e.g., crates/runloopd/target/debian/runloopd_0.1.0_amd64.deb). Install with sudo apt install crates/<crate>/target/debian/<pkg>_<ver>_<arch>.deb.
runloopd package duties: install /usr/bin/runloopd, the systemd service, the tmpfiles definition, /etc/runloop/config.yaml (as a conffile), and docs. The maintainer scripts create the runloop system user, chown /var/lib/runloop and /var/log/runloop, call systemd-tmpfiles --create, run systemctl daemon-reload, and enable but do not start runloopd.service on a first-time install so operators can edit the config before launching. They should also create /var/lib/runloop/agents and /var/lib/runloop/openings (owned by runloop:runloop) so rlp agent install --root /var/lib/runloop/agents is immediately usable. On upgrade, the scripts record whether the daemon was running before dpkg stopped it and restart runloopd.service once the new version is configured, so the interruption to CLI/agent traffic is limited to the upgrade itself.
The CLI (rlp) and monitor (agtop) ship as independent packages so they can be updated without restarting the daemon; they depend only on ca-certificates plus transitive Rust runtime libraries.
Purging the daemon package (sudo apt purge runloopd) removes /etc/runloop, /var/lib/runloop, /var/log/runloop, and the runloop system user/group; CLI/TUI packages only drop their binaries/docs.

4.2 Additional artifacts

| Artifact | Location | Status |
| --- | --- | --- |
| Live ISO | `packaging/live-build/` | Folders exist; scripts TBD after `.deb` packaging. |
| Dev container | `packaging/container/` | README tracks mounts, base image expectations. |

5. Trust policy & agent signatures (normative)

Runloop enforces signatures on agent bundles before install/launch.

Status: rlp agent install is available for local bundles and validates manifest digests + tools.json schema, but signature verification and rlp trust update are still landing with the packaging milestone. Edit trust policy files manually per the steps below. rlp agent list shows discovered bundles plus digest status.

Algorithm: Ed25519 detached signature over manifest.toml (canonicalized) and referenced files.
Bundle layout:

agent.bundle/
├─ manifest.toml       # includes digests of contents
├─ policy.caps
├─ tools.json          # optional host tool attachments
├─ agent.wasm
├─ schemas/… (optional)
├─ SBOM/spdx.json (optional)
└─ SIGNATURES/manifest.sig

Tool attachments: When present, tools.json MUST follow the schema in docs/tool-attachments.md; its digest appears in manifest.toml so the signature covers attachment metadata alongside binaries.
Trust policy file: ~/.runloop/trust-policy.toml

[anchors]
runloop_release = "ed25519:ABCD..."
dev = { key = "ed25519:DEAD...", allow_dev = true }

[rules]
runloop_release = { allow_caps = "any", allow_net = "any", allow_exec = false }
dev = { allow_caps = ["kb_read", "kb_write"], allow_net = [], allow_exec = false }
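
One way a `[rules]` entry could be evaluated against what a bundle requests is sketched below. The types and function are hypothetical, trust-policy enforcement is still pending per the status note, and allow_net is left out for brevity.

```rust
/// Capability allowance from a rule: the string "any" or an explicit list.
enum Allow {
    Any,
    Only(Vec<String>),
}

/// Subset of a `[rules]` entry (illustrative; allow_net omitted).
struct Rule {
    allow_caps: Allow,
    allow_exec: bool,
}

/// Return a description of every request the rule does not permit.
/// An empty result means the bundle is within the rule.
fn violations(rule: &Rule, caps: &[&str], wants_exec: bool) -> Vec<String> {
    let mut out = Vec::new();
    if let Allow::Only(allowed) = &rule.allow_caps {
        for cap in caps {
            if !allowed.iter().any(|a| a.as_str() == *cap) {
                out.push(format!("capability not allowed: {cap}"));
            }
        }
    }
    if wants_exec && !rule.allow_exec {
        out.push("exec not allowed".to_string());
    }
    out
}
```

Collecting every violation, instead of stopping at the first, lets an install command report everything an operator needs to change in one pass.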

Lifecycle:
- First-party releases signed with Runloop Release key (private material stored outside repo).
- Third-party vendors sign with their key; operators add the corresponding anchor.
- rlp trust update fetches keysets/CRLs (not yet available; see Status above).
- Install flow: rlp agent install bundle.tar → verify digests → (signature verification pending) → enforce trust policy (pending) → stage bundle.
- Launch flow re-verifies manifest + signature as defense in depth.
- If security.allow_unsigned_agents=false, rlp agent install will refuse bundles until signature verification is implemented.
Parameter schemas: agent manifests embed JSON Schemas under [schemas.with] so tooling can validate `with` payloads before execution. Each schema becomes part of the signed manifest; CLI and daemon consumers load them via the shared agent registry.
Revocation: increment keyset version or publish revocation list; runtime refuses to start bundles signed by revoked keys.

6. Secrets backends (summary)

See docs/security-model.md for secret-store details. Ops tasks:

Status: rlp secrets ... tooling is being wired up; use your platform’s secret store CLI until the native commands ship. Default provider is stub (in-memory, for dev) but it will consult environment variables first to preserve existing env-only setups. Prefer a real backend for anything sensitive.

Planned: rlp secrets init --backend=secret-service|pass|age
Planned: rlp secrets put runloop/mail/smtp_api_key (reads from stdin)
Planned: rlp secrets list and rlp secrets delete for maintenance

7. Observability (summary)

Default logging: JSON (ndjson) with keys ts, level, service.name, trace_id, opening_id, agent_id.
Tracing & metrics via OpenTelemetry OTLP. Configure endpoint, protocol, and sampling under observability in config.
Bus/TUI metrics snapshots: observability.metrics_interval_ms (default 1000, allowed 100–60000) controls how often runloopd publishes CT_METRICS_SNAPSHOT frames to rlp/sys/metrics and rlp/agents/<agent_id>/metrics (TTL = 2× interval, minimum interval+250 ms). System frames include queue depth/capacity, drop counters, and broker/hostcall totals; agent frames include RSS/CPU and mailbox depth, with a final zeroed snapshot on teardown.
Model broker exports runloop_broker_calls_total, runloop_broker_cache_hits_total, and runloop_broker_errors_total{kind=*} counters for dashboards.
The agtop pane and rlp trace rely on the metrics exported by agents.
Capability audit volume is gated by security.caps.audit_on_allow and security.caps.audit_on_deny; the latter defaults to true so denied hostcalls land in the KB as cap.audit events.
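
The snapshot interval bounds and TTL rule can be sketched as below. This reads "TTL = 2× interval, minimum interval+250 ms" as the larger of the two values, and assumes an out-of-range interval is rejected and not clamped; both are assumptions, as are the function names.

```rust
/// Allowed range for observability.metrics_interval_ms, from the guide.
const MIN_INTERVAL_MS: u64 = 100;
const MAX_INTERVAL_MS: u64 = 60_000;

/// Check the configured interval against the allowed range.
fn validate_interval(ms: u64) -> Result<u64, String> {
    if (MIN_INTERVAL_MS..=MAX_INTERVAL_MS).contains(&ms) {
        Ok(ms)
    } else {
        Err(format!(
            "metrics_interval_ms must be within {MIN_INTERVAL_MS}..={MAX_INTERVAL_MS}, got {ms}"
        ))
    }
}

/// Frame TTL: twice the interval, but never less than interval + 250 ms.
fn snapshot_ttl_ms(interval_ms: u64) -> u64 {
    (2 * interval_ms).max(interval_ms + 250)
}
```

The floor only applies to intervals below 250 ms. At the default of 1000 ms the TTL is 2000 ms; at the minimum of 100 ms it is 350 ms, not 200 ms.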

8. Message bus topics (normative)

Only UI/TUI processes may publish action.decision; the bus rejects other publishers and emits an audit event.
The runtime publishes drop notices (DropNotice) on rlp/sys/drops whenever TTL expiry or duplicate suppression occurs. Operators should scrape this topic for reliability dashboards.

8.0 Control plane

rlp/ctrl carries CT_CTRL_REQ and CT_CTRL_RESP. Submit requests use a 30s TTL; the CLI waits up to 2s for acceptance.
After acceptance, the daemon publishes CT_RUN_EVENT to rlp/runs/<trace_id>/events. This is a live-only stream; historical events are persisted in the KB.

8.1 Bus publisher ACL (configuration)

Configure publisher kinds allowed to emit specific schemas:

bus:
  auth:
    publishers:
      action_decision:
        allowed_kinds: ["ui", "tui"]

Defaults permit only ui and tui. Publishers establish identity at connect time (connect_as). runloopd validates the list at startup; unknown strings or empty entries cause the daemon to fail fast so operators notice misconfigurations.
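
The startup validation could look roughly like this. The function is hypothetical, and the set of known kinds is passed in because this guide only confirms ui and tui.

```rust
/// Validate an `allowed_kinds` list at startup.
/// Fails on the first empty or unknown entry so misconfigurations surface early.
fn validate_allowed_kinds(configured: &[&str], known_kinds: &[&str]) -> Result<(), String> {
    for kind in configured {
        if kind.trim().is_empty() {
            return Err("empty publisher kind in allowed_kinds".to_string());
        }
        if !known_kinds.contains(kind) {
            return Err(format!("unknown publisher kind: {kind}"));
        }
    }
    Ok(())
}
```

Failing at startup rather than at publish time means a typo such as "tiu" stops the daemon immediately instead of silently locking a legitimate publisher out.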

Appendix A. Repo admin checklist

Branch protection (owner: @release-eng)

Protect main: require PRs, 1+ code owner review, dismiss stale reviews on changes.
Require status checks: build, test, clippy, fmt, docs-check, commitlint.
Require branch to be up to date before merging.
Disallow force-push to main.

Security features (owner: @release-eng)

Enable Dependabot alerts & updates.
Enable secret scanning & push protection.
Enable code scanning (CodeQL or equivalent).

Labels (owner: @pm)

Create: bug, feature, task, docs, infra, security, design, good-first-issue, epic, phase:g.

CI secrets (owner: @release-eng)

CRATES_IO_TOKEN (future), signing keys, release GPG key (optional).

Release gates (owner: @pm, @release-eng)

Tag pattern v0.x.y.
Required checks green.
CHANGELOG updated.
SBOM/signatures attached (when implemented).
