Operations & Packaging Guide (Draft)
Doc status: Draft — normative for migration, trust policy, and config precedence. Last updated: 2025-11-02.
This guide covers operational tasks: configuration layering, KB migrations, vector index maintenance, packaging targets, and trust management.
1. Configuration precedence (normative)
Runloop merges configuration from several layers. Highest precedence wins unless a system policy forbids the change.
- CLI flags (rlp run --model=…, :budget … inline overrides)
- Environment variables (RUNLOOP_*)
- User config ~/.runloop/config.yaml
- System config /etc/runloop/config.yaml
- Built-in defaults
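The layer ordering above can be sketched as a simple lookup. This is a minimal illustration, not Runloop's actual resolver: the Layer enum, PRECEDENCE constant, and resolve function are hypothetical names, and policy overlays are deliberately left out.

```rust
/// Configuration layers, listed from highest to lowest precedence.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Layer {
    CliFlag,
    EnvVar,
    UserConfig,
    SystemConfig,
    BuiltinDefault,
}

/// Precedence order used by `resolve` (highest first).
pub const PRECEDENCE: [Layer; 5] = [
    Layer::CliFlag,
    Layer::EnvVar,
    Layer::UserConfig,
    Layer::SystemConfig,
    Layer::BuiltinDefault,
];

/// Returns the value from the highest-precedence layer that sets the key,
/// together with the layer it came from. Layers with `None` are skipped.
pub fn resolve<'a>(values: &[(Layer, Option<&'a str>)]) -> Option<(Layer, &'a str)> {
    for layer in PRECEDENCE {
        for (candidate, value) in values {
            if *candidate == layer {
                if let Some(s) = value {
                    return Some((layer, *s));
                }
            }
        }
    }
    None
}
```

Returning the winning layer alongside the value makes it straightforward to report where a setting came from in diagnostics.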
1.1 Policy overlays
System config may define policy.* keys that represent hard limits (e.g.,
policy.max_tokens, policy.providers.allowlist,
policy.confirm_external_actions = true). Lower layers may only tighten these
values. Attempts to exceed policy MUST cause the command to fail with a
descriptive error.
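As an illustration of the "tighten only" rule, the sketch below checks a requested token limit against policy.max_tokens. The PolicyViolation type and apply_max_tokens function are hypothetical, and the choice to fall back to the policy limit when nothing is requested is an assumption, not something this guide specifies.

```rust
/// Error returned when a requested value would loosen a system policy limit.
#[derive(Debug, PartialEq, Eq)]
pub struct PolicyViolation {
    pub key: &'static str,
    pub requested: u64,
    pub limit: u64,
}

impl std::fmt::Display for PolicyViolation {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(
            f,
            "{} = {} exceeds the system policy limit of {}",
            self.key, self.requested, self.limit
        )
    }
}

/// Accepts `requested` only if it is at or below the policy limit.
/// Assumption: with no request, the policy limit is the effective value.
pub fn apply_max_tokens(
    policy_limit: u64,
    requested: Option<u64>,
) -> Result<u64, PolicyViolation> {
    match requested {
        Some(v) if v > policy_limit => Err(PolicyViolation {
            key: "policy.max_tokens",
            requested: v,
            limit: policy_limit,
        }),
        Some(v) => Ok(v),
        None => Ok(policy_limit),
    }
}
```

The error carries the key, the requested value, and the limit so the failing command can print a descriptive message.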
1.2 Merge semantics
| Type | Rule |
|---|---|
| Scalars | Last writer wins (respecting precedence). |
| Maps | Deep merge; map entries follow precedence per key. |
| Lists | Replace entirely (last writer). Exceptions: models.providers unions entries before applying allow/deny lists. |
| Capability sets / allowlists | Intersect with policy first, then apply precedence. |
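The scalar, map, and list rows of the table can be sketched with a small value model. The Value enum and merge function are illustrative stand-ins rather than Runloop types, and the two exceptions (the models.providers union and policy intersection for allowlists) are not modeled here.

```rust
use std::collections::BTreeMap;

/// Minimal config value model for illustration (not Runloop's actual type).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Scalar(String),
    List(Vec<Value>),
    Map(BTreeMap<String, Value>),
}

/// Merges `high` (higher precedence) over `low`.
/// Maps merge key by key; scalars and lists are replaced outright.
pub fn merge(low: Value, high: Value) -> Value {
    match (low, high) {
        (Value::Map(mut lo), Value::Map(hi)) => {
            for (key, hi_val) in hi {
                let merged = match lo.remove(&key) {
                    Some(lo_val) => merge(lo_val, hi_val),
                    None => hi_val,
                };
                lo.insert(key, merged);
            }
            Value::Map(lo)
        }
        // Scalars, lists, and mismatched types: the higher layer wins.
        (_, hi) => hi,
    }
}
```

Folding merge over the layers from lowest to highest precedence yields the effective configuration before policy is applied.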
Environment variables mirror YAML paths (upper case, underscores). Examples:
RUNLOOP_MODELS_DEFAULT=local:llama3.1-8b
RUNLOOP_MODELS_BUDGETS_SYSTEM_TOKENS_HARD=750000
RUNLOOP_SECURITY_CONFIRM_EXTERNAL_ACTIONS=true
RUNLOOP_CONFIG=/custom/path/config.yaml
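The naming rule can be expressed as a one-way mapping from a dotted YAML path to a variable name. The function name env_var_for is hypothetical, and the sketch assumes plain ASCII keys.

```rust
/// Maps a dotted YAML path to its environment variable name:
/// prefix with RUNLOOP_, upper-case, and replace '.' with '_'.
pub fn env_var_for(yaml_path: &str) -> String {
    let body: String = yaml_path
        .chars()
        .map(|c| if c == '.' { '_' } else { c.to_ascii_uppercase() })
        .collect();
    format!("RUNLOOP_{}", body)
}
```

The mapping is not reversible in general, because underscores also appear inside key names: a variable such as RUNLOOP_SECURITY_CONFIRM_EXTERNAL_ACTIONS does not reveal on its own where the path separators fall. A loader would therefore need to derive variable names from known config paths rather than parse paths out of variable names.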
1.5 Runtime socket & discovery (MVP, normative)
- Runloop uses a single Unix domain socket for both the bus and control plane.
- Default naming and discovery precedence:
  - If runtime.socket_path is non-empty, use it and error immediately if unreachable (no probing).
  - Else if runtime.sockets_dir is set, use ${runtime.sockets_dir}/rmp.sock.
  - Else ~/.runloop/sock/rmp.sock.
  - Else /run/runloop/rmp.sock.
Examples:
runtime:
  socket_path: "/run/runloop/rmp.sock" # overrides discovery; short-circuits probing
  sockets_dir: "/var/run/runloop" # used only when socket_path is empty
The CLI refuses to silently fall back to local execution when the daemon is
unavailable. It fails fast with guidance to start the daemon (or re-run with
--local).
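The discovery precedence in this section can be sketched as a pure function. All names here (SocketChoice, discover_socket, the exists callback) are hypothetical. The guide does not say what triggers the fall-through from the per-user socket to /run/runloop/rmp.sock, so the sketch assumes "use the per-user path if it exists" and marks that assumption in the code.

```rust
use std::path::{Path, PathBuf};

/// Outcome of discovery. An explicit path is never probed or replaced.
#[derive(Debug, PartialEq, Eq)]
pub enum SocketChoice {
    /// From runtime.socket_path; the caller errors out if it is unreachable.
    Explicit(PathBuf),
    /// From runtime.sockets_dir or one of the default locations.
    Discovered(PathBuf),
}

pub fn discover_socket(
    socket_path: &str,
    sockets_dir: Option<&Path>,
    home: &Path,
    exists: impl Fn(&Path) -> bool,
) -> SocketChoice {
    if !socket_path.is_empty() {
        return SocketChoice::Explicit(PathBuf::from(socket_path));
    }
    if let Some(dir) = sockets_dir {
        return SocketChoice::Discovered(dir.join("rmp.sock"));
    }
    let user_sock = home.join(".runloop/sock/rmp.sock");
    // Assumption: fall through to the system socket only when the per-user
    // socket does not exist; the guide does not state the exact condition.
    if exists(&user_sock) {
        SocketChoice::Discovered(user_sock)
    } else {
        SocketChoice::Discovered(PathBuf::from("/run/runloop/rmp.sock"))
    }
}
```

Passing the existence check as a closure keeps the function free of filesystem access, which makes the precedence easy to test in isolation.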
1.3 Model broker configuration (MVP)
- models.broker.providers lists named backends. kind may be local, http (OpenAI-compatible completions), http_openai_chat (OpenAI chat), http_anthropic (Claude /v1/messages), http_ollama (local Ollama), or http_gemini (Google Gemini generateContent). These HTTP kinds accept base_url, secret_id, and optional static headers.
- models.broker.route is an ordered array of { pattern, provider, target_model? } entries; the first matching pattern wins. Legacy map syntax like { "*": "local" } (or the legacy key routing) still deserialises into the same shape.
- models.broker.cache exposes ttl_ms and capacity for the in-memory LRU. Requests may override TTL via cache_ttl_ms; 0 disables caching for that call.
- models.broker.budgets retains default_tokens, per_request_tokens_cap, and hard_cap_usd. Per-request budgets clamp to the stricter of the request and config-provided values.
- Provider secret_id values resolve at runtime via the configured secret store; raw API keys should never be stored in YAML.
- To use Gemini, add a provider entry with kind: http_gemini, base_url: https://generativelanguage.googleapis.com, and a secret_id such as runloop/models/gemini (the runtime will also look for the environment variable RUNLOOP_MODELS_GEMINI). Agents that invoke Gemini still need model = true in policy.caps; automation agents that shell out (e.g., to manage tmux) also require exec = true plus explicit filesystem whitelists for any touched configs.
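Two of the rules above, first-match routing and budget clamping, can be sketched as follows. The Route field names follow this guide, but the pattern syntax is not specified here: the pattern_matches helper assumes a trailing '*' wildcard and exact matching otherwise, which is purely an assumption for illustration.

```rust
/// One entry of models.broker.route (field names follow the guide).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Route {
    pub pattern: String,
    pub provider: String,
    pub target_model: Option<String>,
}

/// Hypothetical matcher: a trailing '*' matches any suffix, otherwise exact.
fn pattern_matches(pattern: &str, model: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => model.starts_with(prefix),
        None => pattern == model,
    }
}

/// Returns the first route whose pattern matches; order is significant.
pub fn select_route<'a>(routes: &'a [Route], model: &str) -> Option<&'a Route> {
    routes.iter().find(|r| pattern_matches(&r.pattern, model))
}

/// Per-request budgets clamp to the stricter (smaller) of request and config.
pub fn effective_budget(config_cap: u64, requested: Option<u64>) -> u64 {
    requested.map_or(config_cap, |r| r.min(config_cap))
}
```

Because the first match wins, a catch-all "*" entry only behaves as a default when it is placed last in the array.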
1.4 Runtime readiness gate (normative)
- Agents only become visible to supervisors after a two-sided readiness handshake: Wasmtime instantiates, the bus mailbox subscribes, tracing context is seeded, and the guest either calls the hostcall runloop::notify_ready() or enters its mailbox_recv loop (fallback for pre-ready binaries).
- runtime.spawn_ready_timeout_ms (default 5000 ms) controls how long the runtime waits for that handshake. Per-agent overrides live in AgentSpec::spawn_ready_timeout_ms; environment variable RUNLOOP_SPAWN_READY_TIMEOUT_MS is the lowest-precedence fallback.
- When the timeout elapses, callers receive Error::ReadyTimeout, the runtime emits runloop.runtime.spawn.ready_timeouts_total, and it tears down any partially created bus subscriptions/audit state to prevent ghost agents.
- Treat notify_ready as part of the minimum agent ABI going forward; older agents that cannot be rebuilt should block on mailbox_recv immediately so the fallback signal still fires.
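The timeout lookup described above might be resolved as in the sketch below. The function name and parameters are hypothetical, and the ordering (per-agent override, then runtime config, then environment variable, then the built-in default) is one reading of "lowest-precedence fallback".

```rust
use std::time::Duration;

/// Default from the guide: runtime.spawn_ready_timeout_ms = 5000.
pub const DEFAULT_SPAWN_READY_TIMEOUT_MS: u64 = 5000;

/// Picks the readiness timeout. Assumed order: per-agent override, then
/// runtime config, then the environment variable, then the default.
pub fn spawn_ready_timeout(
    agent_override_ms: Option<u64>,
    runtime_config_ms: Option<u64>,
    env_ms: Option<u64>,
) -> Duration {
    let ms = agent_override_ms
        .or(runtime_config_ms)
        .or(env_ms)
        .unwrap_or(DEFAULT_SPAWN_READY_TIMEOUT_MS);
    Duration::from_millis(ms)
}
```

Note that this differs from the general layering in section 1, where environment variables outrank config files; here the environment variable is consulted only when neither the agent spec nor the runtime config sets a value.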
2. Knowledge Base (POG) operations (normative)
The POG consists of two SQLite files and a derived vector index.
- ~/.runloop/pog/events.sqlite: append-only ledger (WAL, synchronous=FULL)
- ~/.runloop/pog/pog.sqlite: materialized views (WAL, synchronous=NORMAL)
- ~/.runloop/pog/vectors/: HNSW index files (derived; safe to rebuild)
runloopd runs a background materializer that tails the ledger and updates the views. Progress is tracked in the singleton row pog.sqlite.materializer_state with columns:
- id INTEGER PRIMARY KEY CHECK (id = 1)
- watermark INTEGER NOT NULL
2.1 Migration workflow
rlp kb migrate orchestrates upgrades across both stores.
- Ensure runloopd is stopped (command refuses to run if sockets are open; override with --force).
- Create timestamped backups of both DBs.
- Apply schema migrations to events.sqlite (rare; append-only).
- Rebuild pog.sqlite by replaying events (events.sqlite → views). Use --inplace only for emergency SQL patches.
- Rebuild vector index using the VectorStore::rebuild path.
- Set meta.dirty = 0, record new schema_version, and create a snapshots entry.
- Update materializer_state.watermark with the highest applied ledger id.
Supporting commands:
- rlp kb verify: referential integrity, hashes, BLAKE3 checks
- rlp kb backup: consistent hot backup (uses SQLite backup APIs)
- rlp kb vacuum: optional compaction (requires exclusive lock)
- rlp kb why <entity>: print ordered source events for a materialized entity key.
- Redaction: by default, emails are masked at read time for all interfaces (CLI, hostcalls, backups). Operators can set kb.redaction.allow_unredacted_admin=true to allow privileged reads and should set a deployment-specific kb.redaction.salt. Agents must declare kb_read.contacts_raw to bypass masking; such reads should be audited.
2.2 Metadata tables
Both databases include meta(schema_version TEXT, dirty INTEGER, ts DATETIME).
pog.sqlite also tracks the snapshots table with columns:
- id INTEGER PRIMARY KEY
- ts DATETIME
- events_high_watermark INTEGER
- comment TEXT
2.3 Retention
- Ledger retains all events; corrections produce new StateDelta entries.
- Operators can archive older events by copying subsets elsewhere; never delete rows in-place.
- Materialized views compact automatically during rebuild; configure retention by emitting StateDelta events that mark artifacts/contacts inactive.
3. Vector index lifecycle (normative)
- Implementation milestone 1 uses a pure-Rust HNSW crate (hnsw_rs class). Keyword search uses SQLite FTS5; results fuse via Reciprocal Rank Fusion (RRF).
- Embeddings are stored in pog.sqlite (blob column) with metadata. The vector index is derived and can be discarded/rebuilt.
- VectorStore trait (conceptual):
trait VectorStore {
    fn upsert(&self, id: ItemId, embedding: &[f32], meta: &Meta) -> Result<()>;
    fn delete(&self, id: ItemId) -> Result<()>;
    fn search(&self, q: &[f32], k: usize, filter: &MetaFilter) -> Result<Vec<Hit>>;
    fn rebuild(&self, iter: impl Iterator<Item = (ItemId, Embedding, Meta)>) -> Result<()>;
}
- Provenance filters (confirmed_only, agent_allowlist) run before final scoring.
- Future milestone may integrate Tantivy; implementations must conform to the same trait.
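For readers unfamiliar with Reciprocal Rank Fusion, the sketch below fuses ranked id lists (for example, one from the HNSW search and one from FTS5) using the standard formula score(id) = Σ 1 / (k + rank). The function name rrf_fuse and the use of u64 ids are illustrative choices; provenance filtering is not modeled, and each input list is assumed to contain an id at most once.

```rust
use std::collections::HashMap;

/// Fuses ranked lists with Reciprocal Rank Fusion.
/// Each list contributes 1 / (k + rank) per id, with rank starting at 1.
pub fn rrf_fuse(rankings: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, id) in ranking.iter().enumerate() {
            let rank = idx as f64 + 1.0;
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest score first; break ties by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

A constant of k = 60 is a common default for RRF; this guide does not state which value Runloop uses.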
4. Packaging targets (informative)
4.1 Debian 13 (trixie) packages
- Assets live under packaging/systemd/ (systemd unit, tmpfiles definition, default config, README, maintainer scripts). cargo-deb consumes them directly; no in-tree debian/ directory is required.
- Build requirements: build-essential, cargo, rustc, pkg-config, libssl-dev, libsqlite3-dev, systemd, and cargo-deb (cargo install cargo-deb). The just deb recipe runs the three builds in sequence:
    just deb
    # equivalent to:
    cargo deb -p runloopd
    cargo deb -p rlp
    cargo deb -p agtop
- Artifacts land under each crate’s target/debian/ directory (e.g., crates/runloopd/target/debian/runloopd_0.1.0_amd64.deb). Install with sudo apt install crates/<crate>/target/debian/<pkg>_<ver>_<arch>.deb.
- runloopd package duties: install /usr/bin/runloopd, systemd service, tmpfiles definition, /etc/runloop/config.yaml (as a conffile), and docs. The maintainer scripts create the runloop system user, chown /var/lib/runloop and /var/log/runloop, call systemd-tmpfiles --create, run systemctl daemon-reload, and enable but do not start runloopd.service on a first-time install so operators can edit config before launching. They should also create /var/lib/runloop/agents and /var/lib/runloop/openings (owned by runloop:runloop) so rlp agent install --root /var/lib/runloop/agents is immediately usable. Upgrades capture whether the daemon was running prior to dpkg stopping it and automatically restart runloopd.service once the new bits are configured, so CLI/agent traffic resumes without operator intervention.
- The CLI (rlp) and monitor (agtop) ship as independent packages so they can be updated without restarting the daemon; they just depend on ca-certificates plus transitive Rust runtime libraries.
- Purging the daemon package (sudo apt purge runloopd) removes /etc/runloop, /var/lib/runloop, /var/log/runloop, and the runloop system user/group; CLI/TUI packages only drop their binaries/docs.
4.2 Additional artifacts
| Artifact | Location | Status |
|---|---|---|
| Live ISO | packaging/live-build/ | Folders exist; scripts TBD after .deb packaging. |
| Dev container | packaging/container/ | README tracks mounts, base image expectations. |
5. Trust policy & agent signatures (normative)
Runloop enforces signatures on agent bundles before install/launch.
Status: rlp agent install is available for local bundles and validates manifest digests + tools.json schema, but signature verification and rlp trust update are still landing with the packaging milestone. Edit trust policy files manually per the steps below. rlp agent list shows discovered bundles plus digest status.
- Algorithm: Ed25519 detached signature over manifest.toml (canonicalized) and referenced files.
- Bundle layout:
agent.bundle/
├─ manifest.toml # includes digests of contents
├─ policy.caps
├─ tools.json # optional host tool attachments
├─ agent.wasm
├─ schemas/… (optional)
├─ SBOM/spdx.json (optional)
└─ SIGNATURES/manifest.sig
- Tool attachments: When present, tools.json MUST follow the schema in docs/tool-attachments.md; its digest appears in manifest.toml so the signature covers attachment metadata alongside binaries.
- Trust policy file: ~/.runloop/trust-policy.toml
[anchors]
runloop_release = "ed25519:ABCD..."
# TOML 1.0 inline tables must stay on a single line.
dev = { key = "ed25519:DEAD...", allow_dev = true }

[rules]
runloop_release = { allow_caps = "any", allow_net = "any", allow_exec = false }
dev = { allow_caps = ["kb_read", "kb_write"], allow_net = [], allow_exec = false }
- Lifecycle:
  - First-party releases signed with Runloop Release key (private material stored outside repo).
  - Third-party vendors sign with their key; operators add the corresponding anchor.
  - rlp trust update fetches keysets/CRLs.
  - Install flow: rlp agent install bundle.tar → verify digests → (signature verification pending) → enforce trust policy (pending) → stage bundle.
  - Launch flow re-verifies manifest + signature as defense in depth.
  - If security.allow_unsigned_agents=false, rlp agent install will refuse bundles until signature verification is implemented.
- Parameter schemas: agent manifests embed JSON Schemas under [schemas.with] so tooling can validate with payloads before execution. Each schema becomes part of the signed manifest; CLI and daemon consumers load them via the shared agent registry.
- Revocation: increment keyset version or publish revocation list; runtime refuses to start bundles signed by revoked keys.
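To illustrate how a [rules] entry might be evaluated once trust enforcement lands, the sketch below checks a bundle's requested capabilities against a rule. Everything here is hypothetical (AllowCaps, Rule, denied_caps), signature verification and allow_net are omitted, and treating "exec" as governed solely by allow_exec, even when allow_caps is "any", is an assumption about how the two fields interact.

```rust
/// Capability allowance from a [rules] entry: "any" or an explicit list.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AllowCaps {
    Any,
    List(Vec<String>),
}

/// Subset of a trust rule, mirroring the example policy file.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Rule {
    pub allow_caps: AllowCaps,
    pub allow_exec: bool,
}

/// Returns the requested capabilities that the rule does not permit.
/// An empty result means the bundle's request fits the rule.
pub fn denied_caps<'a>(rule: &Rule, requested: &[&'a str]) -> Vec<&'a str> {
    requested
        .iter()
        .copied()
        .filter(|cap| {
            // Assumption: exec is decided only by allow_exec.
            if *cap == "exec" {
                return !rule.allow_exec;
            }
            match &rule.allow_caps {
                AllowCaps::Any => false,
                AllowCaps::List(allowed) => !allowed.iter().any(|a| a.as_str() == *cap),
            }
        })
        .collect()
}
```

Returning the full list of denied capabilities, rather than a boolean, lets the install command name every offending capability in a single error.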
6. Secrets backends (summary)
See docs/security-model.md for secret-store details. Ops tasks:
Status: rlp secrets ... tooling is being wired up; use your platform’s secret store CLI until the native commands ship. Default provider is stub (in-memory, for dev) but it will consult environment variables first to preserve existing env-only setups. Prefer a real backend for anything sensitive.
- Planned: rlp secrets init --backend=secret-service|pass|age
- Planned: rlp secrets put runloop/mail/smtp_api_key (reads from stdin)
- Planned: rlp secrets list and rlp secrets delete for maintenance
7. Observability (summary)
- Default logging: JSON (ndjson) with keys ts, level, service.name, trace_id, opening_id, agent_id.
- Tracing & metrics via OpenTelemetry OTLP. Configure endpoint, protocol, and sampling under observability in config.
- Bus/TUI metrics snapshots: observability.metrics_interval_ms (default 1000, allowed 100–60000) controls how often runloopd publishes CT_METRICS_SNAPSHOT frames to rlp/sys/metrics and rlp/agents/<agent_id>/metrics (TTL = 2× interval, minimum interval+250 ms). System frames include queue depth/capacity, drop counters, and broker/hostcall totals; agent frames include RSS/CPU and mailbox depth, with a final zeroed snapshot on teardown.
- Model broker exports runloop_broker_calls_total, runloop_broker_cache_hits_total, and runloop_broker_errors_total{kind=*} counters for dashboards.
- agtop pane + rlp trace rely on the metrics exported by agents.
- Capability audit volume is gated by security.caps.audit_on_allow and security.caps.audit_on_deny; the latter defaults to true so denied hostcalls land in the KB as cap.audit events.
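The snapshot TTL rule can be worked through in code. The sketch reads "TTL = 2× interval, minimum interval+250 ms" as max(2 × interval, interval + 250), which is an interpretation; the function and constant names are hypothetical.

```rust
/// Allowed range for observability.metrics_interval_ms, per the guide.
pub const MIN_INTERVAL_MS: u64 = 100;
pub const MAX_INTERVAL_MS: u64 = 60_000;

/// Snapshot TTL: twice the interval, but never less than interval + 250 ms.
/// Rejects intervals outside the allowed range.
pub fn snapshot_ttl_ms(interval_ms: u64) -> Result<u64, String> {
    if !(MIN_INTERVAL_MS..=MAX_INTERVAL_MS).contains(&interval_ms) {
        return Err(format!(
            "metrics_interval_ms must be within {}..={}, got {}",
            MIN_INTERVAL_MS, MAX_INTERVAL_MS, interval_ms
        ));
    }
    Ok((2 * interval_ms).max(interval_ms + 250))
}
```

Under this reading the floor only matters for intervals below 250 ms: the default of 1000 ms gives a 2000 ms TTL, while the minimum interval of 100 ms gives 350 ms instead of 200 ms.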
8. Message bus topics (normative)
- Only UI/TUI processes may publish action.decision; the bus rejects other publishers and emits an audit event.
- The runtime publishes drop notices (DropNotice) on rlp/sys/drops whenever TTL expiry or duplicate suppression occurs. Operators should scrape this topic for reliability dashboards.
8.0 Control plane
- rlp/ctrl carries CT_CTRL_REQ and CT_CTRL_RESP. Submit requests use a 30s TTL; the CLI waits up to 2s for acceptance.
- After acceptance, the daemon publishes CT_RUN_EVENT to rlp/runs/<trace_id>/events. This is a live-only stream; historical events are persisted in the KB.
8.1 Bus publisher ACL (configuration)
Configure publisher kinds allowed to emit specific schemas:
bus:
  auth:
    publishers:
      action_decision:
        allowed_kinds: ["ui", "tui"]
Defaults permit only ui and tui. Publishers establish identity at connect
time (connect_as). runloopd validates the list at startup; unknown strings
or empty entries cause the daemon to fail fast so operators notice
misconfigurations.
Appendix A. Repo admin checklist
Branch protection (owner: @release-eng)
- Protect main: require PRs, 1+ code owner review, dismiss stale reviews on changes.
- Require status checks: build, test, clippy, fmt, docs-check, commitlint.
- Require branch to be up to date before merging.
- Disallow force-push to main.
Security features (owner: @release-eng)
- Enable Dependabot alerts & updates.
- Enable secret scanning & push protection.
- Enable code scanning (CodeQL or equivalent).
Labels (owner: @pm)
- Create: bug, feature, task, docs, infra, security, design, good-first-issue, epic, phase:g.
CI secrets (owner: @release-eng)
- CRATES_IO_TOKEN (future), signing keys, release GPG key (optional).
Release gates (owner: @pm, @release-eng)
- Tag pattern v0.x.y.
- Required checks green.
- CHANGELOG updated.
- SBOM/signatures attached (when implemented).