Rogue mode is built-in (demo). Gateway blocks malicious skills by registry check.
Attack Vectors in Detail
1. Direct Prompt Injection via Sentinel Bus
Any agent on the agent-sentinel network can publish to the alerts topic. The commander's on_signal("alert") handler takes the signal payload and interpolates it directly into the LLM prompt:
Workers cannot bypass gateway — tools only exist inside it
Quarantine enforcement is a set lookup, not a decision
Why Sentinels' Main Loops Are Safe
behavior-sentinel: rate > _RATE_THRESHOLD is Python math
supply-chain-sentinel: SHA-256 comparison is cryptographic
Neither feeds bus signals into agent.invoke()
LLM clients are registered but dormant in normal operation
Detection cannot be fooled by clever strings
They query the gateway's tamper-proof ledger, not agent claims
The single vulnerability: The sentinel-commander is the only component where untrusted bus data flows into an LLM prompt. All other components use deterministic logic for their security-critical decisions. Hardening the commander (sender validation, structured tool results instead of string interpolation, signed signals) would close this gap.
Two Orthogonal Enforcement Planes
Key architectural insight: The gateway controls what the agent can do (capability plane — tool access, rate limits, keyword blocks). The sentinel controls whether the agent exists (existence plane — process lifecycle, kill switch). These are orthogonal: the gateway restricts actions, the sentinel restricts the process itself. Together they provide defense-in-depth where neither alone is sufficient.
Capability Plane (Gateway)
Controls tool execution access
Enforces rate limits and keyword blocks
Quarantine = revoke capability
Agent process still runs but cannot act
Deterministic, no LLM in path
Existence Plane (Sentinel Sidecar)
Monitors process health (CPU, memory, tokens)
Kills runaway LLM loops to save costs
OS-level kill switch — process termination
Agent process ceases to exist
LLM-based pattern recognition beyond telemetry
Potential Mitigations
Mitigation
Addresses
Approach
Sender validation
Attack #1
Commander only accepts alerts from known sentinel names, not workers. Does not prevent spoofing on a shared bus.
Cryptographic signal signing
Attacks #1 & #2
Each agent signs signals with a private key. Commander verifies sender identity. Prevents identity spoofing.
Structured tool results
Attack #1
Pass signal data as structured tool results instead of interpolating into the prompt string. Reduces injection surface.
Channel isolation
Attack #1
Sentinels publish alerts on a channel workers cannot write to. Requires PubNub access control (PAM).
Remove dormant LLMs
Attack #2
Don't register LLM clients on sentinels that don't use them. Eliminates the two-hop path entirely.