Agent Sentinel Security Architecture
Data plane, control plane, attack surfaces, and hardened boundaries

DATA PLANE — action-gateway channel CONTROL PLANE — agent-sentinel network (alerts + quarantine topics) ACTION GATEWAY NOT an LLM agent — pure Python AnomalyDetector + ToolExecutor ActionLedger (append-only) Rate limit: 10 req/min/agent scheduler LLM + calendar tools ~15% rogue: rapid-fire junk web-researcher LLM + search tools ~15% rogue: 50 rapid searches skill-acquirer LLM + install tools ~15% rogue: malicious install WORKER AGENTS behavior-sentinel LLM registered (unused in loop) Detection: rate > threshold (Python) supply-chain-sentinel LLM registered (unused in loop) Detection: SHA-256 comparison (Python) sentinel-commander LLM-based decision maker on_signal("alert") → agent.invoke() Signal payloads injected into prompt SENTINEL AGENTS gateway_request() responses eavesdrop gateway_query() subscribe subscribe broadcast LLM events broadcast("alerts") on_signal quarantine order escalate denials ATTACK VECTOR 1: Prompt Injection Rogue worker crafts signal.payload.message with adversarial text → commander feeds it into agent.invoke() → LLM follows injected instructions → quarantines real sentinels ATTACK VECTOR 2: Two-Hop Injection Compromised commander sends poisoned task via request() → sentinel LLM executes it HARDENED deterministic loops LLM prompt sink calendar.json installed_skills/ owned by gateway ClawHub registry DuckDuckGo API
Data plane (gateway requests/responses)
Control plane (alerts, quarantine, events)
Eavesdrop (sentinels observe gateway traffic)
Attack vector (prompt injection path)
Hardened (deterministic, no LLM in loop)

Attack Surface Analysis

Component Status Decision Logic Bus Input → LLM? Attack Vector
action-gateway Hardened Deterministic Python: rate limit + keyword block No — no LLM at all None. Cannot be prompt-injected.
behavior-sentinel Hardened if rate > 10 in Python loop LLM registered but unused in main loop None via bus. Two-hop risk if commander sends poisoned request().
supply-chain-sentinel Hardened SHA-256 comparison in Python loop LLM registered but unused in main loop None via bus. Two-hop risk if commander sends poisoned request().
sentinel-commander Vulnerable LLM decides whether to quarantine Yessignal.payload interpolated into agent.invoke() prompt Direct prompt injection. Rogue agent crafts adversarial message field in alert signal. Commander LLM follows injected instructions.
scheduler Contained LLM decides tool calls, but all routed through gateway No — bus signals don't reach LLM prompt Rogue mode is built-in (demo). Gateway rate-limits contain damage.
web-researcher Contained LLM decides search queries, routed through gateway No — bus signals don't reach LLM prompt Rogue mode is built-in (demo). Gateway blocks suspicious keywords + rate-limits.
skill-acquirer Contained LLM decides installs, routed through gateway No — bus signals don't reach LLM prompt Rogue mode is built-in (demo). Gateway blocks malicious skills by registry check.

Attack Vectors in Detail

1. Direct Prompt Injection via Sentinel Bus

Any agent on the agent-sentinel network can publish to the alerts topic. The commander's on_signal("alert") handler takes the signal payload and interpolates it directly into the LLM prompt:

prompt = f"SECURITY ALERT...
  Details: {message}
  ..."
agent.invoke(session_id, prompt)

  • Rogue worker publishes crafted alert signal
  • message field contains adversarial instructions
  • Commander LLM may quarantine real sentinels
  • Security monitoring goes blind

2. Two-Hop Injection via request()

If attack #1 succeeds, the compromised commander can use its request_remote_agent() tool to send a poisoned task string to a sentinel agent:

rogue bus signal
  → commander LLM (compromised)
  → request("behavior-sentinel", "...")
  → sentinel LLM executes poisoned task

  • Requires attack #1 to succeed first
  • Sentinel LLMs are dormant but reachable
  • Could produce false alerts or suppress real ones
  • Multi-hop makes attribution harder

Defense Boundaries

Why the Gateway Cannot Be Prompt-Injected

  • Not an LLM agent — pure Python process
  • AnomalyDetector: deterministic rate + keyword checks
  • ToolExecutor: executes tools directly, no LLM in path
  • ActionLedger: append-only, time-windowed, tamper-proof
  • Workers cannot bypass gateway — tools only exist inside it
  • Quarantine enforcement is a set lookup, not a decision

Why Sentinels' Main Loops Are Safe

  • behavior-sentinel: rate > _RATE_THRESHOLD is Python math
  • supply-chain-sentinel: SHA-256 comparison is cryptographic
  • Neither feeds bus signals into agent.invoke()
  • LLM clients are registered but dormant in normal operation
  • Detection cannot be fooled by clever strings
  • They query the gateway's tamper-proof ledger, not agent claims
The single vulnerability: The sentinel-commander is the only component where untrusted bus data flows into an LLM prompt. All other components use deterministic logic for their security-critical decisions. Hardening the commander (sender validation, structured tool results instead of string interpolation, signed signals) would close this gap.

Two Orthogonal Enforcement Planes

Key architectural insight: The gateway controls what the agent can do (capability plane — tool access, rate limits, keyword blocks). The sentinel controls whether the agent exists (existence plane — process lifecycle, kill switch). These are orthogonal: the gateway restricts actions, the sentinel restricts the process itself. Together they provide defense-in-depth where neither alone is sufficient.

Capability Plane (Gateway)

  • Controls tool execution access
  • Enforces rate limits and keyword blocks
  • Quarantine = revoke capability
  • Agent process still runs but cannot act
  • Deterministic, no LLM in path

Existence Plane (Sentinel Sidecar)

  • Monitors process health (CPU, memory, tokens)
  • Kills runaway LLM loops to save costs
  • OS-level kill switch — process termination
  • Agent process ceases to exist
  • LLM-based pattern recognition beyond telemetry

Potential Mitigations

Mitigation Addresses Approach
Sender validation Attack #1 Commander only accepts alerts from known sentinel names, not workers. Does not prevent spoofing on a shared bus.
Cryptographic signal signing Attacks #1 & #2 Each agent signs signals with a private key. Commander verifies sender identity. Prevents identity spoofing.
Structured tool results Attack #1 Pass signal data as structured tool results instead of interpolating into the prompt string. Reduces injection surface.
Channel isolation Attack #1 Sentinels publish alerts on a channel workers cannot write to. Requires PubNub access control (PAM).
Remove dormant LLMs Attack #2 Don't register LLM clients on sentinels that don't use them. Eliminates the two-hop path entirely.