Agent Sentinel™ Security Architecture
Data plane, control plane, attack surfaces, and hardened boundaries

Data plane (gateway requests/responses)

Control plane (alerts, quarantine, events)

Eavesdrop (sentinels observe gateway traffic)

Attack vector (prompt injection path)

Hardened (deterministic, no LLM in loop)

Attack Surface Analysis

Component	Status	Decision Logic	Bus Input → LLM?	Attack Vector
`action-gateway`	Hardened	Deterministic Python: rate limit + keyword block	No — no LLM at all	None. Cannot be prompt-injected.
`behavior-sentinel`	Hardened	`if rate > 10` in Python loop	LLM registered but unused in main loop	None via bus. Two-hop risk if commander sends poisoned `request()`.
`supply-chain-sentinel`	Hardened	SHA-256 comparison in Python loop	LLM registered but unused in main loop	None via bus. Two-hop risk if commander sends poisoned `request()`.
`sentinel-commander`	Vulnerable	LLM decides whether to quarantine	Yes — `signal.payload` interpolated into `agent.invoke()` prompt	Direct prompt injection. Rogue agent crafts adversarial `message` field in alert signal. Commander LLM follows injected instructions.
`scheduler`	Contained	LLM decides tool calls, but all routed through gateway	No — bus signals don't reach LLM prompt	Rogue mode is built-in (demo). Gateway rate-limits contain damage.
`web-researcher`	Contained	LLM decides search queries, routed through gateway	No — bus signals don't reach LLM prompt	Rogue mode is built-in (demo). Gateway blocks suspicious keywords + rate-limits.
`skill-acquirer`	Contained	LLM decides installs, routed through gateway	No — bus signals don't reach LLM prompt	Rogue mode is built-in (demo). Gateway blocks malicious skills by registry check.

Attack Vectors in Detail

1. Direct Prompt Injection via Sentinel Bus

Any agent on the agent-sentinel network can publish to the alerts topic. The commander's on_signal("alert") handler takes the signal payload and interpolates it directly into the LLM prompt:

prompt = f"SECURITY ALERT...
Details: {message}
..."
agent.invoke(session_id, prompt)

Rogue worker publishes crafted alert signal
message field contains adversarial instructions
Commander LLM may quarantine real sentinels
Security monitoring goes blind

2. Two-Hop Injection via request()

If attack #1 succeeds, the compromised commander can use its request_remote_agent() tool to send a poisoned task string to a sentinel agent:

rogue bus signal
  → commander LLM (compromised)
  → request("behavior-sentinel", "...")
  → sentinel LLM executes poisoned task

Requires attack #1 to succeed first
Sentinel LLMs are dormant but reachable
Could produce false alerts or suppress real ones
Multi-hop makes attribution harder

Defense Boundaries

Why the Gateway Cannot Be Prompt-Injected

Not an LLM agent — pure Python process
AnomalyDetector: deterministic rate + keyword checks
ToolExecutor: executes tools directly, no LLM in path
ActionLedger: append-only, time-windowed, tamper-proof
Workers cannot bypass gateway — tools only exist inside it
Quarantine enforcement is a set lookup, not a decision

Why Sentinels' Main Loops Are Safe

behavior-sentinel: rate > _RATE_THRESHOLD is Python math
supply-chain-sentinel: SHA-256 comparison is cryptographic
Neither feeds bus signals into agent.invoke()
LLM clients are registered but dormant in normal operation
Detection cannot be fooled by clever strings
They query the gateway's tamper-proof ledger, not agent claims

The single vulnerability: The sentinel-commander is the only component where untrusted bus data flows into an LLM prompt. All other components use deterministic logic for their security-critical decisions. Hardening the commander (sender validation, structured tool results instead of string interpolation, signed signals) would close this gap.

Two Orthogonal Enforcement Planes

Key architectural insight: The gateway controls what the agent can do (capability plane — tool access, rate limits, keyword blocks). The sentinel controls whether the agent exists (existence plane — process lifecycle, kill switch). These are orthogonal: the gateway restricts actions, the sentinel restricts the process itself. Together they provide defense-in-depth where neither alone is sufficient.

Capability Plane (Gateway)

Controls tool execution access
Enforces rate limits and keyword blocks
Quarantine = revoke capability
Agent process still runs but cannot act
Deterministic, no LLM in path

Existence Plane (Sentinel Sidecar)

Monitors process health (CPU, memory, tokens)
Kills runaway LLM loops to save costs
OS-level kill switch — process termination
Agent process ceases to exist
LLM-based pattern recognition beyond telemetry

Potential Mitigations

Mitigation	Addresses	Approach
Sender validation	Attack #1	Commander only accepts alerts from known sentinel names, not workers. Does not prevent spoofing on a shared bus.
Cryptographic signal signing	Attacks #1 & #2	Each agent signs signals with a private key. Commander verifies sender identity. Prevents identity spoofing.
Structured tool results	Attack #1	Pass signal data as structured tool results instead of interpolating into the prompt string. Reduces injection surface.
Channel isolation	Attack #1	Sentinels publish alerts on a channel workers cannot write to. Requires PubNub access control (PAM).
Remove dormant LLMs	Attack #2	Don't register LLM clients on sentinels that don't use them. Eliminates the two-hop path entirely.

Agent Sentinel™ Security Architecture Data plane, control plane, attack surfaces, and hardened boundaries

Attack Surface Analysis

Attack Vectors in Detail

1. Direct Prompt Injection via Sentinel Bus

2. Two-Hop Injection via request()

Defense Boundaries

Why the Gateway Cannot Be Prompt-Injected

Why Sentinels' Main Loops Are Safe

Two Orthogonal Enforcement Planes

Capability Plane (Gateway)

Existence Plane (Sentinel Sidecar)

Potential Mitigations

Agent Sentinel™ Security Architecture
Data plane, control plane, attack surfaces, and hardened boundaries