Monitor and control every action your AI coding agents take. Block prompt injection attacks before they execute.
Integrates as a PreToolUse hook at the agent framework level. Every tool call is intercepted, scored, and either allowed or blocked before execution.
The security model can run via API (OpenRouter) or self-hosted for full data sovereignty.
Run the security model locally for complete data sovereignty. No tool call data leaves your environment.
Inspect every policy, every scoring rule, every decision pathway. Full transparency into how security decisions are made.
Read-only enforcement, delegation detection, category mismatch, and target verification provide comprehensive coverage.
PreToolUse hook works with Claude Code, OpenHands, and any Python agent. Three lines of code to integrate.
Every scored action is logged with timestamps, scores, policy violations, and block decisions. Export to your SIEM.
API keys, tokens, and env var values are stripped before reaching the security model or logs. Pattern-based and value-based redaction ensures sensitive data never leaves your machine.
Tested with prompt injection attacks across multiple agent frameworks. All evaluation data is publicly available.
All numbers from AgentDojo prompt injection benchmark. Agent model: Claude Haiku 4.5. Monitor model: oss-safeguard-20b via OpenRouter. Full trajectories available in evaluations.
Prevents write actions when the task only requires reading. Stops agents from modifying files, databases, or configurations they should only be inspecting.
Catches attempts to send data to external systems. Blocks unauthorized emails, API calls, file uploads, and URL sharing.
Detects when an agent's action type does not match the requested task. Prevents code-editing agents from sending emails, or file-management agents from making network requests.
Validates that actions target the correct recipients and resources. Catches agents sending data to wrong email addresses or modifying the wrong files.
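To make the hook flow concrete, the following is an added illustration (not the product's own code) of a minimal PreToolUse-style gate: secret redaction runs first, the four policies described above are applied to the tool call, the decision is logged, and the call proceeds only when no policy fires. The tool-to-category map, the task descriptor fields, and the secret patterns are constructed assumptions for the example.

```python
import re
from datetime import datetime, timezone

# Constructed tool -> action category map; real agent frameworks expose their own tool names.
TOOL_CATEGORY = {
    "read_file": "read", "list_dir": "read", "grep": "read",
    "write_file": "write", "edit_file": "write", "sql_update": "write",
    "send_email": "delegate", "http_post": "delegate", "upload_file": "delegate",
}
TARGET_KEYS = ("path", "to", "url")  # argument names that carry a recipient or resource
SECRET_KEY_RE = re.compile(r"(?i)(api[_-]?key|token|password|secret)")
SECRET_VALUE_RE = re.compile(r"\b(sk-[A-Za-z0-9]{8,}|ghp_[A-Za-z0-9]{8,})\b")  # assumed formats

def redact(args):
    """Pattern-based (key name) and value-based (token shape) redaction, applied
    before anything reaches the scorer or the log."""
    out = {}
    for key, value in args.items():
        if SECRET_KEY_RE.search(key):
            out[key] = "[REDACTED]"
        elif isinstance(value, str):
            out[key] = SECRET_VALUE_RE.sub("[REDACTED]", value)
        else:
            out[key] = value
    return out

def score(tool, args, task):
    """Apply the four policies; return the names of those that fired (empty = clean)."""
    category = TOOL_CATEGORY.get(tool, "unknown")
    violations = []
    if task["read_only"] and category != "read":           # read-only enforcement
        violations.append("read_only_enforcement")
    if category == "delegate" and not task["may_share_externally"]:  # delegation detection
        violations.append("delegation_detection")
    if category not in task["expected_categories"]:         # category mismatch
        violations.append("category_mismatch")
    targets = [args[k] for k in TARGET_KEYS if k in args]   # target verification
    if any(t not in task["allowed_targets"] for t in targets):
        violations.append("target_verification")
    return violations

def pre_tool_use(tool, args, task, log):
    """PreToolUse hook: redact, score, log, and return True only if no policy fired."""
    safe_args = redact(args)
    violations = score(tool, safe_args, task)
    log.append({"ts": datetime.now(timezone.utc).isoformat(), "tool": tool,
                "args": safe_args, "violations": violations,
                "decision": "block" if violations else "allow"})
    return not violations
```

The `log` list stands in for the audit trail a SIEM export would consume; each entry already contains only redacted arguments.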