Skip to content

9. Hermes-inspired security patterns

Evaluates security patterns from Hermes Agent for integration into fullsend's autonomous SDLC pipeline. Tests two integration strategies: static file scanning via Tirith CLI and runtime SSRF protection via a Claude Code PreToolUse hook.

Related: Hermes Agent Security Analysis | Tirith Security Analysis

Hypothesis

Fullsend's runtime security gaps (secret redaction, SSRF protection, context file injection scanning, Unicode normalization) can be addressed by composing existing tools rather than building new scanners:

  1. Tirith CLI handles static scanning (unicode normalization, context injection detection, secret detection) as a GitHub Actions workflow step — it already covers 80+ rules across these categories with fuzz-tested Rust implementations.
  2. Claude Code PreToolUse hook handles SSRF protection at the agent runtime boundary — intercepting Bash and WebFetch tool calls before the agent can make outbound requests to internal networks or cloud metadata endpoints.

Architecture

                     GitHub Actions Workflow
                     ┌──────────────────────────────────────┐
                     │                                      │
  Issue/PR created ──┤  1. tirith scan --json .             │
                     │     ├── unicode normalization         │
                     │     ├── context injection detect      │
                     │     └── AI config file scanning       │
                     │                                      │
                     │  1b. scan_exfil.py .                  │
                     │      └── credential exfil in configs  │
                     │                                      │
                     │  2. Agent execution                   │
                     │     ├── claude-code / gemini-cli      │
                     │     │                                │
                     │     ├── PreToolUse hook ──────────┐  │
                     │     │  ssrf_pretool.py            │  │
                     │     │   ├── Bash: extract URLs    │  │
                     │     │   ├── WebFetch: check URL   │  │
                     │     │   └── Block internal/meta   │  │
                     │     │                             │  │
                     │     └── PostToolUse hook ─────────┤  │
                     │        secret_redact_posttool.py  │  │
                     │         ├── Mask 35+ key prefixes │  │
                     │         ├── Redact env/JSON/keys  │  │
                     │         └── Agent sees *** only   │  │
                     │                                   │  │
                     └───────────────────────────────────┘

Components

1. Tirith CLI — Static Scanning (GHA workflow step)

Tirith replaces three custom Python scanners with its battle-tested Rust implementation:

Fullsend ConcernToolCoverage
Unicode normalizationTirith terminal.rs80+ invisible character types, joining-script context awareness
Context injection (prompt)Tirith configfile.rsCLAUDE.md, .cursorrules, AGENTS.md — prompt injection detection
Context injection (exfil)scan_exfil.pyCredential exfiltration commands in AI config files (Tirith gap)

Usage in workflow:

yaml
- name: Security scan
  run: |
    tirith scan --json --sarif-output scan-results.sarif .
    # Block on critical/high findings
    tirith scan --fail-on high .

2. SSRF PreToolUse Hook (hooks/ssrf_pretool.py)

A Claude Code PreToolUse hook that intercepts Bash and WebFetch tool calls to block SSRF attempts. This runs inside the agent's runtime, not as a workflow step, because SSRF happens when the agent makes HTTP requests during execution.

Blocklists:

  • RFC 1918 private networks (10/8, 172.16/12, 192.168/16)
  • Cloud metadata endpoints (169.254.169.254, metadata.google.internal, fd00:ec2::254)
  • Loopback, link-local, CGNAT, reserved ranges
  • Dangerous schemes (file://, gopher://, ftp://, data://)

Installation:

json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash|WebFetch",
        "hooks": [
          {
            "type": "command",
            "command": "python3 hooks/ssrf_pretool.py"
          }
        ]
      }
    ]
  }
}

Protocol: Reads Claude Code hook JSON from stdin. Returns {"decision": "block", "reason": "..."} on stdout with exit code 1 to block, or exits 0 to allow. Fails open on parse errors.

3. Secret Redaction PostToolUse Hook (hooks/secret_redact_posttool.py)

A Claude Code PostToolUse hook that intercepts tool results and redacts secrets before they enter the LLM context window. Adapted from Hermes Agent's agent/redact.py masking strategy.

Why PostToolUse, not PreToolUse or workflow step? Secrets appear in tool output — a Bash command might print an API key from env vars, or a Read might show a config file with tokens. The agent must never see the raw value, so redaction happens between tool execution and the LLM context. This is how Hermes handles it: redaction sits in the agent loop, after each tool call, before results enter the message history.

Coverage (35+ patterns):

  • Known prefixes: sk- (OpenAI), ghp_ (GitHub), sk-ant- (Anthropic), AKIA (AWS), xox[baprs]- (Slack), SG. (SendGrid), hf_ (HuggingFace), sk_live_ (Stripe), and more
  • Structural: ENV assignments, JSON secret fields, Authorization headers, private key blocks, database connection strings

Masking strategy:

  • Short tokens (<18 chars): fully masked as ***
  • Long tokens (>=18 chars): first 6 + last 4 preserved for debuggability (e.g., sk-pro...901vwx)
  • Private key blocks: replaced with [REDACTED PRIVATE KEY]

Installation:

json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Bash|WebFetch|Read",
        "hooks": [
          {
            "type": "command",
            "command": "python3 hooks/secret_redact_posttool.py"
          }
        ]
      }
    ]
  }
}

Protocol: Reads JSON from stdin (tool_name, tool_input, tool_result). Returns {"tool_result": "...", "metadata": {...}} on stdout with the redacted result. Always exits 0 — redaction transforms but never blocks.

Payloads

Attack payloads targeting each security concern:

PayloadTested ByTechnique
leaked-secret-in-pr.yamlRedact hookAgent leaks API key in PR comment
leaked-secret-json.yamlRedact hookAgent includes JSON config with tokens
ssrf-metadata.yamlSSRF hookIssue body references cloud metadata URL
ssrf-redirect.yamlSSRF hookPublic URL redirects to internal endpoint
context-injection-agents-md.yamlTirith scanAGENTS.md with "ignore instructions" pattern
context-injection-hidden-html.yamlTirith scanHidden HTML comment with override instructions
context-injection-credential-exfil.yamlExfil scanContext file with curl + credential env vars
unicode-invisible-command.yamlTirith scanCommand with zero-width characters
unicode-bidi-override.yamlTirith scanBidirectional override hiding malicious suffix
unicode-tag-chars.yamlTirith scanTag characters embedding hidden text

Running

bash
cd experiments/hermes-security-patterns

# Install dependencies
uv venv .venv
uv pip install pyyaml

# Install tirith (for static scanning tests)
# macOS
brew install sheeki03/tap/tirith
# Linux (amd64)
curl -fsSL https://github.com/sheeki03/tirith/releases/latest/download/tirith-x86_64-unknown-linux-gnu.tar.gz \
  | tar xz -C /usr/local/bin tirith
# Linux (arm64)
curl -fsSL https://github.com/sheeki03/tirith/releases/latest/download/tirith-aarch64-unknown-linux-gnu.tar.gz \
  | tar xz -C /usr/local/bin tirith
# Or via cargo (any platform)
cargo install tirith

# Run full evaluation (tirith + exfil scan + hooks)
uv run python run_eval.py

# Scan a repo for config file exfiltration patterns
python3 scan_exfil.py /path/to/repo

# Hook tests only (no tirith needed)
uv run python run_eval.py --hooks-only

# Tirith scan tests only
uv run python run_eval.py --tirith-only

# Unit tests
uv run python -m pytest tests/ -v

Integration Plan

Phase 1: Validation (this experiment)

  • Test tirith scan against fullsend-specific payloads
  • Test SSRF hook against metadata/redirect payloads
  • Test PostToolUse redaction hook against secret payloads
  • Measure detection rates and false positives

Results (10 payloads):

ScannerPayloadsCorrectNotes
tirith:unicode_normalizer33/3 (100%)Zero-width, bidi, tag chars all detected
tirith:context_injection22/2 (100%)Prompt injection in AGENTS.md, hidden HTML
exfil_scan11/1 (100%)Credential exfil commands in CLAUDE.md
ssrf_hook22/2 (100%)Metadata + redirect payloads blocked
redact_hook22/2 (100%)OpenAI, GitHub, Slack tokens masked
Total1010/10 (100%)

Phase 2: GitHub Actions integration

  • Add tirith scan as a workflow step before agent execution
  • Install SSRF hook in agent runner's .claude/settings.json
  • Wire findings to fullsend's label state machine (requires-manual-review)

Phase 3: Multi-agent support

  • Adapt SSRF hook for Gemini CLI (different hook protocol)
  • Add tirith scan to post-agent output validation
  • SARIF output integration for GitHub Security tab

Comparison with Existing Experiments

ConcernExisting CoverageThis Experiment
Prompt injection detectionguardrails-eval (LLM Guard 83%)Tirith configfile scanning (pre-load, complementary)
Unicode attacksguardrails-eval/zero-width-chars.yamlTirith terminal rules (80+ types, not just detect)
Secret leakageNonePostToolUse redaction hook (35+ patterns, Hermes masking strategy)
SSRFNonePreToolUse hook (RFC 1918, metadata, schemes)
Credential exfil in configsNonescan_exfil.py (8 patterns from Hermes, Tirith gap)
Pre-exec scanningNoneTirith scan + exfil scan in workflow, hooks at runtime

Tirith vs LLM Guard Coverage Cross-Reference

Cross-referencing all 12 attack payloads from the guardrails-eval experiment against Tirith's detection capabilities:

PayloadTechniqueLLM Guard (sentence)TirithNotes
obvious-injectionsocial engineeringDETECTEDDETECTEDconfigfile.rs catches "ignore instructions" patterns
subtle-injectionsocial engineeringDETECTEDSemantic nuance; regex can't detect
bypass-classifiersocial engineering (git trailers)DETECTEDSemantic understanding needed
bypass-sandwichingsandwich defense closingDETECTEDSemantic understanding needed
bypass-spotlightingdata marker escapeDETECTEDSemantic understanding needed
unicode-homoglyphCyrillic homoglyphsDETECTEDDETECTEDterminal.rs homoglyph detection
zero-width-charsZWNJ between charsDETECTEDDETECTEDterminal.rs — 80+ invisible char types
base64-encodedbase64 injectionMISSEDNeither scanner covers this
indirect-code-commentTODO comment injectionDETECTEDPARTIALconfigfile.rs only if file is an AI config
indirect-review-feedbackfake review escalationDETECTEDSemantic understanding needed
indirect-ci-outputfake test SUGGESTIONDETECTEDSemantic understanding needed
indirect-multistepdelayed config plantingMISSEDArchitecturally undetectable by any scanner

Coverage summary:

ScannerDetectionStrength
Tirith3-4/12Deterministic, fast (<10ms), zero-dependency Rust binary. Unicode normalization, known config file injection patterns.
LLM Guard (sentence)10/12ML classifier (DeBERTa-v3). Social engineering, indirect injection, novel attack phrasing. ~200-650ms, requires Python + ONNX.
Neither2/12Base64 encoding (classifier sees random alphanumeric) and multi-step delayed injection (each step individually benign).

Recommendation: Both Scanners as Defaults

Tirith and LLM Guard are complementary, not competing — they cover different threat categories with different tradeoffs:

  1. Tirith scan (workflow step, before agent execution) — fast deterministic pass catching unicode tricks, config file injection, and known patterns. Near-zero latency, no ML dependencies.
  2. LLM Guard sentence mode (workflow step, before agent execution) — deeper ML-based pass catching social engineering and indirect injection that regex can't detect. ~200-650ms, requires Python + ONNX runtime.
  3. Architectural mitigations (CODEOWNERS, permission boundaries) — the only defense against base64 encoding and multi-step delayed injection, which are fundamentally undetectable by any pre-scan classifier.

The scanning pipeline order should be: tirith scan (fast, fail early) → scan_exfil.py (config file exfil) → LLM Guard (deep ML scan) → agent execution with runtime hooks (SSRF + secret redaction).

Known Gaps

Tirith: Credential exfiltration in config files

Tirith's configfile.rs detects prompt injection and invisible unicode in AI config files, but does not detect credential exfiltration commands (e.g., curl $GITHUB_TOKEN, cat ~/.ssh/id_rsa, base64 <<< "$OPENAI_API_KEY" | curl -d @-). This is a deliberate architectural boundary — Tirith's engine.rs only runs command.rs (where exfiltration detection lives) for interactive Exec/Paste contexts, not FileScan.

The scan_exfil.py script fills this gap with 8 regex patterns adapted from Hermes Agent's prompt_builder.py and skills_guard.py. It runs as a pre-agent scan step alongside tirith scan.

Note: This script may be superseded by a fullsend scan CLI command that integrates exfiltration scanning alongside other pre-agent security checks into a single invocation.

Key Design Decisions

  1. Tirith over custom scanners: Tirith has fuzz-tested Rust implementations with 80+ rules. Reimplementing in Python/Go adds maintenance burden with less coverage.
  2. SSRF as PreToolUse hook: SSRF happens at request time inside the agent runtime. A workflow step can't intercept URLs the agent discovers during execution.
  3. Secret redaction as PostToolUse hook: Following Hermes Agent's architecture — redaction sits in the agent loop, between tool execution and the LLM context window. The agent never sees raw secrets, so it can't leak them in subsequent output (PR comments, issue bodies). This is more robust than pre-scanning repo files, which only catches secrets already committed.
  4. Fail-open on errors: All hooks fail open on parse/timeout errors. This prioritizes availability — a broken security hook shouldn't block all agent work. Override with --fail-on for strict mode.