8 · Governance & Observability

Most agent diagrams show the happy path but not who holds authority, and that gap is where production systems fail. AgentMark makes risk, trust, authority, and telemetry explicit. The framing follows the four core functions of NIST's AI Risk Management Framework (AI RMF): Govern, Map, Measure, Manage.

Trust boundaries — group nodes by trust zone so the diagram shows where data crosses a boundary.

trust: User Device
  [human: User]
  [ui: Terminal]

trust: Hermes Backend
  [harness: Hermes]
  [agent: Coding Agent]
  [middleware: MCP Tool Selector]

trust: Third Party
  [api: OpenAI API]
  [api: GLM API]
  [api: Anthropic API]

trust: Local Execution
  [shell: Bash]
  [fs: Repository]
  [container: Test Runner]
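
To show why grouping by trust zone is useful, here is a minimal Python sketch that flags edges whose endpoints sit in different zones. It is not part of AgentMark: the `TRUST_ZONES` dictionary mirrors the zones above, while `zone_of`, `crossing_edges`, and the sample `EDGES` list are hypothetical names and data chosen for illustration.

```python
# Hypothetical in-memory model of the trust zones above; not an AgentMark API.
TRUST_ZONES = {
    "User Device": ["User", "Terminal"],
    "Hermes Backend": ["Hermes", "Coding Agent", "MCP Tool Selector"],
    "Third Party": ["OpenAI API", "GLM API", "Anthropic API"],
    "Local Execution": ["Bash", "Repository", "Test Runner"],
}


def zone_of(node: str) -> str:
    """Return the trust zone that contains the node."""
    for zone, nodes in TRUST_ZONES.items():
        if node in nodes:
            return zone
    raise KeyError(f"node {node!r} is not assigned to a trust zone")


def crossing_edges(edges):
    """Keep only the edges whose endpoints sit in different trust zones."""
    return [(src, dst) for src, dst in edges if zone_of(src) != zone_of(dst)]


# Illustrative edges only; the section does not define any.
EDGES = [
    ("Terminal", "Hermes"),
    ("Hermes", "Coding Agent"),
    ("Coding Agent", "Anthropic API"),
    ("Coding Agent", "Bash"),
]

for src, dst in crossing_edges(EDGES):
    print(f"{src} ({zone_of(src)}) -> {dst} ({zone_of(dst)})")
```

Every printed edge is a place where data leaves one zone and enters another, which is where a reviewer would expect a policy, guardrail, or approval node.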

High-risk capabilities are explicit. Every dangerous capability declares a risk level and what it requires before it can run.

capability: write_files   { risk: medium,   requires: [policy: Repo Write Policy] }
capability: run_shell     { risk: high,     requires: [approval: User Approval] }
capability: send_email    { risk: high,     requires: [approval: Human Approval] }
capability: delete_data   { risk: critical, requires: [approval: Admin Approval] }

Risk levels, in ascending order of severity: low, medium, high, critical.
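
A runtime could enforce these declarations roughly as follows. This is a sketch under stated assumptions, not AgentMark's implementation: the `Capability` class, `may_run`, and `at_least` are hypothetical names, and the ordering in `RISK_ORDER` assumes the four levels are listed from least to most severe.

```python
from dataclasses import dataclass

# Assumption: the levels are ordered from least to most severe.
RISK_ORDER = ["low", "medium", "high", "critical"]


@dataclass(frozen=True)
class Capability:
    name: str
    risk: str
    requires: tuple  # names of the policies or approvals that gate this capability


# Mirrors the four capability declarations above.
CAPABILITIES = {
    cap.name: cap
    for cap in [
        Capability("write_files", "medium", ("Repo Write Policy",)),
        Capability("run_shell", "high", ("User Approval",)),
        Capability("send_email", "high", ("Human Approval",)),
        Capability("delete_data", "critical", ("Admin Approval",)),
    ]
}


def may_run(name: str, satisfied: set) -> bool:
    """A capability may run only when every requirement has been satisfied."""
    return all(req in satisfied for req in CAPABILITIES[name].requires)


def at_least(level: str) -> list:
    """Names of the capabilities at or above the given risk level, sorted."""
    threshold = RISK_ORDER.index(level)
    return sorted(
        cap.name
        for cap in CAPABILITIES.values()
        if RISK_ORDER.index(cap.risk) >= threshold
    )


print(may_run("run_shell", set()))
print(may_run("run_shell", {"User Approval"}))
print(at_least("high"))
```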

Authority — the real production question. Not "what can it call?" but "what is it allowed to do without asking?"

authority:
  [agent#coder: Coding Agent]
  can:
    - read_repo
    - edit_worktree
    - run_tests
  cannot:
    - push_to_main
    - access_prod_secrets
    - modify_billing
  approval_required:
    - install_dependencies
    - run_network_commands
    - delete_files
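
The three lists in the authority block map naturally onto a three-way decision. The Python sketch below is illustrative only: `Decision`, `decide`, and `CODER_AUTHORITY` are hypothetical names, and denying any action that appears in none of the lists is an assumption the section does not state.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"
    DENY = "deny"


# Mirrors the authority block for the Coding Agent above.
CODER_AUTHORITY = {
    "can": {"read_repo", "edit_worktree", "run_tests"},
    "cannot": {"push_to_main", "access_prod_secrets", "modify_billing"},
    "approval_required": {
        "install_dependencies",
        "run_network_commands",
        "delete_files",
    },
}


def decide(authority: dict, action: str) -> Decision:
    """Resolve an action to allow, ask, or deny."""
    # Check prohibitions first so a misconfigured overlap cannot grant access.
    if action in authority["cannot"]:
        return Decision.DENY
    if action in authority["approval_required"]:
        return Decision.ASK
    if action in authority["can"]:
        return Decision.ALLOW
    # Assumption: actions that are not listed anywhere are denied by default.
    return Decision.DENY


for action in ["run_tests", "delete_files", "push_to_main", "send_email"]:
    print(action, decide(CODER_AUTHORITY, action).value)
```

Checking `cannot` before the other lists means that an action accidentally listed twice resolves to the stricter outcome.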

Policy / guardrail / approval / budget:

[policy: Allowed Tools Policy]
[guardrail: Prompt Injection Guard]
[approval: Human Approval]
[budget: Token Budget {limit: 2M/day}]
[budget: Cost Budget {limit: 500/month}]
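
As one possible reading of the budget nodes, the sketch below parses a `limit` value and refuses any spend that would exceed it. `parse_limit` and `Budget` are hypothetical names, and treating `M` as one million is an assumption. The section gives no unit for the cost limit, so the sketch handles it as a bare number and leaves the per-period reset out.

```python
from dataclasses import dataclass

# Assumption: K and M are the usual thousand and million suffixes.
SUFFIXES = {"K": 1_000, "M": 1_000_000}


def parse_limit(text: str) -> tuple[int, str]:
    """Split a limit such as '2M/day' into (2000000, 'day')."""
    amount, period = text.split("/")
    factor = SUFFIXES.get(amount[-1].upper(), 1)
    if factor != 1:
        amount = amount[:-1]  # drop the suffix character before converting
    return int(amount) * factor, period


@dataclass
class Budget:
    name: str
    limit: int
    period: str
    used: int = 0

    def try_spend(self, amount: int) -> bool:
        """Record the spend and return True only if it fits under the limit."""
        if amount < 0:
            raise ValueError("amount must be non-negative")
        if self.used + amount > self.limit:
            return False
        self.used += amount
        return True


limit, period = parse_limit("2M/day")
tokens = Budget("Token Budget", limit, period)
print(tokens.try_spend(1_500_000), tokens.try_spend(600_000), tokens.used)
```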

Browser automation modeled as a stack. Playwright is a driver, not a tool.

[agent#browser_agent: Browser Agent {roles: [browser, form_filler]}]
  -> [tool#web_task: Web Task Tool]
  -> [driver#playwright: Playwright]
  -> [browser#chromium: Chromium {mode: headless}]
  -> [profile#session: Browser Profile {cookies: isolated}]
  -> [sandbox#browserbox: Browser Sandbox]

[browser#chromium] -> [data: DOM]
[browser#chromium] -> [file: Screenshot]
[policy: No Credential Exfiltration] x> [browser#chromium]

Playwright = driver. Chromium = browser. Profile = state. Sandbox = trust boundary. Agent = decision-maker.
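
A linter could check that a browser chain keeps these layers in order. The sketch below is a rough illustration, not an AgentMark tool: it assumes the node type is the word that follows the opening bracket, and `EXPECTED_LAYERS`, `layer_types`, and `is_well_layered` are hypothetical names.

```python
import re

# Layer order taken from the stack above.
EXPECTED_LAYERS = ["agent", "tool", "driver", "browser", "profile", "sandbox"]

STACK = """\
[agent#browser_agent: Browser Agent {roles: [browser, form_filler]}]
  -> [tool#web_task: Web Task Tool]
  -> [driver#playwright: Playwright]
  -> [browser#chromium: Chromium {mode: headless}]
  -> [profile#session: Browser Profile {cookies: isolated}]
  -> [sandbox#browserbox: Browser Sandbox]
"""

# Assumption: a node line is an optional '->' followed by '[type...'.
NODE_TYPE = re.compile(r"^\s*(?:->\s*)?\[(\w+)")


def layer_types(chain: str) -> list[str]:
    """Extract the node type from each non-empty line of a chain."""
    types = []
    for line in chain.splitlines():
        if not line.strip():
            continue
        match = NODE_TYPE.match(line)
        if match is None:
            raise ValueError(f"not a node line: {line!r}")
        types.append(match.group(1))
    return types


def is_well_layered(chain: str) -> bool:
    """True when the chain lists exactly the expected layers, in order."""
    return layer_types(chain) == EXPECTED_LAYERS


print(layer_types(STACK))
print(is_well_layered(STACK))
```

A chain that wires a tool straight to a browser, skipping the driver, would fail this check.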

Observability — first-class telemetry. AgentMark adopts OpenTelemetry's GenAI semantic conventions.

observability:
  traces: [otel: GenAI Traces]
  metrics:
    - token_usage
    - tool_calls
    - handoffs
    - guardrail_blocks
    - mcp_failures
    - cost_per_task
    - human_approval_rate

[agent#coder: Coding Agent]                  -> [log: OpenTelemetry]
[middleware#mcp_selector: MCP Tool Selector] -> [log: Tool Selection Trace]
[harness#codex: Codex]                       -> [metric: Cost Per Patch]
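
To make the metric names concrete, here is a small standard-library sketch that counts events and derives two of the listed ratios. It does not use the OpenTelemetry SDK. The `Telemetry` class is hypothetical, and both formulas are assumptions: cost per task is taken as total cost divided by completed tasks, and human approval rate as approvals divided by tool calls.

```python
from collections import Counter


class Telemetry:
    """Hypothetical in-process aggregator for the metrics listed above."""

    def __init__(self):
        self.counts = Counter()
        self.cost = 0.0

    def record(self, metric: str, value: int = 1) -> None:
        """Increment a named counter such as 'tool_calls' or 'handoffs'."""
        self.counts[metric] += value

    def add_cost(self, amount: float) -> None:
        self.cost += amount

    def cost_per_task(self) -> float:
        # Assumption: total cost divided by the number of completed tasks.
        tasks = self.counts["tasks"]
        return self.cost / tasks if tasks else 0.0

    def human_approval_rate(self) -> float:
        # Assumption: share of tool calls that needed a human approval.
        calls = self.counts["tool_calls"]
        return self.counts["human_approvals"] / calls if calls else 0.0


telemetry = Telemetry()
telemetry.record("tool_calls", 4)
telemetry.record("human_approvals")
telemetry.record("tasks", 2)
telemetry.add_cost(0.5)
print(telemetry.cost_per_task(), telemetry.human_approval_rate())
```

In a real deployment these counters would more likely be emitted through an OpenTelemetry meter than held in memory.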