8 · Governance & Observability
Most agent diagrams show the happy path and leave out authority, which is exactly where production systems fail. AgentMark makes risk, trust, authority, and telemetry explicit. The framing follows the four core functions of NIST's AI Risk Management Framework (AI RMF): govern, map, measure, manage.
Trust boundaries — group nodes by trust zone so the diagram shows where data crosses a boundary.
trust: User Device
  [human: User]
  [ui: Terminal]
trust: Hermes Backend
  [harness: Hermes]
  [agent: Coding Agent]
  [middleware: MCP Tool Selector]
trust: Third Party
  [api: OpenAI API]
  [api: GLM API]
  [api: Anthropic API]
trust: Local Execution
  [shell: Bash]
  [fs: Repository]
  [container: Test Runner]
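To illustrate why grouping by zone is useful, here is a minimal Python sketch (not part of AgentMark itself) that flags edges whose endpoints sit in different trust zones. The zone membership is copied from the listing above; the `edges` list and the helper name `boundary_crossings` are assumptions made up for this example.

```python
# Zone membership taken from the trust listing above.
ZONES = {
    "User Device": ["User", "Terminal"],
    "Hermes Backend": ["Hermes", "Coding Agent", "MCP Tool Selector"],
    "Third Party": ["OpenAI API", "GLM API", "Anthropic API"],
    "Local Execution": ["Bash", "Repository", "Test Runner"],
}

# Invert the mapping so each node can be looked up by name.
ZONE_OF = {node: zone for zone, nodes in ZONES.items() for node in nodes}


def boundary_crossings(edges):
    """Return (source, target, source_zone, target_zone) for edges that leave a zone."""
    crossings = []
    for src, dst in edges:
        src_zone, dst_zone = ZONE_OF[src], ZONE_OF[dst]
        if src_zone != dst_zone:
            crossings.append((src, dst, src_zone, dst_zone))
    return crossings


# Hypothetical edges for illustration; the source does not specify them.
edges = [
    ("User", "Terminal"),
    ("Terminal", "Hermes"),
    ("Hermes", "Coding Agent"),
    ("Coding Agent", "Anthropic API"),
    ("Coding Agent", "Bash"),
]

for src, dst, src_zone, dst_zone in boundary_crossings(edges):
    print(f"{src} -> {dst} crosses {src_zone} -> {dst_zone}")
```

Each reported crossing is a place where data leaves one zone for another, so it is a natural spot to attach a policy or guardrail.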
High-risk capabilities are explicit. Every dangerous capability declares a risk level and what it requires before it can run.
capability: write_files { risk: medium, requires: [policy: Repo Write Policy] }
capability: run_shell { risk: high, requires: [approval: User Approval] }
capability: send_email { risk: high, requires: [approval: Human Approval] }
capability: delete_data { risk: critical, requires: [approval: Admin Approval] }
Risk levels: low, medium, high, critical.
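One way a runtime might act on these declarations is sketched below in Python. The capability table mirrors the four declarations above; the `Capability` class, the helper names, and the assumption that the risk levels are listed in ascending order are illustrative and are not taken from AgentMark.

```python
from dataclasses import dataclass

# Assumption: the levels are ordered from least to most severe.
RISK_ORDER = ["low", "medium", "high", "critical"]


@dataclass(frozen=True)
class Capability:
    name: str
    risk: str
    requires: tuple  # names of policies or approvals needed before running


CAPABILITIES = {
    "write_files": Capability("write_files", "medium", ("Repo Write Policy",)),
    "run_shell": Capability("run_shell", "high", ("User Approval",)),
    "send_email": Capability("send_email", "high", ("Human Approval",)),
    "delete_data": Capability("delete_data", "critical", ("Admin Approval",)),
}


def missing_requirements(capability_name, satisfied):
    """List the requirements of a capability that are not yet satisfied."""
    cap = CAPABILITIES[capability_name]
    return [req for req in cap.requires if req not in satisfied]


def may_run(capability_name, satisfied):
    """A capability may run only when none of its requirements are missing."""
    return not missing_requirements(capability_name, satisfied)


def capabilities_at_or_above(threshold):
    """Names of capabilities whose risk is at least `threshold`, sorted."""
    floor = RISK_ORDER.index(threshold)
    return sorted(
        name
        for name, cap in CAPABILITIES.items()
        if RISK_ORDER.index(cap.risk) >= floor
    )


print(may_run("run_shell", {"User Approval"}))
print(missing_requirements("delete_data", {"User Approval"}))
print(capabilities_at_or_above("high"))
```

The point of the sketch is that a capability never runs on risk level alone: the gate is the set of satisfied requirements, and the risk level is metadata for review and reporting.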
Authority — the real production question. Not "what can it call?" but "what is it allowed to do without asking?"
authority:
  [agent#coder: Coding Agent]
    can:
      - read_repo
      - edit_worktree
      - run_tests
    cannot:
      - push_to_main
      - access_prod_secrets
      - modify_billing
    approval_required:
      - install_dependencies
      - run_network_commands
      - delete_files
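The three lists above map naturally onto a three-way decision: allow, deny, or ask. The Python sketch below shows one possible evaluation order. The action names come from the authority block; the `Decision` enum, the `decide` function, and the default-deny rule for unlisted actions are assumptions of this example, not documented AgentMark behavior.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ASK = "ask"


# Authority table for the agent with id "coder", copied from the block above.
AUTHORITY = {
    "coder": {
        "can": {"read_repo", "edit_worktree", "run_tests"},
        "cannot": {"push_to_main", "access_prod_secrets", "modify_billing"},
        "approval_required": {
            "install_dependencies",
            "run_network_commands",
            "delete_files",
        },
    }
}


def decide(agent_id, action):
    """Resolve an action to ALLOW, DENY, or ASK for the given agent."""
    rules = AUTHORITY[agent_id]
    if action in rules["cannot"]:
        return Decision.DENY  # explicit prohibitions win over everything else
    if action in rules["approval_required"]:
        return Decision.ASK  # a human must approve before the action runs
    if action in rules["can"]:
        return Decision.ALLOW
    return Decision.DENY  # assumption: anything unlisted is denied


for action in ["run_tests", "delete_files", "push_to_main", "deploy"]:
    print(action, "->", decide("coder", action).value)
```

Checking `cannot` first means a misconfigured action that appears in two lists fails closed rather than open.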
Policies, guardrails, approvals, and budgets are declared as nodes:
[policy: Allowed Tools Policy]
[guardrail: Prompt Injection Guard]
[approval: Human Approval]
[budget: Token Budget {limit: 2M/day}]
[budget: Cost Budget {limit: 500/month}]
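As a rough illustration of what enforcing a budget node could look like, the Python sketch below tracks usage against a fixed limit and refuses spends that would exceed it. The `Budget` class and its methods are hypothetical; the limits come from the two budget nodes above, and the daily or monthly reset logic is left out. The source gives no currency for the cost budget, so none is assumed here.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    name: str
    limit: float
    used: float = 0.0

    def remaining(self):
        """Amount still available before the limit is reached."""
        return self.limit - self.used

    def try_spend(self, amount):
        """Record the spend and return True only if it fits within the limit."""
        if amount < 0:
            raise ValueError("amount must be non-negative")
        if self.used + amount > self.limit:
            return False  # reject without recording anything
        self.used += amount
        return True


# Limits from the nodes above; period resets are not modeled in this sketch.
token_budget = Budget("Token Budget", limit=2_000_000)  # per day
cost_budget = Budget("Cost Budget", limit=500)  # per month, unit unspecified

print(token_budget.try_spend(1_500_000))
print(token_budget.try_spend(600_000))
print(token_budget.remaining())
```

A rejected spend leaves the counter unchanged, so a caller can retry with a smaller request or escalate to an approval node.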
Browser automation modeled as a stack. Playwright is a driver, not a tool.
[agent#browser_agent: Browser Agent {roles: [browser, form_filler]}]
-> [tool#web_task: Web Task Tool]
-> [driver#playwright: Playwright]
-> [browser#chromium: Chromium {mode: headless}]
-> [profile#session: Browser Profile {cookies: isolated}]
-> [sandbox#browserbox: Browser Sandbox]
[browser#chromium] -> [data: DOM]
[browser#chromium] -> [file: Screenshot]
[policy: No Credential Exfiltration] x> [browser#chromium]
Playwright = driver. Chromium = browser. Profile = state. Sandbox = trust boundary. Agent = decision-maker.
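The layering can be checked mechanically. The Python sketch below encodes the chain from the example above as (type, id, label) tuples and reports any layer that sits in the wrong position, such as Playwright being modeled as a tool. The expected order is read off the example; `validate_stack` and the tuple representation are assumptions of this sketch rather than an AgentMark feature.

```python
# Layer order as it appears in the browser automation example above.
EXPECTED_ORDER = ["agent", "tool", "driver", "browser", "profile", "sandbox"]

STACK = [
    ("agent", "browser_agent", "Browser Agent"),
    ("tool", "web_task", "Web Task Tool"),
    ("driver", "playwright", "Playwright"),
    ("browser", "chromium", "Chromium"),
    ("profile", "session", "Browser Profile"),
    ("sandbox", "browserbox", "Browser Sandbox"),
]


def validate_stack(stack):
    """Return a list of problems; an empty list means the layering matches."""
    problems = []
    if len(stack) != len(EXPECTED_ORDER):
        problems.append(
            f"expected {len(EXPECTED_ORDER)} layers, found {len(stack)}"
        )
    for position, (expected, layer) in enumerate(zip(EXPECTED_ORDER, stack)):
        kind, node_id, _label = layer
        if kind != expected:
            problems.append(
                f"layer {position}: '{node_id}' is typed '{kind}', expected '{expected}'"
            )
    return problems


# A common modeling mistake: treating Playwright as the tool itself.
MISMODELED = [
    ("agent", "browser_agent", "Browser Agent"),
    ("tool", "playwright", "Playwright"),
    ("browser", "chromium", "Chromium"),
]

print(validate_stack(STACK))
for problem in validate_stack(MISMODELED):
    print(problem)
```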
Observability — first-class telemetry. AgentMark adopts OpenTelemetry's GenAI semantic conventions.
observability:
  traces: [otel: GenAI Traces]
  metrics:
    - token_usage
    - tool_calls
    - handoffs
    - guardrail_blocks
    - mcp_failures
    - cost_per_task
    - human_approval_rate
[agent#coder: Coding Agent] -> [log: OpenTelemetry]
[middleware#mcp_selector: MCP Tool Selector] -> [log: Tool Selection Trace]
[harness#codex: Codex] -> [metric: Cost Per Patch]
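To make a few of the listed metrics concrete, the Python sketch below derives them from a list of event records. The event schema, the `summarize` function, and the definition of `human_approval_rate` as approvals granted divided by approvals requested are all assumptions of this example. In a real deployment these values would more likely be emitted through OpenTelemetry, whose GenAI semantic conventions define attributes such as `gen_ai.usage.input_tokens` and `gen_ai.usage.output_tokens`.

```python
# Hypothetical event records; costs are integers in an unspecified unit.
events = [
    {"task": "t1", "kind": "llm_call", "input_tokens": 1200, "output_tokens": 300, "cost": 2},
    {"task": "t1", "kind": "tool_call", "tool": "run_tests"},
    {"task": "t1", "kind": "approval_request", "approved": True},
    {"task": "t2", "kind": "llm_call", "input_tokens": 800, "output_tokens": 200, "cost": 1},
    {"task": "t2", "kind": "guardrail_block"},
    {"task": "t2", "kind": "approval_request", "approved": False},
]


def summarize(events):
    """Aggregate raw events into a subset of the metrics listed above."""
    tasks = {e["task"] for e in events}
    llm_calls = [e for e in events if e["kind"] == "llm_call"]
    approvals = [e for e in events if e["kind"] == "approval_request"]
    total_cost = sum(e["cost"] for e in llm_calls)
    return {
        "token_usage": sum(e["input_tokens"] + e["output_tokens"] for e in llm_calls),
        "tool_calls": sum(1 for e in events if e["kind"] == "tool_call"),
        "guardrail_blocks": sum(1 for e in events if e["kind"] == "guardrail_block"),
        "cost_per_task": total_cost / len(tasks) if tasks else 0.0,
        # Assumed definition: approvals granted / approvals requested.
        "human_approval_rate": (
            sum(1 for e in approvals if e["approved"]) / len(approvals)
            if approvals
            else 0.0
        ),
    }


for name, value in summarize(events).items():
    print(f"{name}: {value}")
```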