AgentMark v0.2.0

10 · Full Worked Example: Hermes Coding Harness

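The complete hermes.agentmark document follows. It records the system, the market assumptions, the constraints, the tests, and the decision logic in one file. The document as a whole, the constraint, the claim, and the landscape each carry an as_of date and a review_by expiry date.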

---
agentmark: 0.2.0
title: Hermes Coding Harness
id: hermes
owner: Joel Tong
status: draft
as_of: 2026-06-02
review_by: 2026-06-16
timezone: Asia/Singapore
views: [topology, decision, risk, eval, landscape]
---

# Topology

[human#joel: Joel] -> [harness#hermes: Hermes]

[harness#hermes]
  -> [agent#coder: Coding Agent {roles: [planner, coder, tester, executor], autonomy: 3}]
  -> [middleware#mcp_selector: MCP Tool Selector {strategy: shortlist, top_k: 12}]
  -> [mcp#mcp_gateway: MCP Gateway]

[agent#coder] -> [model#codex: Codex]
[agent#coder] -> [tool#tests: Test Runner]
[agent#coder] -> [fs#repo: Repository]
[agent#coder] ~>[traces] [log#otel: OpenTelemetry Traces]

# Constraints

constraint K-001:
  title: Claude Code backend harness restriction
  level: MUST_NOT
  applies_to: [harness#claude_code: Claude Code]
  scope: Hermes
  as_of: 2026-06-02
  review_by: 2026-06-16

# Claims

claim C-001:
  text: Codex is currently best suited for Hermes when optimized for cost and backend use.
  kind: comparative
  subjects: [Codex, Claude Code, GLM, Opus, Qwen Coder]
  metric: overall_suitability
  scope: Hermes
  evidence: [bench#CODE-HARNESS-2026-06]
  confidence: medium
  volatility: high
  as_of: 2026-06-02
  review_by: 2026-06-16

# Landscape

landscape: Coding Harnesses
  as_of: 2026-06-02
  review_by: 2026-06-16
  scope: Hermes
  adopt:
    - Codex { reason: cost/performance and backend-harness suitability }
  blocked:
    - Claude Code { reason: K-001 }

# Decision

decision D-001:
  title: Use Codex as primary Hermes coding harness
  status: accepted
  chosen: [model#codex: Codex]
  alternatives: [Claude Code, GLM, Opus, Qwen Coder]
  constrained_by: [K-001]
  supported_by: [C-001]
  invalid_if:
    - K-001 is false
    - C-001 expires

# Benchmarks

bench CODE-HARNESS-2026-06:
  title: Coding-agent substitute comparison
  run_on: 2026-06-02
  repeat: weekly
  candidates: [Codex, GLM, Opus, Qwen Coder]
  metrics:
    - accepted_patch_rate
    - cost_per_accepted_patch

# Views

view topology:
  show: [human, harness, agent, middleware, mcp, model, tool, fs, log]
view decision:
  show: [decision, constraint, claim, landscape]
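
The hermes.agentmark file ends here. As a rough illustration of how a view's show list could select topology nodes by kind, the sketch below pulls `[kind#id: Label]` declarations out of the Topology section and filters them. The regular expression and the `collect_nodes` and `nodes_in_view` helpers are hypothetical, written for this example only; they are not AgentMark's parser, and the actual grammar may differ.

```python
import re

# Topology section copied from the example above.
TOPOLOGY = """\
[human#joel: Joel] -> [harness#hermes: Hermes]

[harness#hermes]
  -> [agent#coder: Coding Agent {roles: [planner, coder, tester, executor], autonomy: 3}]
  -> [middleware#mcp_selector: MCP Tool Selector {strategy: shortlist, top_k: 12}]
  -> [mcp#mcp_gateway: MCP Gateway]

[agent#coder] -> [model#codex: Codex]
[agent#coder] -> [tool#tests: Test Runner]
[agent#coder] -> [fs#repo: Repository]
[agent#coder] ~>[traces] [log#otel: OpenTelemetry Traces]
"""

# Assumed node syntax: "[kind#id" with an optional ": Label" before any "{attrs}".
# Edge labels such as "[traces]" have no "#", so they are not matched.
NODE = re.compile(r"\[(\w+)#(\w+)(?::\s*([^\[\]{]+))?")


def collect_nodes(text):
    """Return {"kind#id": {"kind": ..., "label": ...}} for every node mentioned."""
    nodes = {}
    for kind, node_id, label in NODE.findall(text):
        entry = nodes.setdefault(f"{kind}#{node_id}", {"kind": kind, "label": None})
        if label.strip():
            # Keep the label from whichever mention declares one.
            entry["label"] = label.strip()
    return nodes


def nodes_in_view(nodes, show):
    """Select the node keys whose kind appears in the view's show list."""
    return sorted(key for key, node in nodes.items() if node["kind"] in show)


nodes = collect_nodes(TOPOLOGY)
print(nodes_in_view(nodes, {"agent", "model"}))
```

Under these assumptions, the topology view's show list covers all nine node kinds in the graph, while the decision view's list names record types (decision, constraint, claim, landscape) rather than topology nodes.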

The topology alone is trivial: [User] -> [Hermes] -> [Codex]. The surrounding constraints, claims, landscape, benchmarks, and invalidation conditions are the part teams currently lose as the market shifts, which is why this example puts them on a two-week review window (as_of 2026-06-02, review_by 2026-06-16). Capturing them in the same file as the topology is the whole point. Open this example in the editor to see all of its generated views.
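
To make the invalidation logic concrete, here is a minimal sketch of how D-001's invalid_if conditions might be evaluated. It assumes that "expires" means the current date is later than the record's review_by, and that the truth of K-001 is supplied by the caller. The `INVALID_IF` encoding and the `invalidation_reasons` function are hypothetical names for this illustration, not part of AgentMark.

```python
from datetime import date

# review_by dates copied from K-001 and C-001 in the example.
REVIEW_BY = {
    "K-001": date(2026, 6, 16),
    "C-001": date(2026, 6, 16),
}

# Hand-encoded form of D-001's invalid_if list (assumed encoding):
#   - K-001 is false
#   - C-001 expires
INVALID_IF = [("is_false", "K-001"), ("expires", "C-001")]


def invalidation_reasons(today, false_records=frozenset()):
    """Return the invalid_if conditions of D-001 that currently hold.

    Assumption: a record has expired once `today` is later than its review_by.
    `false_records` holds the ids of records the caller has found to be false.
    """
    reasons = []
    for condition, ref in INVALID_IF:
        if condition == "expires" and today > REVIEW_BY[ref]:
            reasons.append(f"{ref} passed review_by {REVIEW_BY[ref].isoformat()}")
        elif condition == "is_false" and ref in false_records:
            reasons.append(f"{ref} is false")
    return reasons


# Inside the review window with K-001 still holding: the decision stands.
print(invalidation_reasons(date(2026, 6, 10)))
# One day past review_by without a re-run of the claim: D-001 needs revisiting.
print(invalidation_reasons(date(2026, 6, 17)))
```

An empty list means the decision still stands under these assumptions. A non-empty list names the conditions to revisit, for example by re-running CODE-HARNESS-2026-06 and refreshing C-001.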