7 · Temporal Validity, Benchmarks & Invalidation

This cluster is what turns AgentMark from documentation into a living architecture contract. In AI, every claim expires — so the language makes expiry, evidence, and automatic invalidation first-class.

Temporal validity — every claim has a clock. Any time-sensitive assertion carries a validity window.

assert: Claude Code has the best MCP support
  valid:
    from: 2026-06-01
    review: 2026-06-15

The ecosystem changes weekly. Without as_of, review_by, or valid, a document keeps asserting claims after they stop being true, and nothing tells the reader.
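
To make the validity window concrete, here is a minimal Python sketch of how a tool might classify a claim on a given day. The ValidityWindow type, the window_status function, and the three status names are hypothetical, not part of AgentMark. The sketch also assumes that the review date itself already counts as due.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ValidityWindow:
    valid_from: date  # corresponds to `from:` in the example above
    review: date      # corresponds to `review:`


def window_status(window: ValidityWindow, today: date) -> str:
    """Classify a claim as 'pending', 'valid', or 'review_due' on a given day."""
    if today < window.valid_from:
        return "pending"
    # Assumption: the review date itself is treated as due.
    if today >= window.review:
        return "review_due"
    return "valid"


window = ValidityWindow(valid_from=date(2026, 6, 1), review=date(2026, 6, 15))
print(window_status(window, date(2026, 6, 10)))  # valid
print(window_status(window, date(2026, 6, 15)))  # review_due
```

Passing today as an argument, rather than reading the system clock inside the function, keeps the check deterministic and easy to test.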

Benchmarks (bench) and tests (test) — the killer feature. Claims should expire unless backed by fresh tests. A bench defines candidates, tasks, metrics, and a repeat schedule.

bench CODE-HARNESS-2026-06:
  title: Coding harness comparison for Hermes
  run_on: 2026-06-02
  owner: Joel Tong
  candidates: [Claude Code, Codex, GLM, Opus, Qwen Coder]
  tasks:
    - edit multi-file repo
    - run tests
    - fix failing test
    - connect to MCP server
    - choose correct MCP tool
    - summarize patch
  metrics:
    - accepted_patch_rate
    - cost_per_accepted_patch
    - mcp_tool_success_rate
    - latency
    - human_interventions
    - context_loss_events
  schedule:
    repeat: weekly
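
The bench above names its metrics but does not define them. As one possible reading, the Python sketch below derives accepted_patch_rate and cost_per_accepted_patch from per-attempt records. The Run record, its fields, and the summarize function are assumptions made for illustration, and the candidate names and costs are placeholders, not benchmark results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    candidate: str
    accepted: bool   # was the resulting patch accepted?
    cost_usd: float  # spend for this single attempt


def summarize(runs: list[Run]) -> dict[str, dict[str, float]]:
    """Aggregate per-candidate accepted_patch_rate and cost_per_accepted_patch."""
    totals: dict[str, dict[str, float]] = {}
    for run in runs:
        t = totals.setdefault(run.candidate, {"n": 0, "accepted": 0, "cost": 0.0})
        t["n"] += 1
        t["accepted"] += int(run.accepted)
        t["cost"] += run.cost_usd

    summary: dict[str, dict[str, float]] = {}
    for name, t in totals.items():
        summary[name] = {
            "accepted_patch_rate": t["accepted"] / t["n"],
            # With zero accepted patches the ratio is undefined; infinity
            # keeps "lowest cost" comparisons well-behaved.
            "cost_per_accepted_patch": (
                t["cost"] / t["accepted"] if t["accepted"] else float("inf")
            ),
        }
    return summary


# Placeholder data for two anonymous candidates.
runs = [
    Run("candidate-a", accepted=True, cost_usd=1.0),
    Run("candidate-a", accepted=False, cost_usd=1.0),
    Run("candidate-b", accepted=False, cost_usd=0.5),
]
summary = summarize(runs)
print(summary["candidate-a"])
```

Under this definition, failed attempts still count toward cost, so a cheap candidate that rarely produces an accepted patch does not look artificially inexpensive.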

Claims then bind to that evidence and declare their own kill conditions:

claim C-003:
  text: GLM is cheapest among credible coding-agent substitutes.
  evidence: [bench#CODE-HARNESS-2026-06]
  invalid_if:
    - cost_per_accepted_patch is no longer lowest
  review_by: 2026-06-16
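
The kill condition on C-003 can be read as a comparison across candidates. The Python sketch below shows one way a tool might evaluate "cost_per_accepted_patch is no longer lowest". The function names are hypothetical, the numbers are placeholders rather than measurements, and the sketch assumes a tie also invalidates the claim.

```python
def is_lowest(metric: dict[str, float], subject: str) -> bool:
    """True if `subject` has the strictly lowest value in `metric`."""
    others = [value for name, value in metric.items() if name != subject]
    # Assumption: a tie means the subject is no longer "lowest".
    return all(metric[subject] < value for value in others)


def claim_is_invalid(cost_per_accepted_patch: dict[str, float], subject: str = "GLM") -> bool:
    """Mirror the kill condition: invalid once the subject is not the cheapest."""
    return not is_lowest(cost_per_accepted_patch, subject)


# Placeholder costs, not benchmark results.
costs = {"GLM": 0.40, "Codex": 0.70, "Claude Code": 0.90}
print(claim_is_invalid(costs))  # False

costs["Codex"] = 0.35  # a later hypothetical run makes another candidate cheaper
print(claim_is_invalid(costs))  # True
```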

Invalidation logic — the document tells you when it's stale. A decision lists the exact conditions under which it stops being valid.

decision D-001:
  title: Use Codex for Hermes
  chosen: Codex
  supported_by: [C-002, C-003]
  constrained_by: [K-001]
  invalid_if:
    - K-001 is false
    - C-002 expires
    - C-003 confidence < medium
    - MCP-SMOKE success_rate for Codex < 0.8
    - GLM cost_per_accepted_patch is at least 40% lower than Codex

Now tooling can emit: "This architecture is stale. Decision D-001 depends on claim C-003, which expired on 2026-06-16." That is the difference between a diagram that quietly rots and a contract that announces its own decay.
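
A minimal Python sketch of the check behind that message follows. It covers only the "claim expires" condition; the other invalid_if entries would need their own evaluators. The Claim and Decision types and the stale_reasons function are hypothetical, the review_by date given for C-002 is invented for illustration because the text does not show that claim, and the sketch assumes a claim expires on its review_by date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Claim:
    id: str
    review_by: date


@dataclass(frozen=True)
class Decision:
    id: str
    supported_by: tuple[str, ...]


def stale_reasons(decision: Decision, claims: dict[str, Claim], today: date) -> list[str]:
    """Return one message per supporting claim whose review_by date has been reached."""
    reasons = []
    for claim_id in decision.supported_by:
        claim = claims[claim_id]
        # Assumption: a claim counts as expired on its review_by date.
        if today >= claim.review_by:
            reasons.append(
                f"Decision {decision.id} depends on claim {claim.id}, "
                f"which expired on {claim.review_by.isoformat()}."
            )
    return reasons


claims = {
    "C-002": Claim("C-002", date(2026, 6, 30)),  # hypothetical date, not from the text
    "C-003": Claim("C-003", date(2026, 6, 16)),
}
decision = Decision("D-001", ("C-002", "C-003"))

for message in stale_reasons(decision, claims, today=date(2026, 6, 17)):
    print("This architecture is stale.", message)
```

Returning a list of reasons, rather than a single boolean, lets the tool name every dependency that has lapsed instead of only reporting that something is stale.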