Hermes Surface Attribution Runbook

Purpose

Hermes deployments often contain multiple subsystem families operating together:

LLM providers
messaging adapters
orchestration engines
runtime backends
memory systems
control and governance layers

When an incident occurs, investigators must first determine which subsystem family owns the failure before deeper root-cause analysis can begin. The Surface Attribution evaluation track exists to validate that behavior.

Attribution Workflow

Hermes investigations should follow a consistent attribution process:

Step 1: Identify the failing surface

Determine which subsystem family is most likely responsible for the observed failure. Examples:

Symptom	Likely Surface
Provider API failures	Provider
Session routing failures	Runtime
Workflow execution failures	Orchestration
Context retrieval failures	Memory
Approval / audit failures	Control
Adapter communication failures	Messaging
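
The table above can be read as a simple lookup. The following is a minimal sketch under that assumption, not the Hermes implementation: the `Surface` enum, the `SYMPTOM_TO_SURFACE` mapping, and the `likely_surface` helper are hypothetical names introduced here for illustration.

```python
from enum import Enum
from typing import Optional


class Surface(Enum):
    """Subsystem families from the table above (hypothetical enum)."""
    PROVIDER = "Provider"
    RUNTIME = "Runtime"
    ORCHESTRATION = "Orchestration"
    MEMORY = "Memory"
    CONTROL = "Control"
    MESSAGING = "Messaging"


# Symptom categories copied from the table; the key strings are illustrative labels.
SYMPTOM_TO_SURFACE = {
    "provider api failures": Surface.PROVIDER,
    "session routing failures": Surface.RUNTIME,
    "workflow execution failures": Surface.ORCHESTRATION,
    "context retrieval failures": Surface.MEMORY,
    "approval / audit failures": Surface.CONTROL,
    "adapter communication failures": Surface.MESSAGING,
}


def likely_surface(symptom: str) -> Optional[Surface]:
    """Return the likely owning surface, or None when the symptom is not in the table."""
    return SYMPTOM_TO_SURFACE.get(symptom.strip().lower())
```

Returning `None` for an unlisted symptom keeps the "not directly recognized" case explicit instead of forcing a guess.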

Step 2: Compare against historical analogs

Once a surface family is identified, compare the incident against previously validated Hermes RCA scenarios. The analog registry contains curated mappings across all Hermes RCA evaluation tracks. The comparison is intended to:

reduce attribution drift
improve consistency
encourage evidence-based classification
detect recurring failure patterns
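
One way to picture the comparison step is sketched below. The runbook does not say how analogs are compared, so the word-overlap ranking here is a stand-in chosen for illustration, and `closest_analog` and the `(scenario_id, family, description)` tuple shape are hypothetical.

```python
def closest_analog(incident_family, incident_text, analogs):
    """Pick the same-family analog sharing the most words with the incident.

    analogs: iterable of (scenario_id, family, description) tuples (assumed shape).
    Returns a scenario_id, or None when no analog shares the family.
    """
    incident_words = set(incident_text.lower().split())
    candidates = [a for a in analogs if a[1] == incident_family]
    if not candidates:
        return None

    def sort_key(analog):
        scenario_id, _family, description = analog
        overlap = len(incident_words & set(description.lower().split()))
        # Highest overlap first; scenario_id breaks ties deterministically.
        return (-overlap, scenario_id)

    return min(candidates, key=sort_key)[0]
```

Restricting candidates to the already-identified family keeps Step 2 dependent on Step 1, and the deterministic tie-break means the same incident always maps to the same analog.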

Step 3: Generate a diagnostic follow-up

Investigations should not stop at attribution. A valid attribution result should produce a targeted diagnostic question requesting additional evidence. Examples:

Can you provide the adapter response body?
Can you capture the request headers?
Can you inspect the runtime state snapshot?
Can you compare the adapter catalog against the configured routing table?

Diagnostic questions should be:

actionable
evidence-seeking
surface-specific

Scenario 050: Surface Sprawl / Unknown Adapter

Goal

Validate attribution behavior when an adapter is not directly recognized.

Evaluation Criteria

An investigation is expected to:

Identify the correct subsystem family
Select the closest historical analog
Produce a useful diagnostic follow-up

Failure Modes

Common attribution failures include:

assigning ownership to the wrong subsystem
selecting an unrelated analog scenario
generating generic follow-up questions
requesting evidence unrelated to the suspected surface
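
The criteria and failure modes above can be checked mechanically. The sketch below is one possible shape, not the actual suite: `AttributionResult`, `Expected`, `failure_modes`, the `analog-a` identifier, and the keyword test for surface-specific follow-ups are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributionResult:
    """What an investigation produced (hypothetical shape)."""
    surface: str
    analog_id: str
    follow_up: str


@dataclass(frozen=True)
class Expected:
    """What the scenario expects (hypothetical shape)."""
    surface: str
    acceptable_analogs: frozenset
    evidence_terms: frozenset  # surface-specific evidence the follow-up should request


def failure_modes(result: AttributionResult, expected: Expected) -> list:
    """Return the failure modes hit by a result; an empty list means a pass."""
    failures = []
    if result.surface != expected.surface:
        failures.append("wrong subsystem")
    if result.analog_id not in expected.acceptable_analogs:
        failures.append("unrelated analog")
    question = result.follow_up.lower()
    if not any(term in question for term in expected.evidence_terms):
        # Covers both generic questions and evidence unrelated to the surface.
        failures.append("follow-up not surface-specific")
    return failures


# Placeholder scenario data; "analog-a" is not a real registry identifier.
expected = Expected(
    surface="messaging",
    acceptable_analogs=frozenset({"analog-a"}),
    evidence_terms=frozenset({"adapter response body", "adapter catalog"}),
)
result = AttributionResult(
    "messaging", "analog-a", "Can you provide the adapter response body?"
)
print(failure_modes(result, expected))  # []
```

Reporting failure modes as a list, instead of a single boolean, shows which of the three criteria an investigation missed.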

Adapter Tuple Corpus

The attribution corpus contains deterministic adapter combinations spanning:

messaging
provider
runtime
orchestration
memory
control

The corpus is used to validate attribution consistency across a broad set of Hermes deployment configurations. Current coverage:

23 adapter tuples

Analog Registry

The analog registry provides curated mappings across Hermes RCA Parts 1–4. Each analog contains:

scenario identifier
subsystem family
expected attribution target
diagnostic guidance

The registry is intentionally deterministic and offline-runnable.
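
A registry entry with the four fields listed above could be modeled as follows. This is a sketch under assumptions: the `Analog` class, its field names, and `build_registry` are hypothetical, and the real registry format is not shown in this runbook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analog:
    """One registry entry; field names mirror the list above (assumed spelling)."""
    scenario_id: str
    subsystem_family: str
    expected_attribution_target: str
    diagnostic_guidance: str


def build_registry(analogs):
    """Index analogs by scenario_id, in sorted order, rejecting duplicates."""
    registry = {}
    for analog in sorted(analogs, key=lambda a: a.scenario_id):
        if analog.scenario_id in registry:
            raise ValueError(f"duplicate scenario id: {analog.scenario_id}")
        registry[analog.scenario_id] = analog
    return registry
```

Sorting before indexing and using only in-memory data are one way to keep lookups deterministic and runnable offline, which matches the stated intent of the registry.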

Benchmarking

Run offline validation

uv run python -m tests.synthetic.hermes_rca.run_suite --offline-only

Generate benchmark snapshots

uv run python -m tests.synthetic.hermes_rca.run_suite --offline-only --write-history

Generate benchmark reports

uv run python -m tests.synthetic.hermes_rca.benchmark_report

Meta Evaluation

The surface attribution meta-suite validates attribution behavior across the adapter corpus. Run:

uv run pytest tests/e2e/hermes/meta/test_surface_sprawl.py -q

Current corpus coverage:

23 adapter tuples

The expected pass threshold is at least 80% of registered tuples.
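
As a worked example of the threshold, the sketch below computes the smallest number of passing tuples that satisfies "at least 80%" for the current corpus. It assumes the threshold is evaluated as passed / registered >= 80%; how the suite itself rounds is not stated here, and `min_passing` is a hypothetical helper.

```python
CORPUS_SIZE = 23       # adapter tuples, per the coverage figure above
PASS_PERCENT = 80      # at least 80% of registered tuples


def min_passing(total: int, percent: int) -> int:
    """Smallest pass count whose share of total is at least percent.

    Integer ceiling division avoids floating-point rounding surprises.
    """
    return (total * percent + 99) // 100


print(min_passing(CORPUS_SIZE, PASS_PERCENT))  # 19
```

Since 80% of 23 is 18.4, 18 passing tuples fall short (about 78.3%) and 19 is the first count that meets the threshold (about 82.6%).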

Design Principles

Surface attribution evaluation is designed to be:

deterministic
provider-independent
offline-runnable
CI-friendly
extensible as new Hermes surfaces are added

The evaluation framework intentionally separates attribution quality from root-cause quality so that ownership classification can be measured independently from deeper RCA reasoning.
