Failure Types

FailureType is the core enum that drives every recovery decision. The classifier assigns one value per failure; the policy maps each value to a strategy.

All 10 types

| Member | String value | When it occurs | Default recovery |
|---|---|---|---|
| `WRONG_TOOL_CALLED` | `wrong_tool_called` | Agent called a tool that doesn't exist or isn't in the manifest | Retry with correct manifest |
| `CONSTRAINT_IGNORED` | `constraint_ignored` | LLM output violates an explicit constraint from the task | Replan with constraint reminder |
| `LOOP_DETECTED` | `loop_detected` | Agent repeats the same tool + input across 3+ steps | Replan or rollback |
| `HALLUCINATED_STATE` | `hallucinated_state` | Agent asserts facts that contradict tool outputs | Rollback to checkpoint |
| `PLAN_INCOMPLETE` | `plan_incomplete` | Agent declared success before completing all sub-goals | Resume from subgoal |
| `SCHEMA_MISMATCH` | `schema_mismatch` | Tool output or LLM response didn't match the expected structure | Retry with schema hint |
| `CONTEXT_OVERFLOW` | `context_overflow` | Agent lost earlier task context due to long-horizon drift | Replan with compressed context |
| `GOAL_DRIFT` | `goal_drift` | Agent makes progress toward the wrong interpretation of the goal | Replan with goal restatement |
| `EXTERNAL_FAULT` | `external_fault` | Tool/API returned a transient error (HTTP 429, 500, 502, 503) | Backoff and retry |
| `UNKNOWN` | `unknown` | Classifier could not determine the failure type | Escalate |
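The relationship between members, string values, and default recoveries can be pictured as a `str`-valued enum plus a lookup table. The sketch below is a hypothetical stand-in, not the actual `triage.taxonomy` source: the `str` mixin, the `DEFAULT_RECOVERY` dict, and the `default_recovery` helper are assumptions made for illustration, with the recovery labels copied from the table above.

```python
from enum import Enum


class FailureType(str, Enum):
    """Hypothetical mirror of triage.taxonomy.FailureType, for illustration only."""

    WRONG_TOOL_CALLED = "wrong_tool_called"
    CONSTRAINT_IGNORED = "constraint_ignored"
    LOOP_DETECTED = "loop_detected"
    HALLUCINATED_STATE = "hallucinated_state"
    PLAN_INCOMPLETE = "plan_incomplete"
    SCHEMA_MISMATCH = "schema_mismatch"
    CONTEXT_OVERFLOW = "context_overflow"
    GOAL_DRIFT = "goal_drift"
    EXTERNAL_FAULT = "external_fault"
    UNKNOWN = "unknown"  # always the last member


# Hypothetical default policy: one recovery label per failure type.
DEFAULT_RECOVERY: dict[FailureType, str] = {
    FailureType.WRONG_TOOL_CALLED: "Retry with correct manifest",
    FailureType.CONSTRAINT_IGNORED: "Replan with constraint reminder",
    FailureType.LOOP_DETECTED: "Replan or rollback",
    FailureType.HALLUCINATED_STATE: "Rollback to checkpoint",
    FailureType.PLAN_INCOMPLETE: "Resume from subgoal",
    FailureType.SCHEMA_MISMATCH: "Retry with schema hint",
    FailureType.CONTEXT_OVERFLOW: "Replan with compressed context",
    FailureType.GOAL_DRIFT: "Replan with goal restatement",
    FailureType.EXTERNAL_FAULT: "Backoff and retry",
    FailureType.UNKNOWN: "Escalate",
}


def default_recovery(raw_value: str) -> str:
    """Look up the default recovery from a string value, e.g. one read from a log."""
    return DEFAULT_RECOVERY[FailureType(raw_value)]
```

With this sketch, `default_recovery("external_fault")` returns `"Backoff and retry"`, and an unrecognized string raises `ValueError`, because `FailureType(...)` only accepts defined values.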

Classifier coverage

Not every failure type can be detected by the pattern-based `RulesClassifier`. The table below shows which classifiers can detect each type:

| Type | `RulesClassifier` | `LLMClassifier` / `HybridClassifier` |
|---|---|---|
| `WRONG_TOOL_CALLED` | ✓ | ✓ |
| `SCHEMA_MISMATCH` | ✓ | ✓ |
| `EXTERNAL_FAULT` | ✓ | ✓ |
| `LOOP_DETECTED` | ✓ (window configurable) | ✓ |
| `CONSTRAINT_IGNORED` | ✓ (requires `constraints=` arg) | ✓ |
| `HALLUCINATED_STATE` | ✗ → `UNKNOWN` | ✓ |
| `PLAN_INCOMPLETE` | ✗ → `UNKNOWN` | ✓ |
| `CONTEXT_OVERFLOW` | ✗ → `UNKNOWN` | ✓ |
| `GOAL_DRIFT` | ✗ → `UNKNOWN` | ✓ |

`HALLUCINATED_STATE`, `PLAN_INCOMPLETE`, `CONTEXT_OVERFLOW`, and `GOAL_DRIFT` require semantic understanding of the trajectory, so pattern matching cannot detect them. If you use `RulesClassifier` alone, these failures are classified as `UNKNOWN` and handled by your `UNKNOWN` strategy (Escalate by default). Use `HybridClassifier`, which runs the rules first and calls the LLM only when they return `UNKNOWN`, to get full coverage without paying for an LLM call on every failure.
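The rules-first, LLM-on-`UNKNOWN` flow can be sketched with the standard library alone. Everything here is hypothetical: `Step`, `rules_classify`, `hybrid_classify`, and the injected `llm_classify` callable stand in for the real `RulesClassifier`, `LLMClassifier`, and `HybridClassifier`, whose interfaces this page does not show. The rules check only cheap patterns (a transient HTTP status, a tool outside the manifest, an identical tool + input over the last `window` steps) and otherwise return `UNKNOWN`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Sequence


class FailureType(str, Enum):
    # Trimmed hypothetical stand-in for triage.taxonomy.FailureType.
    WRONG_TOOL_CALLED = "wrong_tool_called"
    LOOP_DETECTED = "loop_detected"
    GOAL_DRIFT = "goal_drift"
    EXTERNAL_FAULT = "external_fault"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Step:
    # Hypothetical trajectory record; the real library's step shape may differ.
    tool: str
    tool_input: str
    status_code: Optional[int] = None


TRANSIENT_STATUS = {429, 500, 502, 503}


def rules_classify(
    steps: Sequence[Step], manifest: set[str], window: int = 3
) -> FailureType:
    """Cheap pattern checks only; anything needing semantics falls through to UNKNOWN."""
    if not steps:
        return FailureType.UNKNOWN
    last = steps[-1]
    if last.status_code in TRANSIENT_STATUS:
        return FailureType.EXTERNAL_FAULT
    if last.tool not in manifest:
        return FailureType.WRONG_TOOL_CALLED
    # Loop: the last `window` steps all used the same tool with the same input.
    tail = steps[-window:]
    if len(tail) == window and len({(s.tool, s.tool_input) for s in tail}) == 1:
        return FailureType.LOOP_DETECTED
    return FailureType.UNKNOWN


def hybrid_classify(
    steps: Sequence[Step],
    manifest: set[str],
    llm_classify: Callable[[Sequence[Step]], FailureType],
) -> FailureType:
    """Run the rules first and pay for the LLM call only when they return UNKNOWN."""
    result = rules_classify(steps, manifest)
    if result is FailureType.UNKNOWN:
        result = llm_classify(steps)
    return result


llm_calls: list[int] = []


def fake_llm(steps: Sequence[Step]) -> FailureType:
    # Stand-in for an LLM classifier; records each call so the saving is visible.
    llm_calls.append(len(steps))
    return FailureType.GOAL_DRIFT


manifest = {"search", "fetch"}
loop_result = hybrid_classify([Step("search", "q=cats")] * 3, manifest, fake_llm)
drift_result = hybrid_classify(
    [Step("search", "q=cats"), Step("fetch", "url=a")], manifest, fake_llm
)
```

Only the second trajectory reaches `fake_llm`; the looping one is resolved by the rules, which is the cost saving the hybrid approach is after.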

Design rules

- Order is stable: do not reorder members.
- `UNKNOWN` is always last: new types are inserted before it.
- String values are the stable public identifier and must never change; they appear in logs, serialized state, and external systems.
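Because these rules protect stored and external data, one option is to enforce them with a test that pins the member order and string values, so an accidental reorder or rename fails CI. This is a sketch, not part of triage: it defines a local stand-in enum so it runs on its own; in a real project you would import `FailureType` from `triage.taxonomy` and keep only `EXPECTED` and the checks. `check_failure_type_contract` and `round_trip` are hypothetical helper names.

```python
import json
from enum import Enum


class FailureType(str, Enum):
    # Local stand-in so this sketch runs alone; import the real enum in practice.
    WRONG_TOOL_CALLED = "wrong_tool_called"
    CONSTRAINT_IGNORED = "constraint_ignored"
    LOOP_DETECTED = "loop_detected"
    HALLUCINATED_STATE = "hallucinated_state"
    PLAN_INCOMPLETE = "plan_incomplete"
    SCHEMA_MISMATCH = "schema_mismatch"
    CONTEXT_OVERFLOW = "context_overflow"
    GOAL_DRIFT = "goal_drift"
    EXTERNAL_FAULT = "external_fault"
    UNKNOWN = "unknown"


# Pinned public contract: (member name, string value) in declaration order.
EXPECTED = [
    ("WRONG_TOOL_CALLED", "wrong_tool_called"),
    ("CONSTRAINT_IGNORED", "constraint_ignored"),
    ("LOOP_DETECTED", "loop_detected"),
    ("HALLUCINATED_STATE", "hallucinated_state"),
    ("PLAN_INCOMPLETE", "plan_incomplete"),
    ("SCHEMA_MISMATCH", "schema_mismatch"),
    ("CONTEXT_OVERFLOW", "context_overflow"),
    ("GOAL_DRIFT", "goal_drift"),
    ("EXTERNAL_FAULT", "external_fault"),
    ("UNKNOWN", "unknown"),
]


def check_failure_type_contract() -> None:
    """Fail loudly if members were reordered, renamed, or re-valued."""
    actual = [(member.name, member.value) for member in FailureType]
    assert actual == EXPECTED, f"FailureType contract changed: {actual}"
    # New types go before UNKNOWN, so UNKNOWN must stay last.
    assert list(FailureType)[-1] is FailureType.UNKNOWN


def round_trip(failure_type: FailureType) -> FailureType:
    """Serialize by string value, as logs and saved state would, then parse back."""
    payload = json.dumps({"failure_type": failure_type.value})
    return FailureType(json.loads(payload)["failure_type"])


check_failure_type_contract()
```

Adding a new type then means updating `EXPECTED` deliberately, so the contract change shows up explicitly in code review.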

Using failure types in strategies

```python
from triage.taxonomy import FailureType, FailureContext
from triage.policy import RecoveryAction

# Give up on a flaky external service after this many EXTERNAL_FAULT attempts.
MAX_EXTERNAL_FAULTS = 3


async def smart_strategy(ctx: FailureContext) -> RecoveryAction:
    if ctx.failure_type == FailureType.EXTERNAL_FAULT:
        # Count earlier EXTERNAL_FAULT attempts so backoff cannot loop forever.
        external_faults = sum(
            1 for ft, _ in ctx.attempt_history
            if ft == FailureType.EXTERNAL_FAULT
        )
        if external_faults >= MAX_EXTERNAL_FAULTS:
            return RecoveryAction.ESCALATE(
                message=f"External service unavailable after {MAX_EXTERNAL_FAULTS} retries."
            )
    # Exponential backoff (1s, 2s, 4s, ...) based on how many attempts came before.
    return RecoveryAction.RETRY(delay=2.0 ** len(ctx.attempt_history))
```
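To watch the escalation threshold in action without installing triage, the strategy above can be exercised against minimal stubs. Only the strategy logic comes from this page; `FailureContext`, `RecoveryAction`, the trimmed `FailureType`, and the `simulate` helper below are hypothetical stand-ins that model just the fields and factories the strategy touches, and the real classes may differ. The sketch assumes each `attempt_history` entry is a `(failure_type, action)` pair, matching the `for ft, _ in ...` unpacking.

```python
import asyncio
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class FailureType(str, Enum):
    # Trimmed hypothetical stand-in for triage.taxonomy.FailureType.
    SCHEMA_MISMATCH = "schema_mismatch"
    EXTERNAL_FAULT = "external_fault"


@dataclass
class FailureContext:
    # Hypothetical stub exposing only the fields smart_strategy reads.
    failure_type: FailureType
    attempt_history: list[tuple[FailureType, Any]] = field(default_factory=list)


@dataclass
class RecoveryAction:
    # Hypothetical stub: records the action kind and the keyword arguments passed.
    kind: str
    params: dict[str, Any]

    @classmethod
    def RETRY(cls, **params: Any) -> "RecoveryAction":
        return cls("retry", params)

    @classmethod
    def ESCALATE(cls, **params: Any) -> "RecoveryAction":
        return cls("escalate", params)


MAX_EXTERNAL_FAULTS = 3


async def smart_strategy(ctx: FailureContext) -> RecoveryAction:
    # Same logic as the strategy above.
    if ctx.failure_type == FailureType.EXTERNAL_FAULT:
        external_faults = sum(
            1 for ft, _ in ctx.attempt_history if ft == FailureType.EXTERNAL_FAULT
        )
        if external_faults >= MAX_EXTERNAL_FAULTS:
            return RecoveryAction.ESCALATE(
                message=f"External service unavailable after {MAX_EXTERNAL_FAULTS} retries."
            )
    return RecoveryAction.RETRY(delay=2.0 ** len(ctx.attempt_history))


async def simulate(failures: int) -> list[RecoveryAction]:
    """Feed `failures` consecutive EXTERNAL_FAULTs through the strategy."""
    history: list[tuple[FailureType, Any]] = []
    actions: list[RecoveryAction] = []
    for _ in range(failures):
        ctx = FailureContext(FailureType.EXTERNAL_FAULT, list(history))
        action = await smart_strategy(ctx)
        actions.append(action)
        history.append((ctx.failure_type, action))
    return actions


actions = asyncio.run(simulate(4))
```

The first three calls return `RETRY` with delays of 1.0, 2.0, and 4.0 seconds; the fourth sees three earlier `EXTERNAL_FAULT` entries and escalates. Note that other failure types never hit the threshold in this strategy, so their delay keeps doubling with history length.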