
Known Limitations

This page documents the current limitations of triage honestly. Some are design tradeoffs, some are planned for v0.4, and some are fundamental constraints of the approach.


Classifier

RulesClassifier cannot detect semantic failures

RulesClassifier is pattern-based and makes zero API calls. It reliably detects structural failures — wrong tool name, bad JSON, HTTP errors, loops — but it physically cannot detect:

  • HALLUCINATED_STATE — requires comparing LLM assertions against tool outputs
  • GOAL_DRIFT — requires understanding intent vs. trajectory
  • PLAN_INCOMPLETE — requires knowing what sub-goals were supposed to be completed
  • CONTEXT_OVERFLOW — requires detecting that the agent lost track of earlier context

These types return UNKNOWN from RulesClassifier. If your agent produces them, either:

  • Map UNKNOWN to an appropriate recovery strategy as a catch-all
  • Use HybridClassifier(llm=LLMClassifier()) to get semantic detection at low cost (LLM is only called when rules return UNKNOWN)
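The hybrid dispatch can be pictured as a two-stage check. The function bodies below are illustrative stand-ins, not triage's internals:

```python
UNKNOWN = "UNKNOWN"

def rules_classify(trajectory: list[dict]) -> str:
    # Stand-in for RulesClassifier: structural checks only, no API calls.
    if any(step.get("error", "").startswith("HTTP") for step in trajectory):
        return "TOOL_ERROR"
    return UNKNOWN

def llm_classify(trajectory: list[dict]) -> str:
    # Stand-in for the costly LLMClassifier call.
    return "HALLUCINATED_STATE"

def hybrid_classify(trajectory: list[dict]) -> str:
    # The LLM is consulted only when the cheap rules pass returns UNKNOWN.
    label = rules_classify(trajectory)
    return llm_classify(trajectory) if label == UNKNOWN else label
```

On the happy path for structural failures, the LLM is never called, which is where the cost savings come from.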

LLMClassifier adds ~100–400ms latency on the failure path

LLMClassifier.classify() is synchronous. triage runs it via anyio.to_thread.run_sync() to avoid freezing the event loop, but the classification still adds ~100–400ms latency on the failure path. This is acceptable for most agents since classification only happens after a failure, not on every step. A fully async classifier is planned for v0.4.
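The offload pattern looks roughly like this; the sketch uses the stdlib `asyncio.to_thread`, which is the equivalent of the `anyio.to_thread.run_sync()` call triage uses, and `classify_sync` is a stand-in for the real classifier:

```python
import asyncio
import time

def classify_sync(trajectory: list) -> str:
    # Stand-in for LLMClassifier.classify(): a blocking network call.
    time.sleep(0.05)
    return "GOAL_DRIFT"

async def classify_off_loop(trajectory: list) -> str:
    # The blocking call runs in a worker thread, so other tasks on the
    # event loop keep making progress during the classification.
    return await asyncio.to_thread(classify_sync, trajectory)
```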

No benchmarks yet

There are no published false-positive/false-negative rates for RulesClassifier or LLMClassifier. Accuracy depends heavily on the frameworks, models, and error messages your agents produce. The examples/benchmark.py script runs a synthetic suite against both classifiers so you can measure accuracy for your own trajectories.

Error messages are framework- and locale-dependent

RulesClassifier patterns are written for English-language error messages from major Python SDKs (OpenAI, Anthropic, LangGraph, CrewAI). If your framework surfaces errors in a different language or format, pattern coverage will be lower. In that case, supply a custom classifier or use LLMClassifier.


Recovery

Rollback does not undo side effects

ROLLBACK restores the trajectory snapshot and any state saved via update_state(). It does not undo:

  • HTTP requests already sent
  • Database writes already committed
  • Emails or notifications already dispatched
  • Files already written to disk

If your agent must be rollback-safe, design tools to be idempotent — re-running them after rollback should produce the same result, not a duplicate. Consider using database transactions, idempotency keys on HTTP calls, or staging areas for file writes.
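One way to get idempotency keys is to derive them deterministically from the call itself, so re-running a tool after rollback produces the same key. This helper is hypothetical, not part of triage:

```python
import hashlib
import json

def idempotency_key(tool_name: str, args: dict) -> str:
    # Same tool + same args => same key before and after a rollback,
    # so a downstream service can deduplicate the repeated request.
    payload = json.dumps({"tool": tool_name, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

# Send the key alongside the call, e.g. as an Idempotency-Key HTTP header.
```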

record_step is an honor system

triage has no way to intercept what your agent does internally. If your agent raises an exception before calling record_step(), the trajectory will be empty. triage handles this by synthesizing a sentinel step from the raw exception (action="<no steps recorded>", error=str(exc)), so the classifier still runs — but the trajectory context will be minimal.

The implication: the more faithfully your agent calls record_step() for each observable action, the more accurate the classifier will be. A trajectory with one sentinel step will almost always classify as UNKNOWN.
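In practice that means one record_step call per observable action, as in this sketch. The action=/observation= keywords here are illustrative; check the API reference for record_step's actual signature:

```python
import asyncio

async def my_agent(task: str, *, record_step, **kwargs) -> str:
    # One record_step call per observable action gives the classifier a
    # full trajectory instead of a single synthesized sentinel step.
    for action in ("search", "summarize"):
        result = f"{action}: ok"            # stand-in for a real tool call
        record_step(action=action, observation=result)
    return "finished"
```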

No global attempt cap across failure types

max_recovery_attempts (default 3) caps total attempts per run() call, but each strategy's own max_attempts counter is tracked per failure type. An agent that alternates between two failure types can therefore exhaust the full max_recovery_attempts budget without any single strategy ever hitting its per-strategy limit.

Workaround: use attempt_history in a custom strategy to count total attempts across all types:

async def bounded_recovery(ctx: FailureContext) -> RecoveryAction:
    if len(ctx.attempt_history) >= 3:
        return RecoveryAction.ESCALATE("Too many failures of any type.")
    return RecoveryAction.RETRY()

Strategies are not composable (yet)

A FailurePolicy maps each FailureType to exactly one strategy. There is no built-in way to say "try replan first, then rollback if replan fails." The workaround is a custom strategy that checks attempt_history:

async def replan_then_rollback(ctx: FailureContext) -> RecoveryAction:
    already_replanned = any(kind == "replan" for _, kind in ctx.attempt_history)
    if already_replanned:
        return RecoveryAction.ROLLBACK()
    return RecoveryAction.REPLAN(hint="Previous plan failed. Try a different approach.")

Strategy chaining (FailurePolicy.chain(primary, fallback)) is planned for v0.4.


API

Only async agents are supported

Agent wraps async def callables only. Synchronous agent functions must be wrapped:

import asyncio
from functools import partial

def my_sync_agent(task: str, *, record_step, **kwargs) -> str:
    ...

async def async_wrapper(task: str, *, record_step, **kwargs) -> str:
    fn = partial(my_sync_agent, task, record_step=record_step, **kwargs)
    return await asyncio.get_running_loop().run_in_executor(None, fn)

agent = triage.Agent(async_wrapper, policy=policy)

Streaming agents require discrete step boundaries

The step-recording model assumes your agent produces observable, discrete actions. Streaming token-by-token output has no natural step boundary. triage works with streaming agents if you call record_step() at meaningful boundaries — tool call starts/ends, message completions, or plan transitions — rather than per token.
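A minimal sketch of that batching approach, with an invented fake_stream standing in for the model's token stream and illustrative record_step keywords:

```python
import asyncio

async def fake_stream(task: str):
    # Stand-in for a token-by-token model stream.
    for token in ("Searching", " the web.", "Done."):
        yield token

async def streaming_agent(task: str, *, record_step, **kwargs) -> str:
    buffer: list[str] = []
    async for token in fake_stream(task):
        buffer.append(token)
        if token.endswith("."):             # crude message-completion boundary
            record_step(action="message", observation="".join(buffer))
            buffer.clear()
    return "done"
```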

record_step requires a signature change

The wrapped agent function must accept record_step (and optionally update_state) as keyword arguments. Wrapping an existing agent function without modifying its signature requires either an adapter closure or — planned for v0.4 — a contextvars-based approach where record_step is accessible without being in the signature.

_triage_hint is a plain string

Recovery hints injected as _triage_hint are unstructured strings. They are designed to be passed directly into an LLM prompt. There is no type-safe schema for what a hint means, which limits programmatic interpretation. A structured _triage_context object is planned for v0.4.
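Until then, the only structured use is splicing the hint into the prompt text, roughly like this hypothetical helper:

```python
def build_prompt(task: str, **kwargs) -> str:
    # The recovery hint arrives as the plain string kwargs["_triage_hint"];
    # there is nothing to parse, so it goes straight into the prompt.
    hint = kwargs.get("_triage_hint")
    prompt = f"Task: {task}"
    if hint:
        prompt += f"\nRecovery hint from last failure: {hint}"
    return prompt
```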


Concurrency

A single Agent instance is not safe for concurrent run() calls

Agent holds mutable run-state (_trajectory, _current_state, _pending_checkpoints) that is reset at the start of each run() call. Calling run() concurrently on the same instance will corrupt this state.

For parallel task dispatch, create one Agent instance per concurrent task:

import asyncio
import triage

async def run_parallel(tasks: list[str]) -> list:
    agents = [triage.Agent(my_agent, policy=policy) for _ in tasks]
    return await asyncio.gather(*[ag.run(t) for ag, t in zip(agents, tasks)])

Or use a factory function:

def make_agent() -> triage.Agent:
    return triage.Agent(my_agent, policy=policy, checkpoint_store=shared_store)

results = await asyncio.gather(*[make_agent().run(t) for t in tasks])

CheckpointStore instances can be shared across agents, with one caveat: InMemoryCheckpointStore has no concurrency protection (concurrent writes to the same key are last-write-wins), while SQLiteCheckpointStore and RedisCheckpointStore use atomic operations and are safe for concurrent writers.
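The last-write-wins behaviour falls out of the store being a plain mapping, roughly as in this sketch (illustrative, not triage's implementation):

```python
class InMemoryStoreSketch:
    # Rough shape of an in-memory checkpoint store: a dict with no locking,
    # so two agents saving under the same key follow last-write-wins.
    def __init__(self) -> None:
        self._data: dict[str, object] = {}

    def save(self, key: str, snapshot: object) -> None:
        self._data[key] = snapshot          # no lock: last write wins

    def load(self, key: str):
        return self._data.get(key)
```

Using distinct checkpoint keys per agent (or a store with atomic operations) sidesteps the issue.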


Comparison with framework-native error handling

vs. LangGraph

LangGraph's built-in error handling retries the full graph from the start. triage classifies the failure first and routes to a typed strategy — retry, replan, rollback, resume, escalate, or abort — with trajectory and state context available to the strategy. The two are composable: wrap_langgraph() adds triage's classification layer on top of a compiled LangGraph graph without replacing LangGraph's own logic.

vs. try/except

try/except on exception type works well for synchronous, deterministic errors. Agent failures often carry no discriminating exception type — the same RuntimeError can mean a loop, a hallucination, or a network error depending on what the agent was doing before it raised. triage classifies on the trajectory, not the exception string.