Taxonomy of Nondeterminism in Agentic Systems

Model nondeterminism

Definition

Variability in model outputs for identical prompts and parameters. Even with temperature set to zero, most production LLM APIs do not guarantee deterministic outputs across requests.

Common causes

Floating-point non-associativity: the order of reductions, and therefore the result, can vary with hardware, kernel selection, and batch size
Non-deterministic sampling implementations
Load balancing across model replicas with different quantization
Model version changes without explicit notification
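The floating-point cause deserves a concrete illustration. Inference kernels split large sums (in matrix multiplications and attention, for example) into partial sums, and the grouping can change with hardware, kernel choice, or batch size. Because floating-point addition is not associative, a different grouping can give a different result. A minimal Python sketch, where the two helper functions are illustrative stand-ins for two reduction orders, not anything a real kernel exposes:

```python
def reduce_left(values: list[float]) -> float:
    """Sum strictly left to right: ((v0 + v1) + v2) + ..."""
    total = 0.0
    for v in values:
        total += v
    return total


def reduce_right(values: list[float]) -> float:
    """Sum right to left: v0 + (v1 + (v2 + ...))."""
    total = 0.0
    for v in reversed(values):
        total = v + total
    return total


values = [0.1, 0.2, 0.3]
print(reduce_left(values))   # 0.6000000000000001
print(reduce_right(values))  # 0.6
```

A difference in the last bit rarely matters on its own. It matters when two candidate tokens are nearly tied: the flipped token changes the context for every token generated after it, so a single rounding difference can lead to a completely different answer.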

Why it matters

If the foundational reasoning step cannot be reproduced, debugging agent failures becomes a statistical exercise rather than a deterministic investigation.
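In practice, treating debugging as a statistical exercise means sending the same request many times and reasoning about the distribution of outputs instead of a single trace. A minimal sketch, assuming a hypothetical `call_model` callable that wraps whatever client the agent uses; the hypothetical `fake_model` below simply cycles through canned responses so the example runs offline:

```python
from collections import Counter
from itertools import cycle
from typing import Callable


def sample_outputs(call_model: Callable[[str], str], prompt: str, n: int) -> Counter:
    """Send the same prompt n times and tally each distinct output."""
    return Counter(call_model(prompt) for _ in range(n))


def agreement_rate(outputs: Counter) -> float:
    """Fraction of runs that produced the most common output."""
    total = sum(outputs.values())
    _, top_count = outputs.most_common(1)[0]
    return top_count / total


# Hypothetical fake model: identical prompt, occasionally a different answer.
canned = cycle(["approve refund", "approve refund", "approve refund", "escalate"])


def fake_model(prompt: str) -> str:
    return next(canned)


dist = sample_outputs(fake_model, "Should order 123 be refunded?", n=8)
print(agreement_rate(dist))  # 0.75
```

An agreement rate below 1.0 means a failure may not reproduce on the next attempt, so a single passing rerun is weak evidence that a bug is fixed.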

Tool nondeterminism

Definition

External tool calls that return different results for the same inputs. This includes API calls, database queries, file system operations, and any interaction with stateful systems.

Common causes

Time-dependent data sources (current timestamp, market prices, weather)
Rate limits and throttling behaviors
External service failures and retries
Stateful operations (incrementing counters, depleting resources)
Network-dependent responses (latency, routing, CDN variations)

Why it matters

Tool outputs form the agent's perception of reality. When perception changes between runs, the entire decision tree diverges, making it impossible to isolate whether a failure was due to reasoning or simply different input data.

Retrieval nondeterminism

Definition

Variability in retrieved context for identical queries. RAG systems, vector databases, and search indices introduce nondeterminism through ranking algorithms, approximate nearest neighbor search, and index staleness.

Common causes

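Approximate nearest neighbor (ANN) search, which does not guarantee the exact top results and can return different neighbors after an index rebuild or a change in search parameters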
Index updates between queries
Tie-breaking in ranking that depends on insertion order or internal state
Reranking models with their own nondeterminism
Cache invalidation timing
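Tie-breaking is the easiest of these causes to reproduce. Exact score ties are more common than they look (duplicate chunks or coarsely quantized scores, for example), and a stable sort keeps tied items in whatever order they arrived. In this minimal Python sketch, the two lists are hypothetical results from the same index built in two different insertion orders:

```python
def top_k_by_score(scored: list[tuple[str, float]], k: int) -> list[str]:
    """Rank by score only. sorted() is stable, so ties keep insertion order."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


def top_k_deterministic(scored: list[tuple[str, float]], k: int) -> list[str]:
    """Rank by score, then break ties on doc_id so insertion order is irrelevant."""
    ranked = sorted(scored, key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked[:k]]


build_a = [("doc-a", 0.91), ("doc-b", 0.87), ("doc-c", 0.87)]
build_b = [("doc-c", 0.87), ("doc-a", 0.91), ("doc-b", 0.87)]

print(top_k_by_score(build_a, k=2))  # ['doc-a', 'doc-b']
print(top_k_by_score(build_b, k=2))  # ['doc-a', 'doc-c']
print(top_k_deterministic(build_a, k=2) == top_k_deterministic(build_b, k=2))  # True
```

A secondary key only fixes the final ordering step; it cannot make an approximate search return the same candidate set across index rebuilds.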

Why it matters

Retrieved context directly shapes the model's reasoning. If the same query retrieves different documents, the agent's knowledge base becomes a moving target, and correctness cannot be verified through replay.
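Verifying a replay therefore starts with checking whether the retrieval step saw the same context. A minimal sketch, assuming each retrieved document is a dict with hypothetical `id` and `text` fields: hash the ordered context on the original run, store the digest with the trace, and compare it on replay. A mismatch points at retrieval, not reasoning.

```python
import hashlib
import json


def context_fingerprint(docs: list[dict]) -> str:
    """Digest of the retrieved context: document ids, their order, and their text."""
    canonical = json.dumps([[d["id"], d["text"]] for d in docs], ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = [
    {"id": "kb-17", "text": "Refunds are allowed within 30 days."},
    {"id": "kb-42", "text": "Gift cards are non-refundable."},
]
recorded = context_fingerprint(original)

# Same documents in a different order: the model sees a different prompt.
reordered = list(reversed(original))
print(context_fingerprint(reordered) == recorded)  # False
```

Order is part of the digest on purpose: the same documents in a different order produce a different prompt, and position within the context can affect the model's output.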

Temporal nondeterminism

Definition

Behavior that depends on when the agent runs rather than what inputs it receives. This includes explicit time dependencies and implicit ones buried in tool behaviors or data sources.

Common causes

Direct use of system time in prompts or logic
Time-windowed queries (last 24 hours, this week, today's events)
Scheduled data refreshes
Time-based access controls and credentials
Business hours logic and timezone handling
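Most of these causes reduce to code reading the clock implicitly. A minimal sketch of the alternative, where the current time is an explicit parameter that can be recorded once at the start of a run and passed again on replay. The function names are hypothetical, and the fixed UTC-4 offset is a simplification; production code should use a real timezone database so daylight saving transitions are handled.

```python
from datetime import datetime, timedelta, timezone


def last_24h_window(now: datetime) -> tuple[datetime, datetime]:
    """Start and end of a 'last 24 hours' query, relative to an explicit now."""
    return now - timedelta(hours=24), now


def is_business_hours(now: datetime, tz: timezone) -> bool:
    """True on weekdays between 09:00 and 17:00 local time."""
    local = now.astimezone(tz)
    return local.weekday() < 5 and 9 <= local.hour < 17


# Simplified fixed offset (UTC-4); ignores daylight saving changes.
OFFICE_TZ = timezone(timedelta(hours=-4))

friday = datetime(2024, 3, 15, 16, 30, tzinfo=timezone.utc)    # 12:30 local, Friday
saturday = datetime(2024, 3, 16, 16, 30, tzinfo=timezone.utc)  # 12:30 local, Saturday

print(is_business_hours(friday, OFFICE_TZ))    # True
print(is_business_hours(saturday, OFFICE_TZ))  # False
```

The same agent logic gives different answers a day apart. With `now` as an explicit input, a replay that passes the recorded timestamp sees the original answer instead of the current one.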

Why it matters

Temporal coupling means failures can only be investigated "live" during the exact window when they occurred. Replaying an agent execution hours or days later will encounter a fundamentally different environment, making root cause analysis unreliable.

Hidden state leakage

Definition

Dependence on state that is not explicitly captured as input. This includes environment variables, global configuration, user sessions, and any context that affects execution but is not visible in the agent's explicit inputs.

Common causes

Environment variables read at runtime
User authentication state and permissions
Feature flags and A/B test assignments
Implicit context from previous conversations or sessions
Global rate limit counters and quota state
Process-level randomness (thread scheduling, memory layout)
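Process-level randomness can surface even in single-threaded code. One concrete case, chosen here as an illustration: CPython randomizes string hashing per process by default, so the iteration order of a set of strings can differ from one run to the next unless `PYTHONHASHSEED` is fixed. An agent that builds a prompt or tool list by iterating over such a set can therefore vary between runs. A minimal sketch that runs the same snippet in fresh interpreters (the tool names in the snippet are hypothetical):

```python
import os
import subprocess
import sys

# Iterating a set of strings: order depends on the per-process hash seed.
SNIPPET = "print(list({'search', 'calculator', 'email', 'calendar', 'browser'}))"


def set_order_in_new_process(hash_seed: str) -> str:
    """Run SNIPPET in a fresh interpreter with the given PYTHONHASHSEED value."""
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# "random" matches CPython's default: orders may differ across processes.
random_orders = {set_order_in_new_process("random") for _ in range(5)}
print(len(random_orders))  # number of distinct orders seen; may be 1 by chance

# A fixed seed makes the order reproducible across processes.
print(set_order_in_new_process("0") == set_order_in_new_process("0"))  # True
```

Fixing the seed removes this particular source, but sorting before iterating is usually the sturdier choice because it does not depend on how the process was launched.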

Why it matters

If inputs are incomplete, replay is impossible by definition. The agent appears to behave unpredictably, but the real issue is that its actual inputs were never fully captured. This is the most insidious form of nondeterminism because it masquerades as correct design.
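The remedy is to promote hidden state to explicit, recorded input. A minimal sketch: capture the values a run actually depends on into a manifest and hash it, so two runs with the same user input but different digests are known to have had different inputs. The environment variable, flag names, and session fields are hypothetical, and only an allowlist of environment variables is read, since the full environment often contains secrets.

```python
import hashlib
import json
import os
from typing import Any


def capture_run_manifest(
    user_input: str,
    env_allowlist: list[str],
    feature_flags: dict[str, Any],
    session: dict[str, Any],
) -> dict[str, Any]:
    """Snapshot the explicit and hidden inputs of one agent run, plus a digest."""
    manifest: dict[str, Any] = {
        "user_input": user_input,
        # Only allowlisted variables: the full environment may hold secrets.
        "env": {key: os.environ.get(key) for key in sorted(env_allowlist)},
        "feature_flags": feature_flags,
        "session": session,
    }
    # sort_keys makes the digest independent of dict insertion order.
    canonical = json.dumps(manifest, sort_keys=True)
    manifest["digest"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return manifest


os.environ["AGENT_REGION"] = "eu-west-1"  # hypothetical config read at runtime
run_1 = capture_run_manifest(
    "Summarize my open tickets",
    env_allowlist=["AGENT_REGION"],
    feature_flags={"new_planner": False},
    session={"user_id": "u-123", "role": "support"},
)
run_2 = capture_run_manifest(
    "Summarize my open tickets",
    env_allowlist=["AGENT_REGION"],
    feature_flags={"new_planner": True},  # A/B assignment flipped
    session={"user_id": "u-123", "role": "support"},
)
print(run_1["digest"] == run_2["digest"])  # False: same prompt, different inputs
```

Stored with the trace, the manifest turns "the agent behaved differently" into a concrete diff of inputs.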