Taxonomy of Nondeterminism in Agentic Systems
Model nondeterminism
Definition
Variability in model outputs for identical prompts and parameters. Even with temperature set to zero, most production LLM APIs do not guarantee deterministic outputs across requests.
Common causes
- Floating-point precision differences across hardware
- Nondeterministic sampling implementations
- Load balancing across model replicas with different quantization
- Model version changes without explicit notification
Why it matters
If the foundational reasoning step cannot be reproduced, debugging agent failures becomes a statistical exercise rather than a deterministic investigation.
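One way to make this variability visible is to send the same prompt several times and count the distinct responses. The following is a minimal sketch, not a definitive implementation: `call_model` is a hypothetical callable standing in for whatever client wraps your LLM API, and the helper names are assumptions introduced for illustration.

```python
import hashlib
from collections import Counter
from typing import Callable


def output_fingerprint(text: str) -> str:
    """Return a short, stable hash of a model response."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def measure_variability(
    call_model: Callable[[str], str],  # hypothetical wrapper around an LLM API
    prompt: str,
    runs: int = 5,
) -> Counter:
    """Call the model `runs` times with one prompt and count distinct outputs."""
    return Counter(output_fingerprint(call_model(prompt)) for _ in range(runs))


if __name__ == "__main__":
    # Stub model that alternates between two answers, for demonstration only.
    answers = ["Paris", "Paris.", "Paris", "Paris."]
    counts = measure_variability(lambda p: answers.pop(0), "Capital of France?", runs=4)
    print(f"{len(counts)} distinct outputs across {sum(counts.values())} runs")
```

A result with more than one fingerprint indicates that a single replay may not reproduce a failure, so any investigation has to reason over a sample of runs.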
Tool nondeterminism
Definition
External tool calls that return different results for the same inputs. This includes API calls, database queries, file system operations, and any interaction with stateful systems.
Common causes
- Time-dependent data sources (current timestamp, market prices, weather)
- Rate limits and throttling behaviors
- External service failures and retries
- Stateful operations (incrementing counters, depleting resources)
- Network-dependent responses (latency, routing, CDN variations)
Why it matters
Tool outputs form the agent's perception of reality. When perception changes between runs, the entire decision tree diverges, and without a record of what each tool returned it is impossible to tell whether a failure came from the agent's reasoning or simply from different input data.
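A common mitigation is to record tool results during the live run and serve them back during replay. The sketch below assumes tool arguments are JSON-serializable and that results can be held in memory; `ToolRecorder` and its methods are hypothetical names, not part of any particular agent framework.

```python
import json
from typing import Any, Callable, Dict, List, Optional


class ToolRecorder:
    """Record tool results on a live run and return them in order on replay."""

    def __init__(self, replay: bool = False,
                 log: Optional[Dict[str, List[Any]]] = None) -> None:
        self.replay = replay
        self.log: Dict[str, List[Any]] = log if log is not None else {}
        self._cursor: Dict[str, int] = {}  # next result to serve, per call key

    def call(self, name: str, tool: Callable[..., Any], **kwargs: Any) -> Any:
        # Key on tool name plus arguments (assumed JSON-serializable).
        key = json.dumps([name, kwargs], sort_keys=True)
        if self.replay:
            index = self._cursor.get(key, 0)
            self._cursor[key] = index + 1
            return self.log[key][index]  # the real tool is never invoked
        result = tool(**kwargs)
        # Keep every result: stateful tools may answer differently each time.
        self.log.setdefault(key, []).append(result)
        return result


if __name__ == "__main__":
    prices = iter([101.5, 102.0])  # stand-in for a changing market price
    live = ToolRecorder()
    seen = [live.call("price", lambda symbol: next(prices), symbol="ACME")
            for _ in range(2)]
    replayed = ToolRecorder(replay=True, log=live.log)
    again = [replayed.call("price", lambda symbol: 0.0, symbol="ACME")
             for _ in range(2)]
    print(seen == again)
```

Storing a list of results per key, rather than a single value, keeps the replay faithful for stateful tools where the same arguments legitimately return different answers on successive calls.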
Retrieval nondeterminism
Definition
Variability in retrieved context for identical queries. RAG systems, vector databases, and search indices introduce nondeterminism through ranking algorithms, approximate nearest neighbor search, and index staleness.
Common causes
- Approximate nearest neighbor (ANN) search, which trades exact recall for speed and may return different neighbors depending on how the index was built or searched
- Index updates between queries
- Tie-breaking in ranking that depends on insertion order or internal state
- Reranking models with their own nondeterminism
- Cache invalidation timing
Why it matters
Retrieved context directly shapes the model's reasoning. If the same query retrieves different documents, the agent's knowledge base becomes a moving target, and correctness cannot be verified through replay.
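Of the causes above, tie-breaking is the one that can be addressed purely on the client side. The following sketch ranks hits by score and breaks ties on document ID, so the result no longer depends on the order in which the backend returned them. The `(doc_id, score)` tuple shape, the function name `stable_rank`, and the rounding precision are assumptions for illustration; rounding only merges near-equal scores and does not remove ANN or index-update effects.

```python
from typing import List, Tuple


def stable_rank(hits: List[Tuple[str, float]], k: int, precision: int = 6) -> List[str]:
    """Return the top-k document IDs with deterministic tie-breaking.

    Scores are rounded so that tiny floating-point differences count as ties,
    and ties are then resolved by document ID instead of arrival order.
    """
    ordered = sorted(hits, key=lambda hit: (-round(hit[1], precision), hit[0]))
    return [doc_id for doc_id, _ in ordered[:k]]


if __name__ == "__main__":
    # The same hits arriving in two different orders from a hypothetical index.
    run_a = [("doc-b", 0.9), ("doc-a", 0.9), ("doc-c", 0.5)]
    run_b = [("doc-c", 0.5), ("doc-a", 0.9), ("doc-b", 0.9)]
    print(stable_rank(run_a, k=2) == stable_rank(run_b, k=2))
```

Logging the returned ID list alongside each run also gives replay something concrete to compare against when the index itself has changed.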
Temporal nondeterminism
Definition
Behavior that depends on when the agent runs rather than what inputs it receives. This includes explicit time dependencies and implicit ones buried in tool behaviors or data sources.
Common causes
- Direct use of system time in prompts or logic
- Time-windowed queries (last 24 hours, this week, today's events)
- Scheduled data refreshes
- Time-based access controls and credentials
- Business hours logic and timezone handling
Why it matters
Temporal coupling means failures can only be investigated "live" during the exact window when they occurred. Replaying an agent execution hours or days later will encounter a fundamentally different environment, making root cause analysis unreliable.
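One way to loosen this coupling is to treat the current time as an explicit input rather than reading it from the system inside agent logic. The sketch below passes a clock into a time-windowed query helper so that a replay can pin the clock to the original run's timestamp. The names `Clock`, `system_clock`, `frozen_clock`, and `last_24_hours` are hypothetical and introduced only for this example.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Tuple

Clock = Callable[[], datetime]


def system_clock() -> datetime:
    """Live clock, used in production runs."""
    return datetime.now(timezone.utc)


def frozen_clock(at: datetime) -> Clock:
    """Clock that always returns the recorded timestamp, used in replay."""
    return lambda: at


def last_24_hours(clock: Clock = system_clock) -> Tuple[datetime, datetime]:
    """Compute a 'last 24 hours' window relative to the injected clock."""
    end = clock()
    return end - timedelta(hours=24), end


if __name__ == "__main__":
    recorded = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
    start, end = last_24_hours(frozen_clock(recorded))
    print(start.isoformat(), end.isoformat())
```

This only fixes the window boundaries. The data inside the window may still have changed since the original run, which is why it pairs naturally with recorded tool outputs.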
Hidden state leakage
Definition
Dependence on state that is not explicitly captured as input. This includes environment variables, global configuration, user sessions, and any context that affects execution but is not visible in the agent's explicit inputs.
Common causes
- Environment variables read at runtime
- User authentication state and permissions
- Feature flags and A/B test assignments
- Implicit context from previous conversations or sessions
- Global rate limit counters and quota state
- Process-level randomness (thread scheduling, memory layout)
Why it matters
If inputs are incomplete, replay is impossible by definition. The agent appears to behave unpredictably, but the real issue is that its actual inputs were never fully captured. This form of nondeterminism is especially insidious because the recorded inputs look complete, so the system appears correctly designed even though it cannot be replayed.
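A way to close this gap is to snapshot ambient state at the start of a run and have agent logic read only from that snapshot. The sketch below assumes the relevant state is limited to a few environment variables, feature flags, and a user ID; `RunContext`, `capture_context`, and `serialize_context` are hypothetical names, and a real system would likely need to capture more.

```python
import json
import os
from dataclasses import asdict, dataclass
from typing import Dict, Iterable, Mapping


@dataclass(frozen=True)
class RunContext:
    """Ambient state made explicit so it can be logged and replayed."""
    env: Dict[str, str]
    feature_flags: Dict[str, bool]
    user_id: str


def capture_context(
    env_keys: Iterable[str],
    feature_flags: Mapping[str, bool],
    user_id: str,
    environ: Mapping[str, str] = os.environ,
) -> RunContext:
    """Snapshot only the named environment variables plus flags and user."""
    env = {key: environ[key] for key in env_keys if key in environ}
    return RunContext(env=env, feature_flags=dict(feature_flags), user_id=user_id)


def serialize_context(ctx: RunContext) -> str:
    """Stable JSON form suitable for storing next to the run's other inputs."""
    return json.dumps(asdict(ctx), sort_keys=True)


if __name__ == "__main__":
    ctx = capture_context(
        env_keys=["REGION"],
        feature_flags={"new_planner": True},
        user_id="user-123",
        environ={"REGION": "eu-west", "UNRELATED": "x"},  # fake environment
    )
    print(serialize_context(ctx))
```

Listing the environment keys explicitly, rather than dumping the whole environment, keeps secrets out of logs and forces a deliberate decision about which hidden state the agent is allowed to depend on.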