Documentation

Learn how to interpret results and master system design principles

Understanding Your Results

Capacity (RPS)

The maximum number of requests per second (RPS) your system can sustain before one of its components saturates. It is set by the lowest-capacity component on the request path, which acts as the bottleneck.

💡 Tip: If you're below the required RPS, look for the component with the lowest capacity in your path. Consider adding replicas, caching, or optimization.
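As a rough illustration of the bottleneck rule above, the sketch below treats path capacity as the minimum over the components a request passes through. The component names, the RPS figures, and the idea that replicas scale capacity linearly are all assumptions made for the example, not values from the simulator.

```python
def system_capacity(path_capacities):
    """Return (bottleneck_name, capacity_rps) for a single request path.

    The path can only sustain as many requests per second as its
    lowest-capacity component.
    """
    if not path_capacities:
        raise ValueError("path must contain at least one component")
    bottleneck = min(path_capacities, key=path_capacities.get)
    return bottleneck, path_capacities[bottleneck]


# Hypothetical components and capacities, for illustration only.
path = {"load_balancer": 10_000, "api_server": 2_500, "database": 800}
print(system_capacity(path))  # ('database', 800)

# Simplifying assumption: 4 database replicas give 4x the capacity.
path["database"] = 800 * 4
print(system_capacity(path))  # ('api_server', 2500)
```

Note how fixing one bottleneck exposes the next one: once the database is scaled out, the API server becomes the limit.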

Latency (P95)

The 95th percentile response time: 95% of requests complete at or below this value. P95 describes user experience better than an average, because an average can mask how slow the slowest requests are.
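To make the difference between P95 and an average concrete, here is a minimal sketch using the nearest-rank percentile method. The sample latencies are invented for illustration, and the simulator may compute its percentile differently.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(math.ceil(pct * len(ordered) / 100), 1)
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 18 fast requests, 2 slow ones.
latencies_ms = [20] * 18 + [900, 1000]

print(statistics.mean(latencies_ms))  # 113
print(percentile(latencies_ms, 95))   # 900
```

The mean of 113 ms matches no request that was actually served, while the P95 of 900 ms shows what the slow tail looks like.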

💡 Tip: Network latency between components adds up quickly. Use CDNs, edge computing, and regional replicas to reduce geographic latency.

Outcome States

Pass: Both metrics met
Partial: One metric met
Fail: Neither met
Chaos Fail: Component crashed
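The outcome states can be read as a small decision rule. The sketch below assumes the two metrics are the capacity and P95 latency targets described earlier, and that a crashed component overrides the metric checks; neither assumption is confirmed by this page, and the function name and parameters are hypothetical.

```python
def classify_outcome(capacity_met, latency_met, component_crashed=False):
    """Map the results of a run to one of the four outcome states."""
    # Assumption: a crash takes precedence over the metric results.
    if component_crashed:
        return "Chaos Fail"
    metrics_met = int(capacity_met) + int(latency_met)
    return {2: "Pass", 1: "Partial", 0: "Fail"}[metrics_met]


print(classify_outcome(True, True))    # Pass
print(classify_outcome(True, False))   # Partial
print(classify_outcome(False, False))  # Fail
print(classify_outcome(True, True, component_crashed=True))  # Chaos Fail
```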

Hints System

After failing a scenario, you'll get progressive hints to guide your solution. The more you struggle with a scenario, the more specific the guidance becomes.

💡 Tip: Don't rush to look at hints! The learning comes from discovering solutions yourself. Use hints when you're truly stuck.

Essential Reading

Common Design Patterns

Cache-Aside Pattern

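The application checks the cache first. On a miss, it reads from the database and stores the result in the cache for later reads. On writes, it updates (or invalidates) the cached entry so readers do not keep seeing the old value.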

Use when: Read-heavy workloads, acceptable stale data, fast database queries
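A minimal in-memory sketch of the cache-aside flow follows. Plain dictionaries stand in for the real cache and database, and the class name, method names, and `db_reads` counter are hypothetical, added only so the behavior is observable.

```python
class CacheAsideStore:
    """Cache-aside over two dicts standing in for a cache and a database."""

    def __init__(self, database):
        self.database = database  # stand-in for a real database
        self.cache = {}           # stand-in for a cache such as Redis
        self.db_reads = 0         # counts how often reads reach the database

    def get(self, key):
        # 1. Check the cache first.
        if key in self.cache:
            return self.cache[key]
        # 2. On a miss, read from the database and populate the cache.
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        # Write to the database, then update the cache as the text describes.
        # A common alternative is to delete the cached entry instead.
        self.database[key] = value
        self.cache[key] = value


store = CacheAsideStore({"user:1": "Ada"})
store.get("user:1")    # miss: reads the database
store.get("user:1")    # hit: served from the cache
print(store.db_reads)  # 1
```

A real deployment would also set an expiry on cached entries, which is what bounds how stale a read can be.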

Circuit Breaker

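Stop calling a service once it starts failing, so that one failure does not cascade through the system. After a cooldown, let a trial request through and resume normal calls once the service is healthy again.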

Use when: External service dependencies, network failures, timeout handling
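The sketch below shows one common way to structure a circuit breaker: closed while calls succeed, open after a run of consecutive failures, and half-open (one trial call allowed) after a cooldown. The threshold, timeout, and all names are illustrative assumptions, and it is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # cooldown in seconds
        self.clock = clock                  # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None               # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call skipped")
            # Cooldown elapsed: half-open, so this call acts as a trial.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial, or too many consecutive failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Usage is a thin wrapper around the risky call, for example `breaker.call(fetch_profile, user_id)`, where `fetch_profile` is a placeholder for any function that talks to a remote service. Callers catch `CircuitOpenError` to return a fallback instead of waiting on a timeout.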

Database Sharding

Split a database across multiple servers (shards), using a scheme such as consistent hashing or range-based partitioning to decide which shard holds each record.

Use when: High write throughput, large datasets, horizontal scaling needed
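As an illustration of the consistent hashing option, the sketch below places each shard at many points on a hash ring and routes a key to the first shard point at or after the key's hash. The shard names, the number of virtual nodes, and the choice of SHA-256 are assumptions made for the example.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys to shards; adding a shard only moves keys onto that shard."""

    def __init__(self, shards, vnodes=100):
        if not shards:
            raise ValueError("at least one shard is required")
        # Each shard gets `vnodes` points on the ring to even out the load.
        self._ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def shard_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        index = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._ring[index][1]


ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
print(ring.shard_for("user:42"))  # one of the three shard names
```

The payoff compared with `hash(key) % shard_count` is that adding a shard relocates only the keys that land on the new shard, rather than reshuffling almost every key.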

Event Sourcing

Store every state change as an immutable event in an append-only log. Rebuild the current state by replaying the events in order.

Use when: Audit trails needed, temporal queries, complex business logic
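A minimal sketch of the replay idea, using a hypothetical bank account as the domain: the balance is never stored directly, only derived from the event log. The event kinds and the `upto` parameter are assumptions chosen to show a temporal query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # "deposited" or "withdrew" (hypothetical event kinds)
    amount: int


def apply(balance, event):
    """Return the new balance after applying a single event."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrew":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events, upto=None):
    """Rebuild state from the first `upto` events (all events if None)."""
    balance = 0
    for event in events[:upto]:
        balance = apply(balance, event)
    return balance


log = [Event("deposited", 100), Event("withdrew", 30), Event("deposited", 5)]
print(replay(log))          # 75, the current state
print(replay(log, upto=2))  # 70, the state after the first two events
```

Because the log is the source of truth, it doubles as the audit trail, and replaying a prefix answers "what was the state at that point?" without extra bookkeeping.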