
Retry Patterns

Exponential backoff and retry strategies

Level: intermediate · Tags: retry, backoff, fault-tolerance

What are Retry Patterns?

Retry patterns automatically repeat failed operations with configurable delays and backoff strategies. They handle transient failures gracefully, improving system reliability without manual intervention.
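
In code, the goal is an ergonomic API like the builder defined in the example below. A minimal usage sketch (fetch_data is a hypothetical async operation):

let result = retry(|| async { fetch_data().await })
    .max_attempts(5)
    .initial_delay(Duration::from_millis(50))
    .execute()
    .await;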

The Problem

Network calls and distributed systems fail intermittently:

  • Transient failures: Temporary network issues, timeouts
  • Rate limiting: APIs throttling requests
  • Resource contention: Database locks, busy services
  • Cascading failures: Naive retries amplify load; a retry storm can overwhelm a recovering system (quantified just below)
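
That last point is worth quantifying: retries multiply across layers. If three service layers each make up to 3 attempts, one failing user request can fan out into as many as 3 × 3 × 3 = 27 calls against the bottom service, arriving exactly when it is least able to absorb them. The patterns below aim to keep the benefits of retrying while bounding that amplification.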

Example Code

use std::future::Future;
use std::time::Duration;
use tokio::time::sleep;

/// Retry configuration
#[derive(Debug, Clone)]
pub struct RetryConfig {
    /// Maximum number of attempts (including initial)
    pub max_attempts: u32,
    /// Initial delay between retries
    pub initial_delay: Duration,
    /// Maximum delay between retries
    pub max_delay: Duration,
    /// Backoff multiplier (for exponential backoff)
    pub multiplier: f64,
    /// Add random jitter to prevent thundering herd
    pub jitter: bool,
}

impl Default for RetryConfig {
    fn default() -> Self {
        RetryConfig {
            max_attempts: 3,
            initial_delay: Duration::from_millis(100),
            max_delay: Duration::from_secs(30),
            multiplier: 2.0,
            jitter: true,
        }
    }
}

/// Result of a retry operation
#[derive(Debug)]
pub struct RetryResult<T, E> {
    pub result: Result<T, E>,
    pub attempts: u32,
    pub total_delay: Duration,
}

/// Backoff strategies
#[derive(Debug, Clone)]
pub enum BackoffStrategy {
    /// Fixed delay between retries
    Fixed(Duration),
    /// Linear increase: delay * attempt
    Linear { initial: Duration, increment: Duration },
    /// Exponential: initial * multiplier^attempt
    Exponential { initial: Duration, multiplier: f64, max: Duration },
    /// Decorrelated jitter (AWS recommended)
    DecorrelatedJitter { base: Duration, max: Duration },
}

impl BackoffStrategy {
    pub fn delay(&self, attempt: u32, last_delay: Duration) -> Duration {
        match self {
            BackoffStrategy::Fixed(d) => *d,

            BackoffStrategy::Linear { initial, increment } => {
                *initial + (*increment * attempt)
            }

            BackoffStrategy::Exponential { initial, multiplier, max } => {
                let delay = initial.mul_f64(multiplier.powi(attempt as i32));
                delay.min(*max)
            }

            BackoffStrategy::DecorrelatedJitter { base, max } => {
                use rand::Rng;
                let mut rng = rand::thread_rng();
                // Sample between `base` and 3x the previous delay; guard against
                // an empty range on early calls, when last_delay * 3 may be < base.
                let upper = (last_delay.as_millis() * 3).max(base.as_millis());
                let delay_ms = rng.gen_range(base.as_millis()..=upper);
                Duration::from_millis(delay_ms as u64).min(*max)
            }
        }
    }
}

/// Retry with exponential backoff
pub async fn retry_with_backoff<T, E, F, Fut>(
    config: &RetryConfig,
    mut operation: F,
) -> RetryResult<T, E>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
    E: std::fmt::Debug,
{
    let mut attempts = 0;
    let mut total_delay = Duration::ZERO;
    let mut current_delay = config.initial_delay;

    loop {
        attempts += 1;

        match operation().await {
            Ok(value) => {
                return RetryResult {
                    result: Ok(value),
                    attempts,
                    total_delay,
                };
            }
            Err(e) => {
                if attempts >= config.max_attempts {
                    return RetryResult {
                        result: Err(e),
                        attempts,
                        total_delay,
                    };
                }

                // Calculate delay with exponential backoff
                let delay = if config.jitter {
                    add_jitter(current_delay)
                } else {
                    current_delay
                };

                sleep(delay).await;
                total_delay += delay;

                // Increase delay for next attempt
                current_delay = (current_delay.mul_f64(config.multiplier))
                    .min(config.max_delay);
            }
        }
    }
}

/// Add random jitter (0.5x to 1.5x)
fn add_jitter(delay: Duration) -> Duration {
    use rand::Rng;
    let mut rng = rand::thread_rng();
    let jitter_factor = rng.gen_range(0.5..1.5);
    delay.mul_f64(jitter_factor)
}

/// Retry only on specific errors
pub async fn retry_on<T, E, F, Fut, P>(
    config: &RetryConfig,
    mut operation: F,
    should_retry: P,
) -> RetryResult<T, E>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
    P: Fn(&E) -> bool,
    E: std::fmt::Debug,
{
    let mut attempts = 0;
    let mut total_delay = Duration::ZERO;
    let mut current_delay = config.initial_delay;

    loop {
        attempts += 1;

        match operation().await {
            Ok(value) => {
                return RetryResult {
                    result: Ok(value),
                    attempts,
                    total_delay,
                };
            }
            Err(e) => {
                // Don't retry if error is not retryable
                if !should_retry(&e) || attempts >= config.max_attempts {
                    return RetryResult {
                        result: Err(e),
                        attempts,
                        total_delay,
                    };
                }

                let delay = if config.jitter {
                    add_jitter(current_delay)
                } else {
                    current_delay
                };

                sleep(delay).await;
                total_delay += delay;

                current_delay = (current_delay.mul_f64(config.multiplier))
                    .min(config.max_delay);
            }
        }
    }
}

/// Error types for retry decisions
#[derive(Debug)]
pub enum ApiError {
    // Retryable errors
    Timeout,
    ConnectionFailed,
    ServiceUnavailable,
    RateLimited { retry_after: Option<Duration> },

    // Non-retryable errors
    BadRequest(String),
    Unauthorized,
    NotFound,
    ValidationError(String),
}

impl ApiError {
    pub fn is_retryable(&self) -> bool {
        matches!(
            self,
            ApiError::Timeout
                | ApiError::ConnectionFailed
                | ApiError::ServiceUnavailable
                | ApiError::RateLimited { .. }
        )
    }

    pub fn retry_after(&self) -> Option<Duration> {
        match self {
            ApiError::RateLimited { retry_after } => *retry_after,
            _ => None,
        }
    }
}

/// Retry builder for fluent API
pub struct RetryBuilder<F> {
    operation: F,
    config: RetryConfig,
}

impl<F> RetryBuilder<F> {
    pub fn new(operation: F) -> Self {
        RetryBuilder {
            operation,
            config: RetryConfig::default(),
        }
    }

    pub fn max_attempts(mut self, attempts: u32) -> Self {
        self.config.max_attempts = attempts;
        self
    }

    pub fn initial_delay(mut self, delay: Duration) -> Self {
        self.config.initial_delay = delay;
        self
    }

    pub fn max_delay(mut self, delay: Duration) -> Self {
        self.config.max_delay = delay;
        self
    }

    pub fn multiplier(mut self, multiplier: f64) -> Self {
        self.config.multiplier = multiplier;
        self
    }

    pub fn with_jitter(mut self, jitter: bool) -> Self {
        self.config.jitter = jitter;
        self
    }
}

impl<T, E, F, Fut> RetryBuilder<F>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
    E: std::fmt::Debug,
{
    pub async fn execute(self) -> RetryResult<T, E> {
        retry_with_backoff(&self.config, self.operation).await
    }
}

/// Convenience function
pub fn retry<F>(operation: F) -> RetryBuilder<F> {
    RetryBuilder::new(operation)
}

// Jitter relies on the `rand` crate (e.g. rand = "0.8"); it is available in
// the Rust Playground, so no local stub is required.

fn main() {
    println!("Retry patterns example");

    // Example configuration
    let config = RetryConfig {
        max_attempts: 5,
        initial_delay: Duration::from_millis(100),
        max_delay: Duration::from_secs(10),
        multiplier: 2.0,
        jitter: true,
    };

    println!("Config: {:?}", config);

    // Backoff examples
    let exp_backoff = BackoffStrategy::Exponential {
        initial: Duration::from_millis(100),
        multiplier: 2.0,
        max: Duration::from_secs(30),
    };

    for attempt in 0..5 {
        let delay = exp_backoff.delay(attempt, Duration::from_millis(100));
        println!("Attempt {}: delay {:?}", attempt, delay);
    }
}
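
The synchronous main above only prints configurations; to actually drive retry_with_backoff you need an async runtime. A minimal sketch, replacing that main and assuming tokio's macros feature is enabled (a closure that fails twice, then succeeds, stands in for a transient network error):

#[tokio::main]
async fn main() {
    let config = RetryConfig::default();
    let mut calls = 0u32;

    let outcome = retry_with_backoff(&config, || {
        calls += 1;
        let attempt = calls;
        // Fail the first two attempts, then succeed.
        async move {
            if attempt < 3 {
                Err("transient failure")
            } else {
                Ok("payload")
            }
        }
    })
    .await;

    // With the default config (3 attempts) this succeeds on the final try.
    println!("{:?} after {} attempts", outcome.result, outcome.attempts);
}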

Why This Works

  1. Exponential backoff: Prevents overwhelming recovering services
  2. Jitter: Distributes retry attempts, avoiding thundering herd (a full-jitter variant is sketched after this list)
  3. Max attempts: Limits resource consumption on persistent failures
  4. Selective retry: Only retries transient, recoverable errors
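
The jitter used above is multiplicative (0.5x to 1.5x of the computed delay). A well-known alternative is the "full jitter" variant described on the AWS Architecture Blog, which samples uniformly between zero and the computed delay; a minimal sketch:

use rand::Rng;

/// "Full jitter": sample uniformly in [0, delay].
fn full_jitter(delay: Duration) -> Duration {
    let mut rng = rand::thread_rng();
    delay.mul_f64(rng.gen_range(0.0..=1.0))
}

Full jitter trades occasionally very short delays for maximal spread of retry times across clients.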

Backoff Strategies

| Strategy | Behavior | Best For |
|----------|----------|----------|
| Fixed | Same delay each time | Simple rate limiting |
| Linear | Delay grows by a constant increment | Gradual backpressure |
| Exponential | Delay multiplies each attempt (often doubling) | Network failures |
| Decorrelated Jitter | Randomized, bounded by a cap | Distributed systems |
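
Using the BackoffStrategy enum from the example, you can print the first few delays side by side to see these shapes (DecorrelatedJitter is omitted here since its output is random):

let strategies = [
    ("fixed", BackoffStrategy::Fixed(Duration::from_millis(100))),
    ("linear", BackoffStrategy::Linear {
        initial: Duration::from_millis(100),
        increment: Duration::from_millis(100),
    }),
    ("exponential", BackoffStrategy::Exponential {
        initial: Duration::from_millis(100),
        multiplier: 2.0,
        max: Duration::from_secs(30),
    }),
];

for (name, strategy) in &strategies {
    let delays: Vec<_> = (0..4)
        .map(|attempt| strategy.delay(attempt, Duration::from_millis(100)))
        .collect();
    println!("{name}: {delays:?}");
}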

⚠️ Anti-patterns

// DON'T: Retry everything
fn bad_retry<T, E>(op: impl Fn() -> Result<T, E>) -> Result<T, E> {
    for _ in 0..3 {
        if let Ok(v) = op() { return Ok(v); }
    }
    op() // Retries validation errors, auth failures, etc.
}

// DON'T: No delay between retries
loop {
    if let Ok(v) = operation().await { return v; }
    // Hammers the server!
}

// DON'T: Infinite retries
while operation().await.is_err() {
    sleep(Duration::from_secs(1)).await;
    // Never gives up, resource leak
}

// DO: Selective retry with backoff
retry_on(&config, operation, |e| e.is_retryable()).await
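
Tying the pieces together, retry_on pairs the ApiError classification with backoff; a usage sketch (call_api is a hypothetical async operation returning Result<Response, ApiError>):

let result = retry_on(
    &RetryConfig::default(),
    || async { call_api().await },
    |e: &ApiError| e.is_retryable(),
)
.await;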

Exercises

  1. Implement retry with circuit breaker integration
  2. Add retry budget (max retries per time window); a starter sketch follows this list
  3. Create retry metrics (attempts histogram, success rate)
  4. Implement hedged requests (parallel speculative retries)
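
As a starting point for exercise 2, one possible shape for a retry budget, a sketch assuming a simple sliding window is acceptable:

use std::time::{Duration, Instant};

/// Allows at most `max_retries` retries within any sliding `window`.
pub struct RetryBudget {
    window: Duration,
    max_retries: u32,
    timestamps: Vec<Instant>, // retries recorded in the current window
}

impl RetryBudget {
    pub fn new(window: Duration, max_retries: u32) -> Self {
        Self { window, max_retries, timestamps: Vec::new() }
    }

    /// Records and permits a retry if the budget has room, else denies it.
    pub fn try_acquire(&mut self) -> bool {
        let now = Instant::now();
        self.timestamps.retain(|t| now.duration_since(*t) < self.window);
        if (self.timestamps.len() as u32) < self.max_retries {
            self.timestamps.push(now);
            true
        } else {
            false
        }
    }
}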

🎮 Try it Yourself

Run this code in the official Rust Playground