Self-Referential Structures

Using Pin and safe patterns for self-references

What are Self-Referential Structures?

A self-referential structure is a data type that contains a pointer to itself or to its own data. In most languages, this is trivial. In Rust, it's one of the hardest problems because of ownership and borrowing rules.

The Core Problem:
// This is what we WANT to do, but Rust forbids it
struct SelfRef {
    data: String,
    ptr: &String,  // ERROR: missing lifetime specifier (and no lifetime can say "borrows from self")
}

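When a struct moves in memory, pointers to its own fields keep pointing at the old location and become dangling. The borrow checker rules out such self-references in safe code; Pin, combined with a small amount of unsafe code, is the standard way to build them soundly.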
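
To make the problem concrete, here is a minimal sketch that records the address of a field before and after the struct is moved. The `Holder` type and the function name are made up for this illustration.

```rust
struct Holder {
    data: String,
}

/// Returns the address of `data` before and after the struct is moved.
fn address_before_and_after_move() -> (usize, usize) {
    let on_stack = Holder { data: String::from("hello") };
    let before = &on_stack.data as *const String as usize;

    // Moving the struct into a Box relocates it from the stack to the heap.
    let on_heap = Box::new(on_stack);
    let after = &on_heap.data as *const String as usize;

    (before, after)
}
```

The two addresses differ, so any pointer to `data` that was stored before the move would now refer to memory the struct no longer occupies.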

Why Do We Need Self-Referential Structures?

Critical Use Cases:

  1. Async/Await: Futures hold references to their own data across await points
  2. Intrusive Data Structures: Linked lists where nodes reference each other
  3. State Machines: Generator coroutines with internal state references
  4. Zero-Copy Parsing: Parser tokens pointing into the original buffer
  5. Network Protocol Buffers: Message structures with internal pointers
Without Pin, async/await in Rust wouldn't be possible.

Pin: The Foundation of Async Rust

Pin<P> is a wrapper around a pointer type P (such as Box<T> or &mut T) that prevents the value it points to from being moved in memory after it's been "pinned."

use std::pin::Pin;

// Simplified view of the standard library type:
// Pin wraps a pointer P and prevents the value it points to from being moved
pub struct Pin<P: Deref> {
    pointer: P,
}
Key Guarantee: Once pinned, a value whose type is !Unpin will never move in memory until it's dropped.
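
The guarantee can be observed directly. In this small sketch (the `Anchored` type and helper names are assumptions for the example), the Pin<Box<T>> handle is moved into a Vec, yet the pinned value stays at the same heap address because only the pointer moves.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct Anchored {
    value: u8,
    _pin: PhantomPinned,
}

/// Address of the pinned value itself, not of the handle that points to it.
fn address_of(p: &Pin<Box<Anchored>>) -> usize {
    &**p as *const Anchored as usize
}

fn pinned_address_is_stable() -> bool {
    let pinned = Box::pin(Anchored { value: 7, _pin: PhantomPinned });
    let before = address_of(&pinned);

    // Moving the handle moves only the pointer; the heap value stays put.
    let mut holders = Vec::new();
    holders.push(pinned);
    let after = address_of(&holders[0]);

    before == after && holders[0].value == 7
}
```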

Real-World Example 1: Async Future Implementation

This is simplified, but shows how async/await actually works under the hood:

use std::future::Future;
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::task::{Context, Poll};

/// A self-referential future that stores data and a pointer to that data
struct AsyncReadFile {
    // The buffer we read into, stored inline so it lives inside the struct
    // (a Vec would keep its bytes on the heap, where they stay put even if the struct moves)
    buffer: [u8; 1024],
    // Number of valid bytes in the buffer
    len: usize,
    // A pointer into our own buffer (self-referential!)
    buffer_ptr: *const u8,
    // File handle (simplified)
    file_descriptor: i32,
    // Current state
    state: ReadState,
    // Marker that makes this type !Unpin, so pinning is actually enforced
    _pin: PhantomPinned,
}

enum ReadState {
    Initial,
    Reading,
    Complete,
}

impl AsyncReadFile {
    fn new(file_descriptor: i32) -> Pin<Box<Self>> {
        // Pin it first! From here on it can never move
        let mut file = Box::pin(Self {
            buffer: [0; 1024],
            len: 0,
            buffer_ptr: std::ptr::null(),
            file_descriptor,
            state: ReadState::Initial,
            _pin: PhantomPinned,
        });

        // CRITICAL: Set up the self-reference only after pinning
        // This is why we need Pin - buffer address won't change
        // SAFETY: We only write a field; the value itself is not moved
        unsafe {
            let this = file.as_mut().get_unchecked_mut();
            this.buffer_ptr = this.buffer.as_ptr();
        }

        file
    }
}

impl Future for AsyncReadFile {
    type Output = Vec<u8>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // SAFETY: We're pinned, and we only mutate fields in place;
        // nothing is ever moved out of `this`
        let this = unsafe { self.get_unchecked_mut() };

        match this.state {
            ReadState::Initial => {
                println!("Starting async read from fd {}", this.file_descriptor);
                this.state = ReadState::Reading;

                // Ask to be polled again (a real future would register with the runtime)
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            ReadState::Reading => {
                // Simulate async read
                // In real code: this would check if read is complete
                println!("Reading into buffer at {:p}", this.buffer_ptr);

                // Simulate data arrival
                let data = b"file contents here";
                this.buffer[..data.len()].copy_from_slice(data);
                this.len = data.len();
                this.state = ReadState::Complete;

                // Without a wake-up here the future would never be polled again
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            ReadState::Complete => {
                println!("Read complete!");
                Poll::Ready(this.buffer[..this.len].to_vec())
            }
        }
    }
}

// Usage with async/await
async fn read_file_example() {
    let contents = AsyncReadFile::new(42).await;
    println!("Read {} bytes", contents.len());
}

Why Pin is Essential Here:

  1. Self-Reference: buffer_ptr points into buffer
  2. Across Await Points: Future might be suspended and resumed later
  3. Memory Stability: If Future moved, buffer_ptr would be invalid
  4. Pin Guarantee: Once pinned, Future won't move, so pointer stays valid
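
What "suspended and resumed" means in practice is that something calls poll repeatedly on the pinned future. The sketch below is a minimal hand-written poll loop, not how a real runtime schedules work: `NoopWake`, `CountDown`, and `poll_to_completion` are hypothetical names, and the waker does nothing because the loop simply polls again.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough for a busy poll loop.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// A toy future that returns Pending `remaining` times before finishing.
struct CountDown {
    remaining: u32,
}

impl Future for CountDown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            Poll::Ready("done")
        } else {
            self.remaining -= 1;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Pins the future on the heap and polls it until it is ready.
/// Returns the output and the number of poll calls that were needed.
fn poll_to_completion<F: Future>(future: F) -> (F::Output, u32) {
    let mut future = Box::pin(future);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(value) = future.as_mut().poll(&mut cx) {
            return (value, polls);
        }
    }
}
```

Calling `poll_to_completion(CountDown { remaining: 2 })` polls three times: two Pending results followed by Ready. Between those calls the future sits at one fixed address, which is exactly what a self-referential future relies on.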

Real-World Example 2: Intrusive Linked List (Systems)

Intrusive data structures are common in OS kernels and embedded systems:

use std::pin::Pin;
use std::marker::PhantomPinned;

/// An intrusive linked list node
/// "Intrusive" means the node contains the link, not external storage
struct Node {
    data: i32,
    // Self-referential: next points to another Node
    next: Option<*mut Node>,
    // Marker that this type must not move
    _pin: PhantomPinned,
}

impl Node {
    fn new(data: i32) -> Pin<Box<Self>> {
        Box::pin(Node {
            data,
            next: None,
            _pin: PhantomPinned,
        })
    }
}

/// A pinned linked list
struct IntrusiveList {
    head: Option<Pin<Box<Node>>>,
}

impl IntrusiveList {
    fn new() -> Self {
        Self { head: None }
    }

    /// Push a new node to the front
    fn push(&mut self, data: i32) {
        let mut new_node = Node::new(data);

        // Take the old head
        if let Some(old_head) = self.head.take() {
            // SAFETY: We have exclusive access and node is pinned
            unsafe {
                let new_node_ptr = new_node.as_mut().get_unchecked_mut() as *mut Node;
                let old_head_ptr = Box::into_raw(Pin::into_inner_unchecked(old_head));

                (*new_node_ptr).next = Some(old_head_ptr);
            }
        }

        self.head = Some(new_node);
    }

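    /// Iterate through the list
    fn iter(&self) -> IntrusiveIter {
        IntrusiveIter {
            // Pin::get_ref is a safe method, so no unsafe block is needed here
            current: self.head.as_ref().map(|pin| pin.as_ref().get_ref() as *const Node),
        }
    }
}

impl Drop for IntrusiveList {
    fn drop(&mut self) {
        // Nodes behind the head were converted to raw pointers in push,
        // so they must be freed by hand to avoid leaking them
        let mut next = self.head.as_ref().and_then(|head| head.next);
        while let Some(ptr) = next {
            // SAFETY: Each pointer came from Box::into_raw in push and is freed exactly once
            let node = unsafe { Box::from_raw(ptr) };
            next = node.next;
        }
    }
}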

struct IntrusiveIter {
    current: Option<*const Node>,
}

impl Iterator for IntrusiveIter {
    type Item = i32;

    fn next(&mut self) -> Option<Self::Item> {
        self.current.map(|node_ptr| unsafe {
            let node = &*node_ptr;
            self.current = node.next.map(|p| p as *const Node);
            node.data
        })
    }
}

// Usage
fn intrusive_list_example() {
    let mut list = IntrusiveList::new();
    list.push(1);
    list.push(2);
    list.push(3);

    // Iteration works because nodes are pinned
    for value in list.iter() {
        println!("Value: {}", value);
    }
}

Why Intrusive Lists Need Pin:

  • Self-Links: Nodes contain raw pointers to other nodes
  • Memory Stability: Moving a node would invalidate all pointers to it
  • Zero-Copy: No separate allocation for links
  • Cache Locality: Data and links are co-located

PhantomPinned: Opting Out of Unpin

use std::marker::PhantomPinned;

struct MustNotMove {
    data: String,
    ptr_to_data: *const String,
    _pin: PhantomPinned,  // This makes the struct !Unpin
}
PhantomPinned is a zero-sized marker type that:
  • Makes your type !Unpin (opts out of the Unpin auto-trait)
  • Signals that this type must never move once pinned
  • Has zero runtime cost
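
The zero-cost claim is easy to check. This sketch compares the size of a struct with and without the marker; the struct and function names are invented for the comparison.

```rust
use std::marker::PhantomPinned;
use std::mem::size_of;

struct WithMarker {
    data: String,
    ptr_to_data: *const String,
    _pin: PhantomPinned,
}

struct WithoutMarker {
    data: String,
    ptr_to_data: *const String,
}

/// Returns (size of the marker, size with marker, size without marker).
fn marker_sizes() -> (usize, usize, usize) {
    (
        size_of::<PhantomPinned>(),
        size_of::<WithMarker>(),
        size_of::<WithoutMarker>(),
    )
}
```

The marker itself occupies zero bytes, so both structs end up the same size; its only effect is at the type level.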

Unpin vs !Unpin: The Auto-Trait

Most types are Unpin automatically:

// These are all Unpin (can be safely moved even when pinned)
struct Point { x: i32, y: i32 }
struct Config { name: String }
struct User { id: u64 }

// Only !Unpin if they contain PhantomPinned or other !Unpin types
struct SelfRef {
    _pin: PhantomPinned,  // Now this whole struct is !Unpin
}
Critical Distinction:
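  • Unpin: Pin adds no restrictions; the value can still be moved (for example via Pin::get_mut or Pin::into_inner)
  • !Unpin: Pin is enforced; safe code cannot move the value once it is pinned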
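
A short sketch of the Unpin side of this distinction: for an Unpin type, safe code can get a plain &mut back out of a Pin and swap the values. The `assert_unpin` helper is a hypothetical compile-time check that only accepts Unpin types.

```rust
use std::pin::Pin;

struct Point {
    x: i32,
    y: i32,
}

/// Compiles only when T is Unpin.
fn assert_unpin<T: Unpin>() {}

fn swap_pinned_points() -> (i32, i32) {
    assert_unpin::<Point>();

    let mut a = Point { x: 1, y: 2 };
    let mut b = Point { x: 10, y: 20 };
    let mut pa = Pin::new(&mut a);
    let mut pb = Pin::new(&mut b);

    // get_mut is safe here because Point is Unpin, so "pinned" values can still be moved.
    std::mem::swap(pa.as_mut().get_mut(), pb.as_mut().get_mut());

    (pa.x, pb.x)
}
```

For a type containing PhantomPinned, both `Pin::new` and `get_mut` would be rejected by the compiler, which is the enforcement described above.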

Real-World Example 3: Zero-Copy Parser (Network/Systems)

use std::pin::Pin;
use std::marker::PhantomPinned;

/// A parser that keeps tokens pointing into the original buffer
/// This enables zero-copy parsing for performance
struct ZeroCopyParser {
    // Original input buffer
    input: String,
    // Tokens point into input (self-referential!)
    tokens: Vec<Token>,
    // Current parse position
    position: usize,
    // Marker to prevent moving
    _pin: PhantomPinned,
}

struct Token {
    // Raw pointer into the parser's input buffer
    data: *const str,
    token_type: TokenType,
}

#[derive(Debug, Clone, Copy)]
enum TokenType {
    Identifier,
    Number,
    Operator,
    Whitespace,
}

impl ZeroCopyParser {
    /// Create a new parser (must be pinned immediately)
    fn new(input: String) -> Pin<Box<Self>> {
        let mut parser = Box::pin(ZeroCopyParser {
            input,
            tokens: Vec::new(),
            position: 0,
            _pin: PhantomPinned,
        });

        // SAFETY: We just created it, and it will never move once pinned
        unsafe {
            let ptr = parser.as_mut().get_unchecked_mut();
            ptr.tokenize();
        }

        parser
    }

    /// Tokenize the input
    /// SAFETY: This must only be called when self is pinned
    unsafe fn tokenize(&mut self) {
        let input_bytes = self.input.as_bytes();
        let mut start = 0;

        while start < input_bytes.len() {
            // Skip whitespace
            while start < input_bytes.len() && input_bytes[start].is_ascii_whitespace() {
                start += 1;
            }

            if start >= input_bytes.len() {
                break;
            }

            // Determine token type and length
            let token_start = start;
            let token_type = if input_bytes[start].is_ascii_alphabetic() {
                while start < input_bytes.len() && input_bytes[start].is_ascii_alphanumeric() {
                    start += 1;
                }
                TokenType::Identifier
            } else if input_bytes[start].is_ascii_digit() {
                while start < input_bytes.len() && input_bytes[start].is_ascii_digit() {
                    start += 1;
                }
                TokenType::Number
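            } else {
                // Any other byte is treated as a one-byte operator
                // (assumes ASCII input; slicing inside a multi-byte character would panic)
                start += 1;
                TokenType::Operator
            };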

            // Create token with pointer into our input
            // CRITICAL: This pointer is only valid while input is never modified or dropped
            let token_slice = &self.input[token_start..start];
            let token = Token {
                data: token_slice as *const str,
                token_type,
            };

            self.tokens.push(token);
        }
    }

    /// Get tokens (safe because we're pinned)
    fn tokens(&self) -> &[Token] {
        &self.tokens
    }

    /// Get token text
    /// SAFETY: The token must come from this parser; its pointer stays valid
    /// because input is never modified after tokenizing
    unsafe fn token_text(&self, token: &Token) -> &str {
        &*token.data
    }
}

// Usage
fn parser_example() {
    let input = "foo 123 + bar 456".to_string();
    let parser = ZeroCopyParser::new(input);

    // SAFETY: The tokens come from this parser, and its input is never modified
    unsafe {
        for token in parser.tokens() {
            let text = parser.token_text(token);
            println!("{:?}: {}", token.token_type, text);
        }
    }
}

Performance Benefits of Zero-Copy:

  • No Allocations: Tokens don't allocate separate strings
  • Cache Friendly: All data in one contiguous buffer
  • Fast Parsing: Only pointer arithmetic, no copying
  • Memory Efficient: Input string + token metadata only
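Trade-off: Requires unsafe code and an input buffer that is never modified after tokenizing. Because a String keeps its bytes on the heap, these particular pointers would survive a move of the parser; Pin becomes strictly necessary once the pointed-to data is stored inline in the struct.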
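
When the unsafe code is not worth it, a common safe pattern is to store byte offsets instead of pointers. The sketch below is an alternative design, not the parser above: `OffsetParser` is a hypothetical type and it splits on whitespace only, to keep the example short.

```rust
use std::ops::Range;

/// Tokens are byte ranges into `input`, so the struct can move freely.
struct OffsetParser {
    input: String,
    tokens: Vec<Range<usize>>,
}

impl OffsetParser {
    fn new(input: String) -> Self {
        let mut tokens = Vec::new();
        let mut start = None;

        for (i, ch) in input.char_indices() {
            match (ch.is_whitespace(), start) {
                // Whitespace ends the current token.
                (true, Some(s)) => {
                    tokens.push(s..i);
                    start = None;
                }
                // A non-whitespace character starts a new token.
                (false, None) => start = Some(i),
                _ => {}
            }
        }
        if let Some(s) = start {
            tokens.push(s..input.len());
        }

        Self { input, tokens }
    }

    /// Resolve the ranges back into string slices on demand.
    fn texts(&self) -> Vec<&str> {
        self.tokens.iter().map(|r| &self.input[r.clone()]).collect()
    }
}
```

This is still zero-copy, since no token text is duplicated, and it needs neither Pin nor unsafe. The cost is one bounds-checked slice operation per lookup.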

Pin API: The Building Blocks

Creating Pinned Values

use std::pin::Pin;

// 1. Box::pin (most common)
let pinned = Box::pin(MyStruct { ... });

// 2. Pin::new (only works for Unpin types)
let mut value = MyUnpinStruct { ... };
let pinned = Pin::new(&mut value);

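// 3. pin! macro (in the standard library since Rust 1.68)
use std::pin::pin;
let pinned = pin!(MyStruct { ... });  // pinned is Pin<&mut MyStruct>, stored on the stack

// Older code often uses pin_mut! from the pin-utils crate for the same purpose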
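
The snippets above elide the struct bodies, so here is a compilable sketch of heap pinning, Pin::new, and stack pinning side by side. `Unmovable` and the function names are placeholders, and the stack variant assumes a toolchain that provides std::pin::pin! (Rust 1.68 or later).

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

struct Unmovable {
    value: u32,
    _pin: PhantomPinned,
}

fn read(p: Pin<&Unmovable>) -> u32 {
    p.value
}

fn create_pinned_values() -> (u32, u32, u32) {
    // 1. Heap pinning: works for any type.
    let boxed: Pin<Box<Unmovable>> = Box::pin(Unmovable { value: 1, _pin: PhantomPinned });

    // 2. Pin::new: only compiles because u32 is Unpin.
    let mut plain = 2u32;
    let by_new: Pin<&mut u32> = Pin::new(&mut plain);

    // 3. Stack pinning with the pin! macro.
    let on_stack: Pin<&mut Unmovable> = pin!(Unmovable { value: 3, _pin: PhantomPinned });

    (read(boxed.as_ref()), *by_new, read(on_stack.as_ref()))
}
```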

Working with Pin

// Getting a pinned reference
let pin_ref: Pin<&MyStruct> = pinned.as_ref();

// Getting a pinned mutable reference
let pin_mut: Pin<&mut MyStruct> = pinned.as_mut();

// Projection: pinning a field
unsafe {
    let field_pin: Pin<&mut FieldType> =
        pinned.as_mut().map_unchecked_mut(|s| &mut s.field);
}

The Unsafe Escape Hatches

// Unwrap the pointer again (safe, but only compiles if the target is Unpin)
let boxed: Box<T> = Pin::into_inner(pinned);

// Alternative: get &mut T without the Unpin bound
// (dangerous! you must never move the value out through this reference)
unsafe {
    let value: &mut T = Pin::get_unchecked_mut(pinned.as_mut());
}

⚠️ Anti-Patterns and Common Mistakes

⚠️ ❌ Mistake #1: Forgetting PhantomPinned

// BAD: Self-referential but no PhantomPinned
struct SelfRef {
    data: String,
    ptr: *const String,
    // Missing _pin: PhantomPinned
}

// This is Unpin! Can still be moved even when "pinned"
// Leads to undefined behavior

⚠️ ❌ Mistake #2: Moving After Self-Reference Setup

// BAD: Setting up self-reference before pinning
let mut s = SelfRef {
    data: "hello".to_string(),
    ptr: std::ptr::null(),
};
s.ptr = &s.data as *const String;  // Self-reference created

let pinned = Box::pin(s);  // Moving s! ptr is now invalid!
Correct Order:
  1. Create value
  2. Pin it
  3. Set up self-references (only after pinning)
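
The three steps can be sketched as a constructor. This is one possible shape, with `new` and `data_via_ptr` as illustrative names; the struct mirrors the SelfRef from the mistake above, plus the PhantomPinned marker.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct SelfRef {
    data: String,
    ptr: *const String,
    _pin: PhantomPinned,
}

impl SelfRef {
    fn new(data: &str) -> Pin<Box<Self>> {
        // Steps 1 and 2: create the value and pin it, with the pointer still null.
        let mut boxed = Box::pin(SelfRef {
            data: data.to_string(),
            ptr: std::ptr::null(),
            _pin: PhantomPinned,
        });

        // Step 3: the address is final now, so the self-reference can be stored.
        let self_ptr: *const String = &boxed.data;
        // SAFETY: We only write a field; the pinned value is not moved.
        unsafe {
            boxed.as_mut().get_unchecked_mut().ptr = self_ptr;
        }

        boxed
    }

    /// Reads `data` through the stored self-pointer.
    fn data_via_ptr(self: Pin<&Self>) -> &str {
        // SAFETY: `ptr` was set after pinning and points at `self.data`.
        let data: &String = unsafe { &*self.ptr };
        data.as_str()
    }
}
```

Moving the returned Pin<Box<SelfRef>> around is fine, because that only moves the handle; the struct and its self-pointer stay where they are.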

⚠️ ❌ Mistake #3: Exposing &mut to !Unpin Types

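// BAD: Handing out &mut Self from a pinned value allows moving it
impl SelfRef {
    fn get_mut(self: Pin<&mut Self>) -> &mut Self {
        // Caller can now std::mem::swap() or std::mem::replace() the whole struct
        // and break invariants!
        unsafe { self.get_unchecked_mut() }
    }
}

// GOOD: Only expose individual fields, and only through Pin
impl SelfRef {
    fn get_data_mut(self: Pin<&mut Self>) -> &mut String {
        // SAFETY: The String field stays in place; callers can change its
        // contents but cannot move the struct
        unsafe { &mut self.get_unchecked_mut().data }
    }
}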

When to Use Pin

✅ Use Pin When:

  1. Implementing Async: Custom futures, streams, async iterators
  2. Self-Referential Structs: Data structure points to itself
  3. Intrusive Collections: Nodes with embedded links
  4. Zero-Copy Parsing: Tokens pointing into source buffer
  5. FFI Callbacks: C callbacks with context pointers

āŒ Avoid Pin When:

  1. Simple Data: No self-references, use normal ownership
  2. Moveable by Design: Data should be relocatable
  3. Temporary State: Short-lived, stack-allocated data
  4. Performance Not Critical: Copying is acceptable

The Pin Guarantees

What Pin Guarantees:

✅ Pinned !Unpin value won't move in memory

✅ Address stability until drop

✅ Safe to store self-referential pointers

What Pin Does NOT Guarantee:

āŒ Doesn't prevent dropping

āŒ Doesn't prevent replacing with mem::replace (you must prevent this)

āŒ Doesn't work with Unpin types (they can still move)

Projection: Pinning Fields

use std::marker::PhantomPinned;
use std::pin::Pin;

struct Outer {
    pinned_field: Inner,
    unpinned_field: String,
}

struct Inner {
    _pin: PhantomPinned,
}

impl Outer {
    // Safe projection for Unpin field
    fn project_unpinned(self: Pin<&mut Self>) -> &mut String {
        unsafe { &mut self.get_unchecked_mut().unpinned_field }
    }

    // Safe projection for pinned field
    fn project_pinned(self: Pin<&mut Self>) -> Pin<&mut Inner> {
        unsafe {
            self.map_unchecked_mut(|s| &mut s.pinned_field)
        }
    }
}
Use the pin-project crate for safe field projection.
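
For comparison, this is roughly what a hand-written projection looks like when wrapping another future. It is a sketch under the assumption that `inner` is treated as structurally pinned and `polls` as plain data; `CountPolls`, `NoopWake`, and `poll_once_counted` are names invented here, and pin-project would generate the projection without the unsafe blocks.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Wraps a future and counts how often it has been polled.
struct CountPolls<F> {
    inner: F,   // structurally pinned: only ever accessed as Pin<&mut F>
    polls: u32, // plain data: accessed as &mut u32
}

impl<F: Future> Future for CountPolls<F> {
    type Output = (F::Output, u32);

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // SAFETY: Nothing is moved out of `this`; `inner` is re-pinned below.
        let this = unsafe { self.get_unchecked_mut() };
        this.polls += 1;

        // SAFETY: `inner` lives inside a pinned struct and is never moved.
        let inner = unsafe { Pin::new_unchecked(&mut this.inner) };
        match inner.poll(cx) {
            Poll::Ready(value) => Poll::Ready((value, this.polls)),
            Poll::Pending => Poll::Pending,
        }
    }
}

/// A waker that does nothing; enough to call poll by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn poll_once_counted() -> Poll<(i32, u32)> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut wrapped = Box::pin(CountPolls {
        inner: std::future::ready(5),
        polls: 0,
    });
    let result = wrapped.as_mut().poll(&mut cx);
    result
}
```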

Exercises

Exercise 1: Implement a Simple State Machine

Build a state machine that holds references to its own state data.

Hints:
  • Use Pin<Box<T>>
  • States can reference previous state data
  • PhantomPinned required

Exercise 2: Async I/O Buffer

Create an async I/O buffer that reuses its internal buffer across operations.

Hints:
  • Buffer + pointer into buffer = self-referential
  • Implement Future trait
  • Pin ensures buffer doesn't move during async operations

Exercise 3: Intrusive Queue

Build a lock-free queue using intrusive nodes with Pin.

Hints:
  • Each node contains next pointer
  • Nodes must be pinned
  • Compare with std::collections::VecDeque

Further Reading

Real-World Usage

🦀 Tokio

All async functions in Tokio return futures that use Pin internally.

🦀 async-std

Async runtime with extensive Pin usage for futures and streams.

🦀 futures-rs

Foundational futures library - Pin is everywhere.
