Using Pin and safe patterns for self-references
A self-referential structure is a data type that contains a pointer to itself or to its own data. In most languages, this is trivial. In Rust, it's one of the hardest problems because of ownership and borrowing rules.
The Core Problem:
// This is what we WANT to do, but Rust forbids it
struct SelfRef {
data: String,
ptr: &String, // ERROR: missing lifetime specifier (no lifetime can name the struct's own field)
}
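When a struct moves in memory, pointers to its fields become invalid. Safe Rust therefore rejects references from a struct into itself. Pin is the standard tool for building such structures soundly: it guarantees that the value stays at a fixed address.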
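To make the problem concrete, here is a minimal sketch that records the address of a field in a raw pointer and then moves the struct. The names `Naive` and `stale_pointer_after_move` are hypothetical and exist only for this illustration. The code never dereferences the stale pointer; it only compares addresses.

```rust
/// A would-be self-referential struct that uses a raw pointer instead of `&String`.
struct Naive {
    data: String,
    /// Address of `data` at the moment it was recorded.
    ptr: *const String,
}

/// Returns (recorded address, actual address after the move) as integers.
fn stale_pointer_after_move() -> (usize, usize) {
    let mut value = Naive {
        data: String::from("hello"),
        ptr: std::ptr::null(),
    };
    // Record where `data` lives while `value` is on the stack.
    value.ptr = &value.data;
    let recorded = value.ptr as usize;

    // Moving the struct into a Box copies its bytes to the heap.
    // The stored pointer is copied verbatim and still names the old stack slot.
    let moved = Box::new(value);
    let actual = &moved.data as *const String as usize;

    (recorded, actual)
}
```

The two addresses differ, so dereferencing `ptr` after the move would read memory that no longer holds the struct. That is the failure mode pinning is meant to rule out.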
Pin is a wrapper that prevents a value from being moved in memory after it's been "pinned."
use std::pin::Pin;
// Pin prevents T from being moved
pub struct Pin<P: Deref> {
pointer: P,
}
Key Guarantee: Once a !Unpin value is pinned, it will never move in memory until it's dropped. (Unpin types opt out of this guarantee and can still be moved.)
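The guarantee is about the pointee, not the pointer. A small sketch, assuming a hypothetical `Anchored` type, shows that moving a `Pin<Box<T>>` handle around leaves the pinned value at the same heap address:

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// A `!Unpin` type (hypothetical, for illustration only).
struct Anchored {
    value: u32,
    _pin: PhantomPinned,
}

/// Address of the pinned value itself, not of the handle that owns it.
fn address_of(handle: &Pin<Box<Anchored>>) -> usize {
    &**handle as *const Anchored as usize
}

/// Returns the address before and after moving the handle, plus the stored value.
fn handle_moves_but_value_stays() -> (usize, usize, u32) {
    let pinned = Box::pin(Anchored { value: 7, _pin: PhantomPinned });
    let before = address_of(&pinned);

    // This moves the Pin<Box<_>> handle into a Vec; the heap allocation is untouched.
    let holder = vec![pinned];
    let after = address_of(&holder[0]);

    (before, after, holder[0].value)
}
```

Because the value lives behind the `Box`, passing the handle to functions or storing it in collections is fine; only moving the `Anchored` itself out of the allocation is forbidden.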
This is simplified, but shows how async/await actually works under the hood:
use std::pin::Pin;
use std::task::{Context, Poll};
use std::future::Future;
/// A self-referential future that stores data and a pointer to that data
struct AsyncReadFile {
// The buffer where we'll read data
buffer: Vec<u8>,
// A pointer into our own buffer (self-referential!)
buffer_ptr: *const u8,
// File handle (simplified)
file_descriptor: i32,
// Current state
state: ReadState,
}
enum ReadState {
Initial,
Reading,
Complete,
}
impl AsyncReadFile {
fn new(file_descriptor: i32) -> Pin<Box<Self>> {
let mut file = Box::new(Self {
buffer: Vec::with_capacity(1024),
buffer_ptr: std::ptr::null(),
file_descriptor,
state: ReadState::Initial,
});
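// CRITICAL: Set up the self-reference only after the value is boxed,
// so the struct already lives at its final heap address
let buffer_ptr = file.buffer.as_ptr();
file.buffer_ptr = buffer_ptr;
// Pin it! Box::into_pin converts Box<Self> into Pin<Box<Self>> without moving the value
// (Box::pin(file) would wrap the Box in a second Box and fail to type-check)
Box::into_pin(file)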
}
}
impl Future for AsyncReadFile {
type Output = Vec<u8>;
fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
// Safety: We're pinned, so buffer won't move
// This is safe because we know the invariants are maintained
unsafe {
let this = self.get_unchecked_mut();
match this.state {
ReadState::Initial => {
println!("Starting async read from fd {}", this.file_descriptor);
this.state = ReadState::Reading;
// Register with runtime
cx.waker().wake_by_ref();
Poll::Pending
}
ReadState::Reading => {
// Simulate async read
// In real code: this would check if read is complete
println!("Reading into buffer at {:p}", this.buffer_ptr);
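// Simulate data arrival
this.buffer.extend_from_slice(b"file contents here");
this.state = ReadState::Complete;
// Wake again so the executor polls once more; returning Pending
// without arranging a wake-up would stall the future forever
cx.waker().wake_by_ref();
Poll::Pending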
}
ReadState::Complete => {
println!("Read complete!");
Poll::Ready(std::mem::take(&mut this.buffer))
}
}
}
}
}
// Usage with async/await
async fn read_file_example() {
let contents = AsyncReadFile::new(42).await;
println!("Read {} bytes", contents.len());
}
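To see what "under the hood" means for the caller, the following sketch drives a future by hand. It assumes a busy-polling loop with a waker that does nothing, which is far simpler than a real runtime; `NoopWake`, `CountDown`, and `block_on_busy` are hypothetical names introduced for this example.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that ignores wake-ups; good enough for a loop that polls continuously.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// A future that reports Pending `remaining` times before completing.
struct CountDown {
    remaining: u32,
}

impl Future for CountDown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            Poll::Ready("done")
        } else {
            // CountDown is Unpin, so fields can be mutated through the Pin directly.
            self.remaining -= 1;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Pins the future on the heap and polls it until it is ready.
/// Returns the output together with the number of polls it took.
fn block_on_busy<F: Future>(future: F) -> (F::Output, u32) {
    let mut future = Box::pin(future);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
            return (output, polls);
        }
    }
}
```

The executor has to pin the future before the first poll because `poll` takes `Pin<&mut Self>`; from then on it only ever hands out `future.as_mut()`, never the future by value.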
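buffer_ptr points into buffer. If the pointed-to memory moved, buffer_ptr would be invalid. (In this simplified example the pointer targets the Vec's heap allocation, which does not move with the struct; it would go stale only if the buffer reallocated. A pointer to a field stored inline in the struct would dangle after any move.)
Intrusive data structures are common in OS kernels and embedded systems: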
use std::pin::Pin;
use std::marker::PhantomPinned;
/// An intrusive linked list node
/// "Intrusive" means the node contains the link, not external storage
struct Node {
data: i32,
// Self-referential: next points to another Node
next: Option<*mut Node>,
// Marker that this type must not move
_pin: PhantomPinned,
}
impl Node {
fn new(data: i32) -> Pin<Box<Self>> {
Box::pin(Node {
data,
next: None,
_pin: PhantomPinned,
})
}
}
/// A pinned linked list
struct IntrusiveList {
head: Option<Pin<Box<Node>>>,
}
impl IntrusiveList {
fn new() -> Self {
Self { head: None }
}
/// Push a new node to the front
fn push(&mut self, data: i32) {
let mut new_node = Node::new(data);
// Take the old head
if let Some(old_head) = self.head.take() {
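// SAFETY: We have exclusive access and the node stays at its heap address.
// Note: into_raw hands ownership of the old head to a raw pointer, so a Drop
// impl (not shown) must free the nodes, otherwise they leak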
unsafe {
let new_node_ptr = new_node.as_mut().get_unchecked_mut() as *mut Node;
let old_head_ptr = Box::into_raw(Pin::into_inner_unchecked(old_head));
(*new_node_ptr).next = Some(old_head_ptr);
}
}
self.head = Some(new_node);
}
/// Iterate through the list
fn iter(&self) -> IntrusiveIter {
IntrusiveIter {
current: self.head.as_ref().map(|pin| {
// get_ref() on Pin<&Node> is safe: a shared reference cannot move the node
pin.as_ref().get_ref() as *const Node
}),
}
}
}
struct IntrusiveIter {
current: Option<*const Node>,
}
impl Iterator for IntrusiveIter {
type Item = i32;
fn next(&mut self) -> Option<Self::Item> {
self.current.map(|node_ptr| unsafe {
let node = &*node_ptr;
self.current = node.next.map(|p| p as *const Node);
node.data
})
}
}
// Usage
fn intrusive_list_example() {
let mut list = IntrusiveList::new();
list.push(1);
list.push(2);
list.push(3);
// Iteration works because nodes are pinned
for value in list.iter() {
println!("Value: {}", value);
}
}
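The list above transfers each old head to a raw pointer and never frees it. As a contrast, here is a sketch of a variant that keeps ownership in `Option<Pin<Box<Node>>>`, so no `unsafe` is needed and nodes are dropped automatically. The name `PinnedList` and its `values` helper are assumptions made for this example, and the design gives up the external linking that makes a list truly intrusive.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct Node {
    data: i32,
    /// The next node is owned and pinned; moving this handle does not move the node.
    next: Option<Pin<Box<Node>>>,
    _pin: PhantomPinned,
}

struct PinnedList {
    head: Option<Pin<Box<Node>>>,
}

impl PinnedList {
    fn new() -> Self {
        Self { head: None }
    }

    /// Push a new node to the front; the old head becomes its successor.
    fn push(&mut self, data: i32) {
        let old_head = self.head.take();
        self.head = Some(Box::pin(Node {
            data,
            next: old_head,
            _pin: PhantomPinned,
        }));
    }

    /// Collect the values front to back using only shared references.
    fn values(&self) -> Vec<i32> {
        let mut out = Vec::new();
        let mut current = self.head.as_ref();
        while let Some(node) = current {
            out.push(node.data);
            current = node.next.as_ref();
        }
        out
    }
}
```

Every node keeps its heap address for its whole lifetime, because only the `Pin<Box<Node>>` handles are moved between `head` and `next`. Dropping a very long list this way recurses once per node, so a production version would likely add an iterative `Drop`.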
use std::marker::PhantomPinned;
struct MustNotMove {
data: String,
ptr_to_data: *const String,
_pin: PhantomPinned, // This makes the struct !Unpin
}
PhantomPinned is a zero-sized marker type that makes the containing type !Unpin (it opts out of the Unpin auto-trait).
Most types are Unpin automatically:
// These are all Unpin (can be safely moved even when pinned)
struct Point { x: i32, y: i32 }
struct Config { name: String }
struct User { id: u64 }
// Only !Unpin if they contain PhantomPinned or other !Unpin types
struct SelfRef {
_pin: PhantomPinned, // Now this whole struct is !Unpin
}
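What "can be safely moved even when pinned" means in practice can be sketched as follows. The helper names `assert_unpin` and `swap_pinned_points` are hypothetical; the point is that for an `Unpin` type, `Pin::new` is available and `Pin<&mut T>` dereferences mutably, so safe code can swap the values.

```rust
use std::pin::Pin;

#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

/// Compiles only when `T: Unpin`; a compile-time check, not a runtime one.
fn assert_unpin<T: Unpin>() {}

/// Swaps two "pinned" points using only safe code.
fn swap_pinned_points() -> (Point, Point) {
    assert_unpin::<Point>();

    let mut a = Point { x: 1, y: 2 };
    let mut b = Point { x: 3, y: 4 };

    // Pin::new is safe here because Point: Unpin.
    let mut pinned_a = Pin::new(&mut a);
    let mut pinned_b = Pin::new(&mut b);

    // DerefMut is implemented for Pin<&mut T> when T: Unpin,
    // so the values can be moved out from under the pins.
    std::mem::swap(&mut *pinned_a, &mut *pinned_b);

    (a, b)
}
```

With a `PhantomPinned` field added to `Point`, both `assert_unpin::<Point>()` and `Pin::new` would fail to compile, which is exactly the protection a self-referential type needs.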
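Critical Distinction: Pin only restricts moves for !Unpin types. For an Unpin type, Pin<&mut T> can be turned back into &mut T with safe code, so pinning it guarantees nothing.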
use std::pin::Pin;
use std::marker::PhantomPinned;
/// A parser that keeps tokens pointing into the original buffer
/// This enables zero-copy parsing for performance
struct ZeroCopyParser {
// Original input buffer
input: String,
// Tokens point into input (self-referential!)
tokens: Vec<Token>,
// Current parse position
position: usize,
// Marker to prevent moving
_pin: PhantomPinned,
}
struct Token {
// Raw pointer into the parser's input buffer
data: *const str,
token_type: TokenType,
}
#[derive(Debug, Clone, Copy)]
enum TokenType {
Identifier,
Number,
Operator,
Whitespace,
}
impl ZeroCopyParser {
/// Create a new parser (must be pinned immediately)
fn new(input: String) -> Pin<Box<Self>> {
let mut parser = Box::pin(ZeroCopyParser {
input,
tokens: Vec::new(),
position: 0,
_pin: PhantomPinned,
});
// SAFETY: We just created it, and it will never move once pinned
unsafe {
let ptr = parser.as_mut().get_unchecked_mut();
ptr.tokenize();
}
parser
}
/// Tokenize the input
/// SAFETY: This must only be called when self is pinned
unsafe fn tokenize(&mut self) {
let input_bytes = self.input.as_bytes();
let mut start = 0;
while start < input_bytes.len() {
// Skip whitespace
while start < input_bytes.len() && input_bytes[start].is_ascii_whitespace() {
start += 1;
}
if start >= input_bytes.len() {
break;
}
// Determine token type and length
let token_start = start;
let token_type = if input_bytes[start].is_ascii_alphabetic() {
while start < input_bytes.len() && input_bytes[start].is_ascii_alphanumeric() {
start += 1;
}
TokenType::Identifier
} else if input_bytes[start].is_ascii_digit() {
while start < input_bytes.len() && input_bytes[start].is_ascii_digit() {
start += 1;
}
TokenType::Number
} else {
start += 1;
TokenType::Operator
};
// Create token with pointer into our input
// CRITICAL: This pointer stays valid only while `input` is never mutated,
// replaced, or dropped; pinning plus a read-only API is what enforces that
let token_slice = &self.input[token_start..start];
let token = Token {
data: token_slice as *const str,
token_type,
};
self.tokens.push(token);
}
}
/// Get tokens (safe because we're pinned)
fn tokens(&self) -> &[Token] {
&self.tokens
}
/// Get token text
/// SAFETY: Token pointers are valid because parser is pinned
unsafe fn token_text(&self, token: &Token) -> &str {
&*token.data
}
}
// Usage
fn parser_example() {
let input = "foo 123 + bar 456".to_string();
let parser = ZeroCopyParser::new(input);
// SAFETY: Parser is pinned, so token pointers are valid
unsafe {
for token in parser.tokens() {
let text = parser.token_text(token);
println!("{:?}: {}", token.token_type, text);
}
}
}
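When raw pointers are not strictly required, a common safe alternative is to store byte ranges into the buffer instead of pointers. The sketch below assumes a much simpler tokenizer than the one above (it only splits on whitespace), and `OffsetParser` with its methods is a hypothetical name for this illustration.

```rust
use std::ops::Range;

/// Stores token positions as byte ranges into `input`, so no pinning is needed.
struct OffsetParser {
    input: String,
    tokens: Vec<Range<usize>>,
}

impl OffsetParser {
    /// Splits `input` on whitespace and records where each token starts and ends.
    fn new(input: String) -> Self {
        let mut tokens = Vec::new();
        let mut start: Option<usize> = None;

        for (index, ch) in input.char_indices() {
            match (ch.is_whitespace(), start) {
                // Whitespace ends the token that is currently open.
                (true, Some(begin)) => {
                    tokens.push(begin..index);
                    start = None;
                }
                // A non-whitespace character opens a new token.
                (false, None) => start = Some(index),
                _ => {}
            }
        }
        // Close a token that runs to the end of the input.
        if let Some(begin) = start {
            tokens.push(begin..input.len());
        }

        Self { input, tokens }
    }

    /// Resolves every range back to a borrowed slice of the input.
    fn token_texts(&self) -> Vec<&str> {
        self.tokens
            .iter()
            .map(|range| &self.input[range.clone()])
            .collect()
    }
}
```

Ranges remain meaningful wherever the struct is moved, so the type needs neither `PhantomPinned` nor `unsafe`. The cost is one bounds-checked slice operation per lookup.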
use std::pin::Pin;
// 1. Box::pin (most common)
let pinned = Box::pin(MyStruct { ... });
// 2. Pin::new (only works for Unpin types)
let mut value = MyUnpinStruct { ... };
let pinned = Pin::new(&mut value);
// 3. Stack pinning: std::pin::pin! (Rust 1.68+) or pin_mut! from the pin-utils crate
use pin_utils::pin_mut;
let value = MyStruct { ... };
pin_mut!(value); // Now value is Pin<&mut MyStruct>
// Getting a pinned reference
let pin_ref: Pin<&MyStruct> = pinned.as_ref();
// Getting a pinned mutable reference
let pin_mut: Pin<&mut MyStruct> = pinned.as_mut();
// Projection: pinning a field
unsafe {
let field_pin: Pin<&mut FieldType> =
pinned.as_mut().map_unchecked_mut(|s| &mut s.field);
}
// Get the pointer back out (only allowed if T: Unpin)
let boxed: Box<T> = Pin::into_inner(pinned);
// Get unchecked (dangerous! only use if you know what you're doing)
unsafe {
let value: &mut T = pinned.as_mut().get_unchecked_mut();
}
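The fragments above use placeholder types. A compilable sketch of the same operations, assuming a hypothetical `Counter` type, might look like this; it uses the standard library's `std::pin::pin!` macro for stack pinning and `Box::pin` for heap pinning.

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

/// A `!Unpin` counter (hypothetical, for illustration only).
struct Counter {
    count: u32,
    _pin: PhantomPinned,
}

impl Counter {
    fn new() -> Self {
        Counter { count: 0, _pin: PhantomPinned }
    }

    /// Mutation goes through Pin<&mut Self>, so callers never see a bare &mut Counter.
    fn increment(self: Pin<&mut Self>) {
        // SAFETY: we only update a field in place and never move the value.
        unsafe { self.get_unchecked_mut().count += 1 };
    }

    /// Reading through Pin<&Self> needs no unsafe code.
    fn count(self: Pin<&Self>) -> u32 {
        self.get_ref().count
    }
}

/// Pins on the stack with the standard `pin!` macro.
fn stack_pin_demo() -> u32 {
    let mut counter = pin!(Counter::new()); // Pin<&mut Counter>
    counter.as_mut().increment();
    counter.as_mut().increment();
    counter.as_ref().count()
}

/// Pins on the heap with `Box::pin`.
fn heap_pin_demo() -> u32 {
    let mut counter = Box::pin(Counter::new()); // Pin<Box<Counter>>
    counter.as_mut().increment();
    counter.as_ref().count()
}
```

`as_mut()` reborrows the pin for one call, which is why the binding must be declared `mut` and why the same pinned value can be used repeatedly.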
// BAD: Self-referential but no PhantomPinned
struct SelfRef {
data: String,
ptr: *const String,
// Missing _pin: PhantomPinned
}
// This is Unpin! Can still be moved even when "pinned"
// Leads to undefined behavior
// BAD: Setting up self-reference before pinning
let mut s = SelfRef {
data: "hello".to_string(),
ptr: std::ptr::null(),
};
s.ptr = &s.data as *const String; // Self-reference created
let pinned = Box::pin(s); // Moving s! ptr is now invalid!
Correct Order:
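A minimal sketch of the pin-first ordering, assuming the `SelfRef` struct from the bad example with a `PhantomPinned` field added. The helper methods `data_via_ptr` and `ptr_is_consistent` are hypothetical and exist only to make the result observable.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct SelfRef {
    data: String,
    ptr: *const String,
    _pin: PhantomPinned,
}

impl SelfRef {
    fn new(text: &str) -> Pin<Box<Self>> {
        // Step 1: pin first, with a placeholder pointer.
        let mut boxed = Box::pin(SelfRef {
            data: text.to_string(),
            ptr: std::ptr::null(),
            _pin: PhantomPinned,
        });

        // Step 2: take the address of `data` at its final location.
        let data_ptr: *const String = &boxed.data;

        // Step 3: store it.
        // SAFETY: writing one field in place does not move the pinned value.
        unsafe { boxed.as_mut().get_unchecked_mut().ptr = data_ptr };

        boxed
    }

    /// Reads `data` through the stored pointer.
    fn data_via_ptr(self: Pin<&Self>) -> &str {
        // SAFETY: `ptr` was set after pinning and points at `self.data`,
        // which cannot move or be dropped while `self` is borrowed.
        unsafe { (*self.ptr).as_str() }
    }

    /// True if the stored pointer still names the `data` field.
    fn ptr_is_consistent(&self) -> bool {
        std::ptr::eq(self.ptr, &self.data)
    }
}
```

Because the pointer is created only after the value has reached its final heap address, moving the returned `Pin<Box<SelfRef>>` handle keeps the pointer and the field in agreement.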
// BAD: Exposing &mut allows moving
impl SelfRef {
fn get_data_mut(&mut self) -> &mut String {
&mut self.data // User can now std::mem::replace() and break invariants!
}
}
// GOOD: Only expose through Pin
impl SelfRef {
fn get_data_mut(self: Pin<&mut Self>) -> &mut String {
unsafe { &mut self.get_unchecked_mut().data }
}
}
✓ Pinned !Unpin value won't move in memory
✓ Address stability until drop
✓ Safe to store self-referential pointers
✗ Doesn't prevent dropping
✗ Doesn't prevent replacing with mem::replace (you must prevent this)
✗ Doesn't work with Unpin types (they can still move)
use std::pin::Pin;
use std::marker::PhantomPinned;
struct Outer {
pinned_field: Inner,
unpinned_field: String,
}
struct Inner {
_pin: PhantomPinned,
}
impl Outer {
// Safe projection for Unpin field
fn project_unpinned(self: Pin<&mut Self>) -> &mut String {
unsafe { &mut self.get_unchecked_mut().unpinned_field }
}
// Safe projection for pinned field
fn project_pinned(self: Pin<&mut Self>) -> Pin<&mut Inner> {
unsafe {
self.map_unchecked_mut(|s| &mut s.pinned_field)
}
}
}
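A typical place where projection is needed is a future that wraps another future. The following sketch, with the hypothetical names `CountPolls`, `NoopWake`, and `poll_once`, treats `inner` as structurally pinned and `polls` as an ordinary field.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Wraps a future and counts how often it has been polled.
struct CountPolls<F> {
    inner: F,   // structurally pinned: must be polled through Pin<&mut F>
    polls: u32, // not structurally pinned: plain &mut access is fine
}

impl<F: Future> Future for CountPolls<F> {
    type Output = (F::Output, u32);

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // SAFETY: `this` is used only to access fields in place; nothing is moved out.
        let this = unsafe { self.get_unchecked_mut() };
        this.polls += 1;

        // SAFETY: `inner` lives inside a pinned struct and is never moved,
        // so re-pinning a reference to it upholds the pinning contract.
        let inner = unsafe { Pin::new_unchecked(&mut this.inner) };

        match inner.poll(cx) {
            Poll::Ready(value) => Poll::Ready((value, this.polls)),
            Poll::Pending => Poll::Pending,
        }
    }
}

/// A waker that does nothing, used only to build a Context for a single poll.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls a pinned future exactly once.
fn poll_once<F: Future>(future: Pin<&mut F>) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    future.poll(&mut cx)
}
```

The unsafe blocks are sound only as long as the type has no `Drop` impl that moves `inner`, is not `#[repr(packed)]`, and does not implement `Unpin` unconditionally; those are the obligations that a projection helper crate checks for you.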
Use the pin-project crate for safe field projection.
Build a state machine that holds references to its own state data.
Create an async I/O buffer that reuses its internal buffer across operations.
Build a lock-free queue using intrusive nodes with Pin.
All async functions in Tokio return futures that use Pin internally.
Async runtime with extensive Pin usage for futures and streams.
Foundational futures library - Pin is everywhere.