Published on April 01, 2026

Cancellation Safety in different Rust AsyncRuntimes

By Nikita Bishonen • 8 minutes read

For more context, I recommend that my dear friends read or watch the great talk by Rain on “cancellation safety” in async Rust.

TL;DR for those who prefer to hack and not to read:

git clone https://gitlab.com/blogging1/cancellation-safety;
cargo test --no-fail-fast --workspace (or pick a single runtime with -p cancel_%RUNTIME% instead of --workspace);
Read errors and tests;
Read lib.rs for runtimes you are interested in;
Play with the code to fix all the tests;

The Problem

“Cancellation safety” usually means the absence of unexpected “effects” when a sequence of asynchronous operations is stopped before it reaches its final state. A panic! in the middle of a synchronous sequence of operations is a useful analogy.

Its fundamental principles are tied to the Future trait and to the implementation details of the asynchronous runtime. Here I would like to experiment and look for common patterns and differences in how Tokio, smol and glommio handle cancellation.

Future::poll is the key aspect here: at compile time every Future implementation is a state machine that represents a sequence of operations, and “cancellation” means that the runtime (reactor, event loop or scheduler) stops polling this state machine and (hopefully) destroys its state;
Pin and Unpin play their part, as !Unpin types have to uphold additional guarantees to be considered “safe” (to mitigate “effects”);
The Waker implementation may play a big role in handling “cancellation” requests;
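To make the first point concrete, here is a minimal sketch that uses only the standard library. Forever and NoopWaker are hypothetical names invented for this illustration, and the hand-rolled polling stands in for a real runtime. It polls a future once and then drops it, which is all that “cancellation” amounts to at this level.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling a future by hand.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical future that never completes and records when it is destroyed.
struct Forever {
    dropped: Arc<AtomicBool>,
}

impl Future for Forever {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}

impl Drop for Forever {
    fn drop(&mut self) {
        self.dropped.store(true, Ordering::SeqCst);
    }
}

/// Polls the future once, then "cancels" it by dropping it.
/// Returns (was it still pending?, was its state destroyed?).
pub fn poll_once_then_cancel() -> (bool, bool) {
    let dropped = Arc::new(AtomicBool::new(false));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    let mut fut = Box::pin(Forever { dropped: dropped.clone() });
    let was_pending = fut.as_mut().poll(&mut cx).is_pending();

    // No special "cancel" call exists at this level: we simply stop polling and drop.
    drop(fut);
    (was_pending, dropped.load(Ordering::SeqCst))
}
```

Calling poll_once_then_cancel() reports that the future was still pending and that its Drop implementation ran, so any cleanup a future needs on cancellation has to live in Drop.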

The Example

In our adventure we will join our strong and proud friends who live in Moria:

pub async fn mine_with_tool<F, FT>(dwarf: &mut Dwarf, mut pickaxe: F) ... {
    let mut bag: Bag = Vec::new();
    for i in 0..MAX_ALLOWED_SHIFTS {
        pickaxe().await;
        ...
        println!("Here at the Gates the king AWAITS");
        ...
        bag.push(Ferrum::Dirty);
    }
    dwarf.bag = Some(bag);
}

So our asynchronous process imitates a dwarf working in the mines. mine_with_tool itself is a composite Future implementation that internally takes MAX_ALLOWED_SHIFTS steps to reach its final (completed) state. In simple terms, line 4, pickaxe().await;, is the point where the state machine is polled and makes progress. It also means that this exact spot may be used to “cancel” the execution.

How it looks in the high-level intermediate representation, HIR (code cleaned up to make it less verbose; run cargo +nightly rustc -- -Z unpretty=hir inside the moria folder to see the full output):

async fn mine_with_tool<F, FT>(dwarf: &'_ mut Dwarf, pickaxe: F) -> /*impl Trait*/
where
    F: FnMut() -> FT,
    FT: std::future::Future
|mut _task_context: ResumeTy| {
    ...
    let mut bag: Bag = Vec::new();
    {
        ...
        loop {
            match next(&mut iter) {
                None {} => break,
                Some { 0: i } => {
                    match into_future(pickaxe()) {
                        mut __awaitee => loop {
                            match unsafe {
                                poll(new_unchecked(&mut __awaitee),
                                    get_context(_task_context))
                            } {
                                Ready { 0: result } => break result,
                                Pending {} => { }
                            }
                            _task_context = (yield ());
                        },
                    };
                    ...
                    bag.push(Ferrum::Dirty);
                }
            }
        },
        ...
        dwarf.bag = Some(bag);
        ...
}

Almost no “async” magic anymore: we have a loop, an unsafe and a yield ().
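A rough hand-written equivalent of that desugaring could look like the sketch below. It is an approximation and not what the compiler emits: Mine, its fields, the simplified Dwarf and the u32 “ore” are assumptions made for illustration, and the pickaxe await is modelled as exactly one Pending per shift.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling a future by hand.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Simplified stand-in for the post's `Dwarf` (assumption).
struct Dwarf {
    bag: Option<Vec<u32>>,
}

/// Hand-written state machine: every local of the async fn becomes a field.
struct Mine<'a> {
    dwarf: &'a mut Dwarf,
    bag: Vec<u32>,   // the local `bag`
    shift: u32,      // the loop counter `i`
    max_shifts: u32, // plays the role of MAX_ALLOWED_SHIFTS
    swinging: bool,  // true while the `pickaxe().await` is in flight
}

impl Future for Mine<'_> {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let this = &mut *self; // fine because `Mine` is `Unpin`
        loop {
            if this.shift == this.max_shifts {
                // Final state: hand the result over to the caller's data.
                this.dwarf.bag = Some(std::mem::take(&mut this.bag));
                return Poll::Ready(());
            }
            if !this.swinging {
                // Suspension point: this is where `yield ()` sits in the HIR.
                this.swinging = true;
                cx.waker().wake_by_ref();
                return Poll::Pending;
            }
            // Resumed after the await: finish the current shift.
            this.swinging = false;
            this.bag.push(this.shift);
            this.shift += 1;
        }
    }
}

/// Drives the state machine to completion, counting how many polls it took.
pub fn run_to_completion(max_shifts: u32) -> (usize, Option<Vec<u32>>) {
    let mut dwarf = Dwarf { bag: None };
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Mine { dwarf: &mut dwarf, bag: Vec::new(), shift: 0, max_shifts, swinging: false };

    let mut polls = 0;
    loop {
        polls += 1;
        if Pin::new(&mut fut).poll(&mut cx).is_ready() {
            break;
        }
    }
    drop(fut);
    (polls, dwarf.bag)
}
```

With three shifts, run_to_completion(3) needs four polls: one Pending per shift and a final poll that returns Ready. Every return of Pending is a point where the caller is free to stop polling.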

Let’s jump to the “safety” aspect. In my humble opinion, if there is no unsafe code and the Rust compiler is pleased, the operation is safe. Yet that doesn’t mean the program behaves the way programmers expect it to.

... fn mine_with_tool<...>(dwarf: &mut Dwarf ... {
    let mut bag: Bag = Vec::new();
    ...
    bag.push(Ferrum::Dirty);
    ...
    dwarf.bag = Some(bag);
}

Try to spot for yourself what makes this Future implementation “cancellation not-safe” (I prefer not-safe here, as unsafe is an important but unrelated Rust term).

The answer lies in how this asynchronous operation stores its state and tracks its own progress. Both happen internally, while it holds a mutable reference to the state given by the caller, dwarf: &mut Dwarf. Let’s look at two examples of using this future inside the smol runtime to see why such an implementation can surprise the caller if it is cancelled.

pub async fn work(dwarf: &mut Dwarf) {
    super::mine(dwarf)
        .or(async {
            Timer::after(HALF_SHIFT).await;
        })
        .await;
}

The dwarf works here for half of the shift and makes some progress. But when we run the test:

let mut dwallin = Dwarf::new(Name::Dwalin);
timeout::work(&mut dwallin).await;
// Bag is empty, but I heard the song!
assert!(dwallin.bag.is_some());

we see that the assertion that some work has been done fails. The reason is that the timeout fired before Dwalin was able to finish his work, and everything he had put into the “bag” held in the internal state of the state machine was dropped once the operation was cancelled at the first poll after the timeout. (Try changing Timer::after(HALF_SHIFT) to use FULL_SHIFT and see whether it helps 😉). This is what we may call a cancellation not-safe (or incorrect, as Rain proposed) implementation of an asynchronous operation: it leads to behaviour that we would not expect.
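One way to lose less on cancellation is to record progress in the caller's state after every step instead of in a local variable. The sketch below compares the two shapes using only the standard library. The simplified Dwarf, the u32 “ore”, YieldOnce, NoopWaker and the hand-rolled polling loop are assumptions made for illustration and are not taken from the repository.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling a future by hand.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Stand-in for `pickaxe().await`: Pending on the first poll, Ready on the second.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Simplified stand-in for the post's `Dwarf` (assumption).
#[derive(Default)]
struct Dwarf {
    bag: Option<Vec<u32>>,
}

/// Same shape as the post's example: progress lives in a local until the very end.
async fn mine_local(dwarf: &mut Dwarf, shifts: u32) {
    let mut bag = Vec::new();
    for i in 0..shifts {
        YieldOnce(false).await;
        bag.push(i);
    }
    dwarf.bag = Some(bag);
}

/// Alternative shape: every finished shift is written to the caller's state at once.
async fn mine_incremental(dwarf: &mut Dwarf, shifts: u32) {
    for i in 0..shifts {
        YieldOnce(false).await;
        dwarf.bag.get_or_insert_with(Vec::new).push(i);
    }
}

/// Polls `fut` at most `polls` times, then drops it (the "cancellation").
fn poll_n_then_drop<F: Future>(fut: F, polls: usize) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    for _ in 0..polls {
        if fut.as_mut().poll(&mut cx).is_ready() {
            break;
        }
    }
}

/// Cancels both variants after three polls of a four-shift job.
pub fn cancel_both() -> (Option<Vec<u32>>, Option<Vec<u32>>) {
    let mut a = Dwarf::default();
    poll_n_then_drop(mine_local(&mut a, 4), 3);
    let mut b = Dwarf::default();
    poll_n_then_drop(mine_incremental(&mut b, 4), 3);
    (a.bag, b.bag)
}
```

After three polls both variants have finished two shifts, but only the incremental one leaves them visible to the caller. The trade-off is that the caller can now observe a partially filled bag, so this only helps when partial results are acceptable.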

Something more interesting (and closer to the real world) can happen if we make our Future implementation more complex and “dirty”:

pub async fn work(dwarf: &Mutex<Dwarf>) {
    let mut dwarf_guard = dwarf.lock().await;
    let mut old_bag = dwarf_guard.bag.take().unwrap();
    timeout(HALF_SHIFT, super::mine(&mut dwarf_guard))
        .await
        .err()
        .unwrap();
    if let Some(new_bag) = dwarf_guard.bag.as_mut() {
        new_bag.append(&mut old_bag)
    }
}

I know, dwarves are not good at asynchronous programming, but it illustrates the problem really well. Our operation takes state out of its input, holds it internally, then merges it with its own computation results and gives it back.

Lock Mutex -> [ Take Old Bag -> Mine -> Take New Bag -> Merge Bags ] -> Release Lock -> End

The issue is that the operation is “safe” from the mutex point of view, as we know that no one else will change the state of the dwarf. Yet it is not “correct” if we cancel super::mine before it fully completes: old_bag will be dropped, because new_bag never comes into existence (dwarf_guard.bag.as_mut() is None). As the work future is a composition of itself with the super::mine future and the timeout future, our logic becomes broken, because both futures behave incorrectly from the system perspective (while this is totally expected, valid and safe Rust code, imho).
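A common way to keep the old bag alive is to move the “give it back” step into a Drop implementation, so it runs whether the operation completes or is cancelled. The sketch below is one possible shape and not the repository's code: RestoreBag, the simplified Dwarf, YieldOnce and NoopWaker are hypothetical, and the mutex is left out (assume the lock is already held) to keep the example on the standard library only.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling a future by hand.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Stand-in for one awaited step: Pending on the first poll, Ready on the second.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Simplified stand-in for the post's `Dwarf` (assumption).
struct Dwarf {
    bag: Option<Vec<u32>>,
}

/// Same shape as the post's miner: the result is written only at the very end.
async fn mine(dwarf: &mut Dwarf, shifts: u32) {
    let mut bag = Vec::new();
    for i in 0..shifts {
        YieldOnce(false).await;
        bag.push(i);
    }
    dwarf.bag = Some(bag);
}

/// Hypothetical guard: owns the old bag and merges it back when dropped.
struct RestoreBag<'a> {
    dwarf: &'a mut Dwarf,
    old_bag: Vec<u32>,
}

impl RestoreBag<'_> {
    fn dwarf_mut(&mut self) -> &mut Dwarf {
        &mut *self.dwarf
    }
}

impl Drop for RestoreBag<'_> {
    fn drop(&mut self) {
        // Runs on normal completion and on cancellation alike.
        let mut merged = self.dwarf.bag.take().unwrap_or_default();
        merged.append(&mut self.old_bag);
        self.dwarf.bag = Some(merged);
    }
}

async fn work(dwarf: &mut Dwarf, shifts: u32) {
    let old_bag = dwarf.bag.take().unwrap_or_default();
    let mut guard = RestoreBag { dwarf, old_bag };
    mine(guard.dwarf_mut(), shifts).await;
}

/// Starts with one item in the bag, polls `work` at most `polls` times, then drops it.
pub fn run(shifts: u32, polls: usize) -> Option<Vec<u32>> {
    let mut dwarf = Dwarf { bag: Some(vec![100]) };
    {
        let waker = Waker::from(Arc::new(NoopWaker));
        let mut cx = Context::from_waker(&waker);
        let mut fut = Box::pin(work(&mut dwarf, shifts));
        for _ in 0..polls {
            if fut.as_mut().poll(&mut cx).is_ready() {
                break;
            }
        }
    }
    dwarf.bag
}
```

When the run is cancelled early, the old item survives; when it completes, old and new items are merged. The shift finished before the cancellation is still lost, because mine keeps its progress in a local.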

The Comparison

Runtime Architecture Overview

Tokio: The Industry Standard

Tokio is the most widely used async runtime in the Rust ecosystem. Its architecture focuses on:

Multi-threaded, work-stealing executor by default (a single-threaded flavor is also available);
Readiness-based I/O model for non-blocking operations;
Robust task cancellation with explicit abort capabilities;
Comprehensive ecosystem of supporting libraries;

Key characteristics:

Cancellation tokens via tokio_util::sync::CancellationToken (from the companion tokio-util crate);
tokio::spawn() for creating tasks;
tokio::select! for concurrent operations;
Built-in timeout utilities;

smol: The Minimal Approach

smol takes a radically different approach with:

Single-threaded executor by default;
Simplified API focusing on essential async operations;
No runtime of its own: it brings you existing executors by re-exporting a set of smaller crates;
Lightweight dependencies and minimal overhead;

Key characteristics:

smol::spawn() for task creation;
smol::Timer for async delays;
smol::channel for message passing;
No explicit cancellation tokens in core API;

Glommio: I/O Performance Focus

Glommio represents a specialized runtime designed for high-performance I/O workloads:

Local executor model with share-nothing-first approach;
I/O-optimized with dedicated thread-per-core architecture;
No shared memory between threads by default;
Local-only futures for better cache locality;

Key characteristics:

glommio::LocalExecutor for single-threaded execution;
glommio::spawn_local() for tasks;
glommio::timer for delays;
glommio::channels::local_channel for local-only communication;

Cancellation Scenarios Analysis

Most of the scenarios work similarly across runtimes:

Simple Drop Cancellation: futures can be dropped explicitly in every runtime, and all runtimes show partial work loss when futures are dropped unexpectedly;
Timeout Cancellation: all of them provide explicit timeout handling, and the tests show that a timeout doesn’t preserve partial results;
Mutex-Protected Operations: tokio::sync::Mutex, smol::lock::Mutex and glommio::sync::RwLock share similar semantics; the tests demonstrate that lock holding and cleanup work the way they should, while we still lose our progress;
Channel-Based Communication: very similar, only with locality nuances;

But some of them have nuances:

Macros like select! and concurrent operations with cancellation tokens are specific to the Tokio ecosystem, which is neither a good nor a bad thing. As Rain described in her talk, and as I have heard from other developers, such macros are sometimes banned in projects altogether because of their non-explicit nature (and replaced with, for example, futures_concurrency);
“Explicit” Cancel:
- in Tokio it is done via JoinHandle::abort, and dropping the handle will not cause a cancellation; handle.await will return a JoinError if the task was cancelled (or the normal result if it finished). It is also worth noting that spawn_blocking tasks are not “cancellable” once running, because they are not asynchronous (but abort will prevent such a task from starting if it hasn’t yet!);
- in smol you can call Task::cancel and wait for the cancellation (which may return Some if the task finished); it is similar to dropping the future;
- in glommio the approach is identical to smol, with an Option being returned when the cancellation is awaited;
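What select!-style macros and combinators such as .or() have in common is that they race several futures and drop the losers. The sketch below shows that idea with a hand-written combinator. Race, Countdown and NoopWaker are hypothetical names for this illustration and are not the implementation used by Tokio or smol.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; the driver below simply polls in a loop.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

type Log = Rc<RefCell<Vec<String>>>;

/// Toy future that becomes ready after `polls_left` polls and logs its fate.
struct Countdown {
    label: &'static str,
    polls_left: u32,
    log: Log,
}

impl Future for Countdown {
    type Output = &'static str;
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.polls_left == 0 {
            self.log.borrow_mut().push(format!("finished {}", self.label));
            return Poll::Ready(self.label);
        }
        self.polls_left -= 1;
        Poll::Pending
    }
}

impl Drop for Countdown {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("dropped {}", self.label));
    }
}

/// Minimal race: resolves with whichever side is ready first (side `a` is polled first).
struct Race<A, B> {
    a: A,
    b: B,
}

impl<A, B> Future for Race<A, B>
where
    A: Future + Unpin,
    B: Future<Output = A::Output> + Unpin,
{
    type Output = A::Output;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if let Poll::Ready(v) = Pin::new(&mut self.a).poll(cx) {
            return Poll::Ready(v);
        }
        Pin::new(&mut self.b).poll(cx)
    }
}

/// Races a slow "miner" against a fast "timer" and returns the winner plus the log.
pub fn race_miner_against_timer() -> (&'static str, Vec<String>) {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    let mut race = Race {
        a: Countdown { label: "miner", polls_left: 5, log: log.clone() },
        b: Countdown { label: "timer", polls_left: 2, log: log.clone() },
    };
    let winner = loop {
        if let Poll::Ready(w) = Pin::new(&mut race).poll(&mut cx) {
            break w;
        }
    };
    // Dropping the combinator drops the loser in whatever state it reached.
    drop(race);
    let entries = log.borrow().clone();
    (winner, entries)
}
```

The timer wins, and the miner is dropped without ever logging that it finished. Nothing in the call site says so explicitly, which is the non-explicit nature the criticism is aimed at.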

On the documentation side, this topic is only covered in the Tokio API docs. smol has one small (ha-ha) note on the matter: “Note that canceling a task actually wakes it and reschedules one last time. Then, the executor can destroy the task by simply dropping its Runnable or by invoking run().” In glommio I was unable to find any mention of cancellation safety or correctness. I think there are two factors that explain why it is what it is:

Tokio is much more popular and widely used, and therefore has more resources to spend on documentation;
Tokio has a more “dangerous” API and internal executor model, which makes it easier to run into cancellation safety issues when using it;

I hope you found something new in this blog post. Share your thoughts in the comments and check the additional resources if you want to.

Additional Resources and Sources of inspiration

Rain’s great talk;
Current comment on the state of the book section about cancellation safety;
Tokio docs on the matter;
Structured Concurrency lib;
