CharlyCst

Last week I attended HotOS, the biennial workshop on Hot topics in Operating Systems. HotOS is my favorite academic conference. It is a small-scale, single track conference where people are open to discussing all sorts of ideas. This is very different from the main system conferences, where everyone tends to be on the defensive. This year there were 100 attendees at HotOS, which took place in Banff, Alberta, Canada.

The Venue

HotOS took place at the Fairmont Banff Springs Hotel, a castle at the heart of the Banff National Park. The organizing team managed to negotiate preferential rates, which kept the overall cost somewhat manageable. I have to say that I really loved the venue, Banff is a place that I would absolutely recommend. The town itself is in the middle of the mountains, with plenty of nature and wildlife. The view is breathtaking, and we even spotted deer while walking at night to relax after a day of the conference.

The HotOS'25 Venue

The Talks

I found this year’s talks to be of very high quality and engaging. HotOS is one of those places where people are very open to discussing early stage or controversial ideas, which gave birth to genuine and interesting exchanges.

The full list of talks is available on the conference website. In this post I will only go through a few of them, but the others are worth checking too!

Guillotine: Hypervisors for Isolating Malicious AIs

This was a talk by James Mickens, also known for his band and his writing such as The Night Watch. I had high expectations for this talk, and I was not disappointed.

James presented the Guillotine hypervisor, a system designed to contain AI programs in case of a Skynet scenario. Guillotine is architected as a set security of layers, to account for potentially multiple levels of sandbox escape by the AI. The security layers range from software and hardware partitioning all the way to physical failsafes such as disconnecting the power and internet cables or immolating the whole data center. James managed to mix humor and apocalyptic threats, all with a perfect delivery and some interesting food for thought. I had a lot of fun and would definitely recommend reading the paper, make your own mind on Guillotine!

Lightweight Hypervisor Verification: Putting the Hardware Burger on a Diet

In that talk I presented our work on hypervisor verification, and I have to say that I did not expect so much positive feedback.

The paper presents an approach for finding bugs in instruction emulation and hardware configuration in hypervisors. We developed a framework to verify Miralis (a research hypervisor that virtualises firmware) and found it super valuable for us, so we decided to share the ideas behind the verification of Miralis. There are two key ideas here: first, in recent years executable ISA specifications became a de facto standard; and second, hypervisors follow a pretty regular structure that makes them easy to verify automatically. In short, we translated the RISC-V specification into Rust code (with a custom compiler), and used a model checker to verify the hardware configuration and instruction emulation. The approach is completely automated (contrary to theorem provers), and helped us squash a lots of very nasty but subtle bugs in Miralis.

Modular, Full-System Verification

That was a talk by Gregory Malecha from BlueRock on the effort to verify the Nova microkernel (and hypervisor) and the OS built on top. The proof leverages separation logic and decomposes verification by privilege levels. Their approach contrasts with seL4, and in particular they hope to achieve multi-core verification, which seL4 has been struggling with for a while. Looking forward to seeing how far they can go!

Uniting the World by Dividing it: Federated Maps to Enable Spatial Applications

I liked the talk, which presented the design of a federated mapping system. Think Google Map merging with your local store, university, and office mapping systems. They have a prototype, OpenFlame, and some cool demos.

Real Life Is Uncertain. Consensus Should Be Too!

This was a great talk by Reginald Frank, from Natacha Crooks’ group at UC Berkeley. They remark that consensus protocols operate under the assumption that up to f machines can fail, but that the f threshold is not very representative of the reality of correlated and non-homogeneous failure rates of machines. They argue that the failure probability should be taken into account when designing consensus protocols, and is a parameter that can be played with to trade off costs for consistency. For instance, maybe the quorum size should be increased while the fleet is undergoing an update, or could be reduced on newer machines. Maybe the nodes’ votes should even be weighted by their probability of failure?

From Ahead-of- to Just-in-Time and Back Again: Static Analysis for Unix Shell Programs

A cool talk by Lukas Lazarek from Brown. They built a tool for static analysis of shell programs, which can do things such as type checking or predicting changes to the filesystem. The tool builds upon a simplified but complete enough for most use cases specification of the most common Unix tools. There was an interesting discussion about the relevance of that work in the age of AI agent, would be cool to be able to automatically check if that LLM-suggested script is going to wipe out my filesystem!

The NIC should be part of the OS

Maybe one of my favorite OS papers of this year. It was presented by Pengcheng Xu from ETH Zurich’s system group, and builds on their Enzian research computer. They explain how they can go from a packet on the NIC to the handler in user-space in 3 instructions. The trick is to issue a blocking read on the NIC, which the NIC will unblock with the packet data once the packet arrives. No need to burn watts on polling while getting the best possible latency. Of course, doing so requires a cache coherent interconnect between the NIC and the CPU, which they have on their Enzian machine (and that maybe we will have too with CXL one day). That also means that the NIC needs to be aware of part of the kernel state, such as what processes are scheduled on which cores. Overall, that is a super cool NIC/OS co-design.

Rethinking RPC Communication for Microservices-based Applications

A great talk about co-design between transport protocols and network functions. When the end-to-end path is controlled by the same entity there is an opportunity in deviating from standard protocols to expose the information needed by network functions directly. For instance, rather than un-marshaling multiple layers of protocols, the network functions could directly access the required field.

Analyzing Metastable Failures

A super interesting talk by the AWS crew. They explain how to model and visualize systems to find the stable states. They have pretty cool systems visualization as vector fields where we can clearly see the attractors, some of which map to stable under-performing states which are the cause of metastable failures. Some super cool research from AWS!

Designing a Datacenter-wide Distributed Shared Log

Another talk from Natacha’s group, presented by Micah Murray. They ask the question of how to scale a log to a whole data center. The bottleneck is sequencing, which requires synchronization, but in terms of compute sequencing is pretty lightweight. Their solution is to use programmable switches to manage the sequencing, rather than standard processors. Based on their prototype, they estimate that they can scale to 62 billion req/s with 100 switches while staying under 1ms latency. Pretty cool result and fun idea!

Roaming Around in Alberta

After those fun three days of the conference in Banff I stayed a bit around Calgary to roam around. Banff is the main attraction near Calgary, but another super interesting place is Drumheller. Drumheller is known for the Royal Tyrrell Museum and its fossils (of which there are plenty in Alberta), if you are into dinosaurs and natural history I can only recommend checking it out!

A T-Rex fossil from the Royal Tyrrell Museum

The area around Drumheller has a very cool Far West vibe, with canyons, small towns with saloons, and the hoodoos.

The hoodoos, in Drumheller, Alberta

And if you are a foodie, of the restaurants I tried in Calgary I would recommend checking out Ten Foot Henry.