CharlyCst

Memory protection consists of managing the access rights of memory pages, either to avoid bugs or to prevent malicious behavior. This is usually done through system calls, for instance with mprotect on Linux, because modifying page table entries requires privileged access. However, with its Skylake architecture, Intel introduced a new way of managing memory permissions directly from userspace using memory protection keys, hence the acronym MPK. Let’s dive into it!

What is Intel MPK?

MPK (also referred to as PKU, for Protection Keys for Userspace) is a hardware mechanism to control page permissions from userspace. It works by tagging memory pages with protection keys using 4 previously unused bits of the page table entries; in other words, we can use up to 16 distinct keys to tag our pages.

Once a page is tagged we can change its protection rights at will, from userspace. But because updating page table entries (PTE) requires privileged access, a system call is still necessary to tag the pages with a given key in the first place. To allocate and free a key we also need to go through the kernel. On Linux the API is the following:

int pkey_alloc(unsigned long flags, unsigned long ini_access_rights);
int pkey_free(int pkey);
int pkey_mprotect(unsigned long start, size_t len, unsigned long protection, int pkey);

pkey_mprotect is analogous to the mprotect syscall, but takes an additional argument pkey: a key previously allocated through pkey_alloc. As I mentioned before, there are only 16 keys available (key 0 being already used as the default to tag newly allocated pages), thus allocation can fail. You can learn more on the Linux API here.

The rights associated with each key are stored inside a new 32-bit register called PKRU, which can be read and written with the new instructions RDPKRU and WRPKRU, respectively. According to the x86_64 Intel manual, RDPKRU and WRPKRU can be used as follows:

WRPKRU: Writes the value of EAX into PKRU. ECX and EDX must contain 0 when it is executed.
RDPKRU: Writes the value of PKRU into EAX and clears EDX. ECX must contain 0 when it is executed.

For a key i ∊ ⟦0; 15⟧, bit 2i of PKRU blocks any data read or write if set to 1 (it is called the access disable bit, or AD) and bit 2i+1 disables only writes (it is called the write disable bit, or WD). Thus, we can both read and write if the two bits (WD, AD) are set to (0, 0), only read with (1, 0), and have no access with (0, 1) or (1, 1).
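To make the bit layout concrete, here is a minimal sketch that decodes the rights of a key from a PKRU value. The function name keyRights and the sample PKRU value are mine, chosen only for illustration.

```go
package main

import "fmt"

// keyRights decodes the two PKRU bits of key i (0 <= i <= 15).
// Bit 2i is the access disable bit (AD), bit 2i+1 the write disable bit (WD).
func keyRights(pkru uint32, i uint) (canRead, canWrite bool) {
	ad := pkru>>(2*i)&1 == 1
	wd := pkru>>(2*i+1)&1 == 1
	canRead = !ad
	canWrite = !ad && !wd
	return canRead, canWrite
}

func main() {
	// Hypothetical PKRU value: key 1 has WD set, key 2 has AD set.
	pkru := uint32(0b10<<2 | 0b01<<4)
	for i := uint(0); i < 3; i++ {
		r, w := keyRights(pkru, i)
		fmt.Printf("key %d: read=%t write=%t\n", i, r, w)
	}
}
```

With this value, key 0 keeps full access, key 1 is read-only and key 2 has no data access at all.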

Of course, it is not possible to override page table protections, thus the actual protection is the intersection of the page table and key protections. For instance, if the process is only allowed to read a page but the PKRU doesn’t restrict writes, any attempt to write to that page will still raise an error.

Note that because PKRU is a register, this permission system is inherently thread local: two threads can have different access rights for the same page depending on the value of PKRU for each thread. This is very different from mprotect, which affects all the threads of a process. If you want to keep the same behavior as mprotect, threads need to synchronize the value of their PKRU registers.
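The intersection rule can be sketched in a few lines. This is only a model of the check, with a hypothetical Perm bitmask of my own; the real check is of course done by the hardware.

```go
package main

import "fmt"

// Perm is a hypothetical bitmask of data access rights.
type Perm uint8

const (
	PermRead  Perm = 1 << 0
	PermWrite Perm = 1 << 1
)

// keyPerm converts the AD and WD bits of a key into the rights that key allows.
func keyPerm(ad, wd bool) Perm {
	switch {
	case ad:
		return 0
	case wd:
		return PermRead
	default:
		return PermRead | PermWrite
	}
}

// effective returns the rights allowed by both the page table and the key.
func effective(pagePerm, key Perm) Perm {
	return pagePerm & key
}

func main() {
	// Read-only page, key that restricts nothing: writes are still denied.
	p := effective(PermRead, keyPerm(false, false))
	fmt.Println("write allowed:", p&PermWrite != 0)
}
```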

Some specificities

As you can imagine, there are some rather exciting use cases for this new mechanism:

First, with MPK the protection is thread local, whereas mprotect affects the whole process. This makes design patterns such as one writer for N readers trivial to set up: just allocate a single key, tag your pages with that key using pkey_mprotect and disable writes for all reader threads by updating their PKRU. This way any attempt by the readers to modify the pages will raise an exception, reducing the attack surface and preventing a whole class of bugs.

Secondly, updating the PKRU is much faster than a call to mprotect. According to the benchmark presented in the libmpk paper, writing to PKRU using WRPKRU takes around 20 CPU cycles, comparable to a mispredicted if branch, compared to more than 1,000 cycles for the mprotect syscall on a single memory page. That makes MPK about 50 times faster! But there is more: because mprotect actually modifies the PTEs, it runs in linear time with respect to the number of memory pages, whereas with MPK you pay this price only once to tag the pages with a key, and then updating access rights runs in constant time!

Obviously there are limitations, starting with the number of keys: 16. I also mentioned the need for synchronization in order to coordinate PKRU registers among threads. Because of this, MPK cannot be used as a drop-in replacement for mprotect, even though the libmpk paper proposes a software abstraction on top of MPK to achieve this goal. The paper also highlights the protection-key-use-after-free issue, which arises when reallocating a key because pkey_free does not clear page table entries. In other words, pages still tagged with a freed key silently fall under the permissions of whoever is allocated that key next.
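The use-after-free issue is easier to see with a toy model. The sketch below is not how the kernel is implemented: the toyKernel type is hypothetical, and handing out the lowest free key is an assumption I make to keep the example deterministic.

```go
package main

import "fmt"

// toyKernel is a toy model, not the kernel's real data structures.
type toyKernel struct {
	allocated [16]bool        // which keys are currently allocated
	pageKey   map[uintptr]int // key stored in each page table entry
}

// alloc returns the lowest free key (assumed policy), or -1 if none is left.
func (k *toyKernel) alloc() int {
	for i := 1; i < 16; i++ { // key 0 is the default key
		if !k.allocated[i] {
			k.allocated[i] = true
			return i
		}
	}
	return -1
}

// free releases the key but leaves the page tags untouched.
func (k *toyKernel) free(key int) { k.allocated[key] = false }

func main() {
	k := &toyKernel{pageKey: map[uintptr]int{}}
	old := k.alloc()
	k.pageKey[0x1000] = old // tag a page, as pkey_mprotect would
	k.free(old)
	fresh := k.alloc() // the same key number is handed out again
	fmt.Println(fresh == old, k.pageKey[0x1000] == fresh)
}
```

In this model the page at 0x1000 ends up controlled by the new owner of the key, even though that owner never tagged it.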

Using MPK

Well, this is all very interesting, what about giving it a try?

I’m going to present a small wrapper module in Go. However, I must warn you that using MPK in Go is probably a bad idea because:

You don’t have control over the memory layout.
The runtime might try to access protected pages.
You have no control over the thread your code is running on.

This is also valid for most high-level languages, so keep that in mind. However, my code was designed to be used inside the runtime itself, and thus it is possible to address all those issues.

At the time of writing, MPK is not yet widely available, so I used an AWS EC2 c5.large instance to run the following code.
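Outside the runtime, the third point can be partially mitigated by wiring a goroutine to its OS thread with runtime.LockOSThread. The helper below is a minimal sketch (onLockedThread is a name I made up) and it does nothing about the first two points.

```go
package main

import (
	"fmt"
	"runtime"
)

// onLockedThread runs f on a goroutine wired to a single OS thread, so a
// PKRU value written inside f stays in effect for the whole duration of f.
func onLockedThread(f func()) {
	done := make(chan struct{})
	go func() {
		defer close(done)
		// The thread is never unlocked: when a locked goroutine exits, the
		// runtime terminates the thread instead of reusing it.
		runtime.LockOSThread()
		f()
	}()
	<-done
}

func main() {
	onLockedThread(func() {
		// A real caller would write PKRU here, then touch the protected pages.
		fmt.Println("running on a dedicated OS thread")
	})
}
```

Not unlocking the thread is a deliberate choice here: it avoids handing a thread with a modified PKRU back to the scheduler.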

Syscalls

Let’s start with the 3 system calls. Unfortunately I couldn’t find them in the sys/unix module, so let’s just use the Syscall functions from the syscall module:

func Syscall(trap, a1, a2, a3 uintptr) (r1, r2 uintptr, err Errno)
func Syscall6(trap, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2 uintptr, err Errno)

The trap parameter is the system call number. We are interested in these (the values are those of x86_64):

const (
	sysPkeyMprotect = 329
	sysPkeyAlloc    = 330
	sysPkeyFree     = 331
)

Meanwhile, a1, a2 and a3 are the arguments passed to the system call. Because pkey_mprotect requires 4 arguments we will also need Syscall6, which is the same as Syscall with 3 extra arguments.

With a little help from the Linux doc, it’s quite easy to write a Go wrapper.

import (
	"fmt"
	"syscall"
)

type Pkey    int
type Prot    uint32
type SysProt int

// PkeyAlloc allocates a new pkey
func PkeyAlloc() (Pkey, error) {
	// The third return value is the errno, non-zero on failure
	pkey, _, errno := syscall.Syscall(sysPkeyAlloc, 0, 0, 0)
	if errno != 0 {
		return -1, fmt.Errorf("failed to allocate pkey: %w", errno)
	}
	return Pkey(pkey), nil
}

// PkeyFree frees a pkey previously allocated
func PkeyFree(pkey Pkey) error {
	_, _, errno := syscall.Syscall(sysPkeyFree, uintptr(pkey), 0, 0)
	if errno != 0 {
		return fmt.Errorf("could not free pkey: %w", errno)
	}
	return nil
}

// PkeyMprotect tags pages within [addr, addr + length - 1] with the given pkey.
// Permission on page table can also be updated at the same time.
// Note that addr must be aligned to a page boundary.
func PkeyMprotect(addr uintptr, length uint64, sysProt SysProt, pkey Pkey) error {
	_, _, errno := syscall.Syscall6(sysPkeyMprotect, addr, uintptr(length), uintptr(sysProt), uintptr(pkey), 0, 0)
	if errno != 0 {
		return fmt.Errorf("could not update memory access rights: %w", errno)
	}
	return nil
}

Messing with PKRU

System calls are cool, but what is even cooler is to stay in userspace. Now we need to update the PKRU register. There are two amd64 instructions for that, so let’s roll up our sleeves and write some assembly!

First we define the function signatures in plain Go:

type PKRU uint32

func WritePKRU(prot PKRU)
func ReadPKRU() PKRU

And then we have to write some assembly. Recall that we need to set ECX and EDX to zero before executing WRPKRU, while RDPKRU only requires ECX to be zero.

TEXT ·WritePKRU(SB),$0-4
	MOVL prot+0(FP), AX // Move the 32-bit argument to EAX
	XORQ CX, CX         // Set CX to 0
	XORQ DX, DX         // Set DX to 0
	WRPKRU              // Write EAX to PKRU
	RET

TEXT ·ReadPKRU(SB),$0-4
	XORQ CX, CX         // Set CX to 0
	RDPKRU              // Load PKRU into EAX
	MOVL AX, ret+0(FP)  // Move EAX to the 32-bit return slot
	RET

And that is it! We have got everything we need to start using MPK in Go. I don’t really want to manipulate bits to update rights of each key, however. Let’s make our life simpler and write a short helper function.

There are three possible protections for each key:

RWX when the two bits are set to 0
RX when writes are disabled (bit 2i+1 set to 1)
X when all data accesses are disabled (bit 2i set to 1)

We can manipulate these protections by defining some constants to represent the corresponding bits:

const (
	ProtRWX Prot = 0b00
	ProtRX  Prot = 0b10
	ProtX   Prot = 0b11
)

We often want to update the permission for only one key at a time. We can easily do that by setting the two bits corresponding to that key to 0, and then setting them to the value corresponding to the desired protection.

Let’s say we want to set the protection of key i to RX. First we create a mask full of ones except for bits 2i and 2i + 1 and apply it to the pkru value with a bitwise and &, then we shift the protection bits (1, 0) by 2i bits to the left and add them to the pkru.

const mask uint32 = 0xffffffff // All 32 bits set to 1

func (p PKRU) Update(pkey Pkey, prot Prot) PKRU {
	// Mask with every bit set except the two bits of the key
	pkeyMask := mask - (1 << (2 * pkey)) - (1 << (2*pkey + 1))
	// Clear the previous protection of the key
	pkru := uint32(p) & pkeyMask
	// The two bits are now 0, so adding is the same as setting them
	pkru += uint32(prot) << (2 * pkey)

	return PKRU(pkru)
}
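As a worked example, here is a standalone sketch of the same clear-then-set idea, written with the and-not (&^) and or (|) operators instead of subtraction and addition. The setKey function is a hypothetical stand-in for the method above, working on plain uint32 values so that it can be run anywhere.

```go
package main

import "fmt"

// setKey returns pkru with the two bits of the given key replaced by prot.
func setKey(pkru uint32, key uint, prot uint32) uint32 {
	shift := 2 * key
	pkru &^= 0b11 << shift // clear AD and WD for this key
	pkru |= prot << shift  // install the new bits
	return pkru
}

func main() {
	const protRX, protX = 0b10, 0b11

	pkru := setKey(0, 1, protRX) // key 1 becomes read-only
	fmt.Printf("%#x\n", pkru)    // 0x8

	pkru = setKey(pkru, 15, protX) // key 15 uses the two highest bits
	fmt.Printf("%#x\n", pkru)      // 0xc0000008

	pkru = setKey(pkru, 15, 0) // back to full access for key 15
	fmt.Printf("%#x\n", pkru)  // 0x8
}
```

Key 15 lives in bits 30 and 31, which is why the mask has to cover all 32 bits of the register.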

Using MPK, for real

Time to do a demo! As I said before, using MPK in Go is usually a bad idea, yet let’s try to do it anyway.

First we want to allocate something on its own page. We can allocate a huge array and hope that the runtime will allocate a few new pages to store it. Then we allocate a key with our brand new MPK module and tag the page containing the first item of the array with that key.

// Allocate an array
a := make([]int, 1, 10000)

// Allocate a key
pkey, err := mpk.PkeyAlloc()
check(err)

// Tag the page containing `a[0]` with our key
err = mpk.PkeyMprotect(
	(uintptr(unsafe.Pointer(&a[0]))>>12)<<12, // Align pointer to page
	1<<12,          // Page size
	mpk.SysProtRWX, // Base protection
	pkey,           // Key
)
check(err)

When calling pkey_mprotect, we need to ensure that the pointer we pass as argument is aligned with a page boundary. One way to do it is to shift the pointer to the right by 12 bits (the page size is 4096 = 2¹² on my system) and then back to the left by 12 bits.
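If you would rather not hard-code the 12-bit shift, the same alignment can be computed from the page size reported by os.Getpagesize. The helper below is a minimal sketch (pageRange is a name of my own) and assumes a 64-bit platform and a page size that is a power of two.

```go
package main

import (
	"fmt"
	"os"
)

// pageRange returns the page-aligned start and the length, in bytes, of the
// smallest run of whole pages covering [addr, addr+size).
func pageRange(addr, size, pageSize uintptr) (start, length uintptr) {
	mask := pageSize - 1 // valid because pageSize is a power of two
	start = addr &^ mask // round down to a page boundary
	end := (addr + size + mask) &^ mask
	return start, end - start
}

func main() {
	pageSize := uintptr(os.Getpagesize())
	// 0xc00007a010 is an arbitrary example address, not a real allocation.
	start, length := pageRange(0xc00007a010, 8, pageSize)
	fmt.Printf("start=%#x length=%d\n", start, length)
}
```

The two return values can be passed as the first two arguments of pkey_mprotect, which also covers objects that straddle a page boundary.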

We then check that we can still read and write the first item of our array. It should be the case because we didn’t update the value of PKRU yet.

fmt.Printf("Memory address of a[0]:  %p\n", a)
fmt.Println("The value inside a[0]:  ", a[0])
a[0] = 1

Now let’s update the value of PKRU to remove write access:

pkru := mpk.AllRightsPKRU
pkru = pkru.Update(pkey, mpk.ProtRX)
mpk.WritePKRU(pkru)

We should still be able to read:

fmt.Println("The value inside a[0]:  ", a[0])

However, the next write will raise an error:

a[0] = 2

Here is the output I got:

Memory address of a[0]:  0xc00007a000
The value inside a[0]:   0
The value inside a[0]:   1
unexpected fault address 0xc00007a000
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x4 addr=0xc00007a000 pc=0x48ef69]

As expected, the execution triggered a segfault on the second write, after write permission was removed. We can check the error code (4) to verify the cause of the error, for instance in siginfo.h:

/*
 * SIGSEGV si_codes
 */
#define SEGV_MAPERR	1	/* address not mapped to object */
#define SEGV_ACCERR	2	/* invalid permissions for mapped object */
#define SEGV_BNDERR	3	/* failed address bound checks */
#define SEGV_PKUERR	4	/* failed protection key checks */

It is indeed because the protection key check failed 🎉
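If you need to do this check programmatically, the lookup is a simple table. This is just a sketch (segvCodeName is a hypothetical helper) built from the four values listed above.

```go
package main

import "fmt"

// segvCodeName maps the si_code of a SIGSEGV to its name in siginfo.h.
func segvCodeName(code int) string {
	switch code {
	case 1:
		return "SEGV_MAPERR" // address not mapped to object
	case 2:
		return "SEGV_ACCERR" // invalid permissions for mapped object
	case 3:
		return "SEGV_BNDERR" // failed address bound checks
	case 4:
		return "SEGV_PKUERR" // failed protection key checks
	default:
		return "unknown"
	}
}

func main() {
	// code=0x4 is the value reported in the crash output above.
	fmt.Println(segvCodeName(0x4))
}
```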

Conclusion

It is already the end of this post, I hope you enjoyed it and learned a few things about Intel MPK!

Here is a short recap of the key ideas:

Intel MPK is a userspace mechanism for memory permission management.
It is much cheaper than mprotect in CPU cycles.
You can have only 16 keys, thus you can control at most 16 groups of pages simultaneously.
It is thread-based, because the permissions are stored inside a new register, PKRU. This can be an asset or a drawback depending on your application.

If you want to have a closer look, you can find the code of my MPK wrapper on github.

If you liked this post and want to support me or be alerted when I publish a new one, you can follow me on twitter.

Diving into Intel MPK