CharlyCst

With more than 140k stars on Github at the time of writing, React is arguably one of the most used and loved web framework, but did you ever asked yourself how it works under the hood? Well you may know that there is this virtual DOM thing, and a certain reconciliation algorithm, but how do they work exactly? The best way to find out is probably to (re)code React by ourselves, let’s get started!

But first a quick disclaimer: React is a very mature framework, it’s the results of years of research and the actual algorithm is more evolved than the one I’m presenting here. The core idea is the same, however, and I will point out differences worth mentioning.

The virtual DOM

First thing first: let’s talk about the DOM. The DOM (or Document Object Model) is a tree structure that represent the web page, it can be created from HTML:

<div id="root">
  <h1>Hello, World !</h1>
  <p>I'm an HTML page</p>
</div>

But also through JavaScript, for instance we can update the DOM we created:

const root = document.getElementById("root");

// Set the background color
root.style.backgroundColor = "blue";

// Add a new element
const elt = document.createElement("b");
elt.textContent = "I was added dynamically";
root.appendChild(elt);

The resulting DOM will be equivalent to:

<div id="root" style="background-color: blue;">
  <h1>Hello, World !</h1>
  <p>I'm an HTML page</p>
  <b>I was added dynamically</b>
</div>

Updating the DOM through JavaScript is the only way to create dynamic web pages, however writing code to perform these updates can quickly become very cumbersome, that’s where React comes in.

The idea is that instead of writing imperative code to update the DOM, it would be much easier to write a description of what we want and let our framework figure out which operations to perform. That’s what React does, we provide a target DOM and the reconciliation algorithm will perform surgical updates on the real DOM to have it match the target.

The target DOM is called Virtual DOM, it is a simple JavaScript object, meaning that it is lightweight to create, we can update it at will or even throw it away, it will not have any impact on the actual web page.

// A virtual node
{
  type: "p";
  props: {
    children: "I'm a virtual node!";
  }
}

Those virtual nodes are what the virtual DOM is made of. To give a more formal definition, virtual nodes implement this interface:

interface VNode<P = {}> {
  type: string;
  props: P & { children: VNode[] | string };
}

The property children is a little special, it represents the children of this node: you can imagine it as the inner HTML which is itself either a string or a list of other HTML elements.

Now the question is how do we create virtual nodes? React exposes a function createElement that do it for us.

const vNode = React.createElement("h1", {}, "Hello, World!");

ReactDOM.render(vNode, document.getElementById("root"));

Creating elements with the JavaScript syntax can quickly become burdensome, that is why most of the time we use JSX. JSX is just syntactic sugar, it is transpiled to a pure JavaScript equivalent using React.createElement.

// Some JSX
const vNode = (
  <div id="root">
    <h1>Hello, World!</h1>
    <p>I'm a JSX page</p>
  </div>
);

// It is translated to
const vNode = React.createElement("div", { id: "root" }, [
  React.createElement("h1", null, "Hello, World!"),
  React.createElement("p", null, "I'm a JSX page")
]);

This transpilation step can be performed by tools such as Babel or Typescript. Fortunately it is possible to pass a custom function to be used instead of React.createElement, all we have to do is to create our own JSX factory.

Now that we learned a little more about the DOM, virtual nodes and JSX we are ready to go, let’s code!

Virtual nodes and components

We will name our framework µReact, it will feature components with the setState and componentDidMound methods as well as full JSX support. You can find the code of this section here.

Let’s first define our virtual nodes:

export interface VNode<P = {}> {
  type: string;
  props: P & { children: VNode[] | string };
  domElt?: HTMLElement;
  class?: new (props: P) => Component<P>;
  component?: Component<P>;
}

We added a few fields:

domElt is a reference to the corresponding DOM element if any. We will set it once we instantiate a real DOM element corresponding to this virtual node.
class is the constructor of the µReact component, it will be defined when the virtual node correspond to a component and not to a HTML tag.
component is a reference to the component, if any, once it has been instantiated with the class constructor.

Now we are ready to create our JSX factory. There are a few difficulties here: we need to implement the same interface as React.createElement, which accepts quite a lot of different types for its arguments. Let’s break it down first:

React.createElement(type, props, children);

type is either a string corresponding to an HTML element type, such ad "div", "h1" and so on, or a constructor if the JSX element correspond to a React component (that is it begins with a capital letter, such as “App”).
props can be either null or any JavaScript object
children is more complicated, it can be either a string, a number, another virtual node, a list of any of these types or even a list of list. To keep things simple we will not deal with each corner case, though it will be general enough to create interesting applications.

Ok, let’s write a function createElement that returns a virtual node:

function createElement<P = {}>(
  type: string | (new (props: P) => Component<P>),
  props: P,
  ...children: (VNode<any> | VNode<any>[])[] | (string | number)[]
): VNode<P> {
  const normalizedProps = {
    ...props,
    children: isVNodeArray(children) ? flatten(children) : `${children}`
  };

  if (typeof type == "string") {
    // It is an html element
    return { type: type, props: normalizedProps };
  } else {
    // It is a component
    return { type: "", props: normalizedProps, class: type };
  }
}

Don’t be scared by the types, I have done my best to keep things simple, but you know some times it just gets ugly. The two functions isVNodeArray and flatten are used to normalize the props, children will be either a string or a flat array of VNodes. You can find them in the full code here.

Maybe you noticed that type expect either a string or a Component constructor, but we didn’t define the Component class yet, let’s do it.

abstract class Component<P = {}, S = {}> {
  abstract state: S;
  props: P;
  _vNode: VNode<P>;

  constructor(props: P) {
    this.props = props;
    this._vNode = {
      type: "",
      props: { ...props, children: [emptyVNode] }
    };
  }

  abstract render(): VNode<any>;
  componentDidMount() {}

  setState(newState: Partial<S>) {
    this.state = { ...this.state, ...newState };

    const newChild = this.render();
    // Do something with the new child
  }
}

It is now possible to define components by extending Component and defining a render method that returns a virtual node, that is some JSX as we are used to. We also defined setState, it updates the state and then call render, but for now we do nothing with the new child, we will come back to this method when talking about the reconciliation algorithm.

Now that we have our component and our JSX factory we can create a virtual DOM, let’s try it out. First we set up our JSX transpiler (for instance Typescript here) to use our own factory:

// tsconfig.json
{
  "compilerOptions": {
    "jsx": "react",
    "jsxFactory": "µReact.createElement"
  }
}

And we can write some JSX:

import µReact from "./µReact";

class List extends µReact.Component {
  render() {
    return (
      <ul>
        <li>Drink apple juice</li>
        <li>Eat vegetables</li>
      </ul>
    );
  }
}

const app = (
  <div style={{ backgroundColor: "#f8f8f8" }}>
    <List />
  </div>
);

console.log(app);

As expected, this code prints:

{
  type: "div",
  props: {
    style: { backgroundColor: "#f8f8f8" },
    children: [
      {
        class: List
        props: { children: [] }
      }
    ]
  }
}

Notice that the List virtual node has no children yet, because we don’t call render when instantiating the component’s virtual node.

Now that we can create a virtual DOM, it’s time to actually render it on the screen.

The reconciliation algorithm

The reconciliation algorithm, also called diffing algorithm, computes the differences between two virtual DOMs and deduce a set of operations to be applied to the old DOM to have it match the new one.

As explained in React’s documentation, computing the smallest set of operations to transform one tree into another takes O(n³) comparisons, which is impractical. We will take the same approach as React using a simple O(n) heuristic: two different types of nodes produce different sub-trees.

Let’s create our diff function

function diff(
    parentDom: HTMLElement,
    newNode: VNode,
    oldNode: VNode
): VNode {
    ...
};

The algorithm is the following: we take two virtual nodes, one corresponding to the current state and another matching the desired new state. Those two nodes are sub-trees of the virtual DOM, we iterate both of them simultaneously and for each node we have several cases:

If both are HTML tags we can reuse the same (real) DOM element and update its properties. We will create a function updateDomProperties for that. Once updated, we diff children recursively.
If both nodes are components of the same class, for instance two <App/>, and both have the same props then the current DOM is already up to date and there is nothing to do. Otherwise, if props are different we need to render the component with the most recent props, and then we recursively diff both components children.
In the case that they does not share the same type, for instance a <div> and a <p> or an <App> and a <List>. Then accordingly to our heuristic we decide that these two nodes will produce different subtrees, thus we destroy the old DOM subtree and instantiate a fresh one. This is also the place to call the componentDidMount lifecycle method.

The exact code is full of implementation details, I won’t include it in this post but you can find it here if you want to have a look.

But how exactly do we iterate on children? Well the simplest possible way: we have two lists, for instance the old children may be [<div>, <List>] and the new [<div>, <p>, <List>], we first diff the two <div>, then we diff the <List> and the <p> and finally because we have child left in the new list bot not in the old we instantiate a fresh DOM node accordingly. If we had more old than new children we simply destroy any remaining nodes at the end.

// Old DOM
<App>
  <div>...</div>
  <List />
</App>

// New DOM
<App>
  <div>...</div>  {/* Same <div/> as before                                  */}
  <p>...</p>      {/* We destroyed <List/> and replaced it with a fresh <p/> */}
  <List />        {/* A fresh <List/>                                        */}
</App>

This has some practical implications: in the previous example it would have been more efficient to create a new <p> node and to insert it between the <div> and the <List>. In fact, it will even have a visual impact because the diff operation will destroy the <List> component to create a <p> instead and then create a fresh <List>: the state will be lost. There is a way to mitigate this in React using keys, I will discuss that a little later.

Remember the Component.setState() function we created earlier? It’s time to complete it:

setState(newState: Partial<S>) {
  this.state = { ...this.state, ...newState };

  const newChild = this.render();
  const oldChild = this._vnode.props.children[0];
  // We compute the difference with between the old and new child
  this._vnode.props.children[0] = diff(
    child.domElt.parentElement,
    newChild,
    oldChild
  );
}

Now when updating the state of a component the virtual DOM will automatically be updated.

At this point we can update the virtual DOM in response to events, let’s move on and update the real DOM.

Manipulate the DOM

We are done with the reconciliation algorithm, but we still need two other functions to have it run: updateDomProperties and instantiate. The first is responsible for adding and removing attributes of a DOM element to have it match the new desired state. The second one is needed to create fresh DOM elements.

Let’s starts with updating dom properties.

function updateDomProperties<P, Q>(
  dom: HTMLElement,
  newNode: VNode<P>,
  oldNode: VNode<Q>
) {
  const newProps = newNode.props as { [attr: string]: any };
  const oldProps = oldNode.props as { [attr: string]: any };

  // Remove no longer needed attributes and event listeners
  for (const attr in oldProps) {
    if (!(attr in newProps) || newProps[attr] !== oldProps[attr]) {
      if (isEventListener(attr)) {
        // For `onfoo` we need to remove the event `foo`
        dom.removeEventListener(attr.substring(2), oldProps[attr]);
      } else {
        dom.removeAttribute(attr);
      }
    }
  }
  // Add new attributes and event listeners
  for (const attr in newProps) {
    if (!(attr in oldProps) || newProps[attr] !== oldProps[attr]) {
      if (isEventListener(attr)) {
        dom.addEventListener(attr.substring(2), newProps[attr]);
      } else if (typeof newProps[attr] == "string") {
        dom.setAttribute(attr, newProps[attr]);
      }
    }
  }
}

This function has been curated by removing some special cases for simplicity purposes, as always you can have a look at the full working code here.

This function is rather simple, but we need to get it right: if we fail to remove an event listener, for instance a onclick attribute, but still add another one then both will be triggered when the user click the element.

We have to compose with the raw dom API here, and it does not treat all attributes as equal. For instance to set the onchange property (yes, it is not onChange, you’ve been fooled) we must first understand that it is an event listeners (because it starts with on) and then call addEventListener instead of setAttribute for most other properties.

And now let’s write instantiate:

function instantiate<P>(vNode: VNode<P>): HTMLElement {
  if (vNode.class) {
    // This is a µReact Component
    const component = new vNode.class(vNode.props);
    const child = component.render();
    vNode.component = component;
    vNode.props.children = [child];
    component._vNode = vNode;
    return instantiate(child);
  } else {
    // This is an HTML element
    const domElt = document.createElement(vNode.type);
    if (typeof vNode.props.children != "string") {
      vNode.props.children.forEach(child =>
        domElt.appendChild(instantiate(child))
      );
    }
    updateDomProperties(domElt, vNode, emptyVNode);
    vNode.domElt = domElt;
    return domElt;
  }
}

There are two cases:

The virtual node is a component, thus it will not have any corresponding DOM element, but instead it holds a pointer to a fresh instance of its component class. It is also time to render the component for the first time, then we recursively instantiate its children.
The virtual node correspond to an HTML element, then we create a new DOM element, update its properties to match the virtual element’s props and recursively instantiate its children.

And we are done, we can create and maintain real DOM element. The very last thing we need is the ability to attach our DOM somewhere in the web page, that’s the job of render:

export function render(vRoot: VNode, root: HTMLElement | null) {
  if (root) {
    diff(root, vRoot, emptyVNode);
  }
}

Let’s give it a try !

µReact.render(App, document.getElementById("app"));

And here is the result, a real application rendered by µReact:

(in case the iframe didn’t render for whatever reason, Here is the link to the app)

So satisfying, isn’t it? We just created our very own web framework 🎉

Some differences with React

Obviously there are some major differences between the real React and µReact. Beside the lack of major features such as functional components, hooks, asynchronous rendering or most lifecycle methods, I will discuss two differences that I believe are worth being aware of:

Keying
React architecture

Keying

When discussing the diffing algorithm we saw the issue of inserting new elements at the beginning or in the middle of the list of children, which is very inefficient:

<ul>
  <li>Python</li>
  <li>Go</li>
</ul>

// After inserting a new element
<ul>
  <li>Rust</li>   {/* old Python    */}
  <li>Python</li> {/* old Go        */}
  <li>Go</li>     {/* fresh element */}
</ul>

Inefficiency is a thing, but there is worse: there is actually a bug in our Todo list application. If you have at least two items in the todo list, click on the first (it then becomes yellow, its state contains clicked: true) and remove it (by clicking the red dot), then the second item takes the first place but becomes yellow, what happened?

<List>
  <Item text="Drink apple juice"/> {/* clicked:true  */}
  <Item text="Eat vegetables"/>    {/* clicked:false */}
</List>

// After deleting the first item
<List>
  <Item text="Eat vegetables"/>    {/* clicked:true  */}
</List>

When iterating through children the algorithm will first compare <Item text="Drink apple juice"/> and <Item text="Eat vegetables"/> and think that you just changed the props, so it keeps the same component with the state clicked:true, update its props and trigger a re render.

A solution is to use keys, those are special props used by the diffing algorithm to decide if it should reuse a component or an HTML element.

<List>
  <Item key="aaa" text="Drink apple juice"/> {/* clicked:true  */}
  <Item key="bbb" text="Eat vegetables"/>    {/* clicked:false */}
</List>

// After deleting the first item
<List>
  <Item key="bbb" text="Eat vegetables"/>    {/* clicked:false */}
</List>

Now because we used key, the diffing algorithm knows that it should reuse the second component instead of updating props of the first.

This is why React print a warning if you try to render a list without keys, this kind of performance issue or unexpected behavior may happen.

React architecture

There are a lot of react-like frameworks out there, for instance Preact and Inferno, they each have their own focus such as bundle size or raw speed and implement more or less of React’s features. There is, however, one big difference between most of these frameworks and React itself: the project architecture.

React is modular, and if the web is of course a privileged target it is not the only one: you may have heard of React Native for instance. We only considered the DOM when building µReact but the real React is composed of a base module simply called React, also sometime referenced as the reconciler, and a renderer, ReactDOM for the web.

When building an application with ReactNative, you are actually using the same code for the reconciler as you would when building a web app. The renderer, however, is different.

This is where things get very interesting: if one day you need to build your own UI engine, maybe for a new IoT device, embedded inside a game or whatever other platform, as long as you can embed a JavaScript engine you can use React with a custom renderer. This is very powerful because first developers already know how to create UI using React, and second you benefit from all the present and future features of React as they are added.

If you are interested in learning more about this, here is a talk on the subject.

Let’s (re)code React