Relaxed Radix Balanced Trees

I’m adding immutable vectors to my language, Ivan, and needed to pick a suitable data structure to implement them. Clojure uses Persistent Vectors (PVs), which support lookups, updates, and appends all in ‘effectively’ constant time. However, they don’t have efficient insert or merge operations. Relaxed Radix Balanced (RRB) Trees, introduced by Bagwell and Rompf in 2011, address this shortcoming.

I had to read a few different papers to properly understand how they worked so I thought I’d write an introduction in the hope of making it a tad easier for others. I’ll try to convey how they work and some intuition for why, while leaving the deeper mathematical treatment to the academic papers.

I will assume you’re already familiar with how Persistent Vectors work.

Merging Persistent Vectors

Let’s suppose we merge two Persistent Vectors, giving us the following result:

Merging 2 Persistent Vectors

Since we rely on a fixed node size (M) for radix search to work, we must ensure that every node (except the right edge) has M children. A tree with this property is said to be leftwise dense. In this example, since the last child of the left vector only has 2 elements we must move 2 elements over from the right vector. Doing so imbalances the first child node of the right vector, so we must repeat the process with its subsequent children until all nodes are balanced. This process involves creating many new replacement nodes (shown in orange). The work that needs to be done is proportional to the size of the right vector.

Relax

We need to relax the leftwise dense constraint, but doing so will introduce two problems:

  1. Radix search relies on each node having the same fixed branching factor.
  2. The height of the tree, which is what lets us claim ‘effectively’ constant operations, is bounded by the fixed branching factor.

The first problem can be overcome by recording the actual size of nodes somewhere. We’ll add a size table to each branch – an array with a length equal to the number of children contained in that node. Each entry in the size table records the cumulative number of elements contained in the corresponding child node, which we’ll refer to as a slot.

RRB Tree with size tables

In this example, the root node has two children and we can see from the associated size table that the first child contains 14 elements, while the second contains 8 (22 − 14). Since we can now derive the range of elements contained within each slot, finding an element is fairly straightforward. Suppose we wanted to retrieve the element at index position 16:

  1. Starting at the root, the element we are searching for will be contained within the first slot that has a corresponding entry in the size table greater than the index we are looking up. In this case, we skip the first slot (since 16 > 14) and find our element will be in the second slot (16 < 22).
  2. Before we repeat the process and search for the element within that slot, we need to adjust our index to account for all the elements in preceding nodes that we are now disregarding. We do that by simply subtracting the previous entry in the size table (14) from our index (16) which gives us a new index of 2.
  3. We step through the slots to find our element and this time it’s contained within the first slot (2 < 3). Once again, we adjust our index before recursing – this time subtracting nothing because there is no previous entry in the size table.
  4. We’ve now reached a leaf node which doesn’t have a size table since we can rely on normal array indexing instead. Our index is currently 2 and we can see the 3rd item is the expected value of 16.

Doing a linear scan through the size table at each level of the tree is a lot more work than the handful of bit operations performed when radix indexing. However, we can combine the two approaches. Consider that if we’re allowing nodes to contain fewer than M slots then any elements that no longer fit in the current node will have to be placed in subsequent slots. Therefore we can use radix indexing on the size table to skip any slots that are guaranteed not to contain our element and then, if necessary, step forward until we find the right slot. We’ll soon introduce invariants that limit the number of additional slots we have to step to, on average, only 2 or 3.

Here’s how we might implement relaxed radix search in TypeScript:

const M = 32 // power of 2
const BIT_WIDTH = Math.log2(M) // 5

/**
 * Find the element at position `idx`, or null if the index is out of bounds.
 */
export const get = <T>(rrb: Rrb<T>, idx: number): T | null => {
  if (idx >= rrb.count) return null
  return findElement(rrb.root, idx)
}

const findElement = <T>(node: Node<T>, idx: number): T => {
  if (isBranch(node)) {
    // find the slot containing our element
    const slot = findSlot(idx, node.height, node.sizes)
    // find the number of elements in the preceding slots
    const prevSize = slot === 0 ? 0 : node.sizes[slot - 1]
    // calculate the index within our slot
    const nextIdx = idx - prevSize
    // recurse
    return findElement(node.items[slot], nextIdx)
  } else {
    // fallback to array indexing for leaf nodes
    return node.items[idx]
  }
}

const findSlot = (idx: number, height: number, sizes: number[]): number => {
  // find starting slot by radix indexing
  let slot = idx >> (BIT_WIDTH * height)
  // skip slots until we reach the first with a cumulative size greater than
  // our index - this is where our element will be
  while (sizes[slot] <= idx) slot++
  return slot
}
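To make the walkthrough above concrete, here’s a self-contained sketch that rebuilds the 22-element tree from the figure and runs the same lookup. The `Leaf`/`Branch` shapes and constructors are assumptions for illustration (the real implementation’s types aren’t shown here), and it uses M = 4 to match the diagrams rather than 32:

```typescript
// Minimal node types for this sketch only - the real
// implementation's types may differ.
type Leaf = { height: 0; items: number[] }
type Branch = { height: number; sizes: number[]; items: Tree[] }
type Tree = Leaf | Branch

const M = 4 // branching factor used in the diagrams
const BIT_WIDTH = 2 // log2(4)

const isBranch = (node: Tree): node is Branch => node.height > 0

const leaf = (items: number[]): Leaf => ({ height: 0, items })

// build a branch, deriving its size table from its children
const branch = (height: number, items: Tree[]): Branch => {
  let prev = 0
  const sizes = items.map(item => (
    prev += isBranch(item) ? item.sizes[item.sizes.length - 1] : item.items.length
  ))
  return { height, sizes, items }
}

// the example tree: a leftwise-dense left subtree holding
// elements 0..13 and a relaxed right subtree holding 14..21
const root = branch(2, [
  branch(1, [leaf([0, 1, 2, 3]), leaf([4, 5, 6, 7]),
             leaf([8, 9, 10, 11]), leaf([12, 13])]),
  branch(1, [leaf([14, 15, 16]), leaf([17, 18, 19]), leaf([20, 21])]),
])

const findSlot = (idx: number, height: number, sizes: number[]): number => {
  // radix guess, then scan forward to the first slot whose
  // cumulative size exceeds the index
  let slot = idx >> (BIT_WIDTH * height)
  while (sizes[slot] <= idx) slot++
  return slot
}

const findElement = (node: Tree, idx: number): number => {
  if (isBranch(node)) {
    const slot = findSlot(idx, node.height, node.sizes)
    const prevSize = slot === 0 ? 0 : node.sizes[slot - 1]
    return findElement(node.items[slot], idx - prevSize)
  }
  return node.items[idx]
}

console.log(findElement(root, 16)) // → 16, as in the walkthrough
```

Tracing the call: at the root the radix guess (16 >> 4) lands directly in slot 1, then one level down the adjusted index 2 lands in the first slot of the relaxed node – no extra scan steps were needed at either level.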

Another simple optimisation is to omit the size table on nodes that are leftwise dense. In other words, we start with Persistent Vectors and only add size tables, incurring their costs, to the subset of nodes that are affected when merging.
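Sketching that idea, the size table becomes nullable and `findSlot` falls back to pure radix indexing when it’s absent (the field shapes here are assumptions, not the repo’s actual types):

```typescript
const M = 32
const BIT_WIDTH = 5 // log2(M)
const MASK = M - 1

// `sizes` is null for leftwise-dense nodes and present for relaxed ones
const findSlot = (idx: number, height: number, sizes: number[] | null): number => {
  if (sizes === null) {
    // dense node: plain radix indexing, no size table to consult
    return (idx >> (BIT_WIDTH * height)) & MASK
  }
  // relaxed node: radix guess, then step forward as needed
  let slot = idx >> (BIT_WIDTH * height)
  while (sizes[slot] <= idx) slot++
  return slot
}
```

With this layout, a vector that has never been merged pays no size-table cost at all; tables appear only on the nodes a merge actually touches.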

M..M-1 Invariant

To address the second problem, that the tree height is no longer bounded, we’ll reintroduce an invariant similar to the original Persistent Vector requirement that all nodes, except the rightmost edge, must contain M slots. To permit a variable slot size, we’ll require nodes to have between M−1 and M slots. With our example branching factor of 4, nodes may now have 3 or 4 slots. In our implementation where we use a branching factor of 32, nodes may have 31 or 32 slots.

Exactly how this impacts the height of the tree wasn’t immediately obvious to me, so let’s break it down. With a branching factor of 32 the maximum addressable space is 4,294,967,295. In the best case, every node has 32 children, so the height is given by log₃₂(4,294,967,295) which is 6.39 so we’ll have 7 levels. In the worst case, we have log₃₁(4,294,967,295) which is 6.45… also only 7 levels!
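Those heights are easy to verify directly (a throwaway check, not part of the implementation):

```typescript
// levels needed to address 4,294,967,295 elements when every
// node has branching factor `b`: ceil(log_b(4294967295))
const levels = (b: number): number =>
  Math.ceil(Math.log(4294967295) / Math.log(b))

console.log(levels(32)) // best case, all nodes full: 7
console.log(levels(31)) // worst case under the M..M-1 invariant: 7
```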

Not only does this constrain the height of the tree but it also constrains the number of possible extra search steps we have to perform during lookup. Higher levels of the tree may require more steps than lower levels since the offsetting compounds. We can calculate the total possible extra steps at the top of a tree of height h by taking the difference between the maximum possible number of leaf elements, M^h, and the minimum possible number of leaf elements, (M−1)^h, and dividing it by the number of slots under that level, M^(h−1):

(M^h − (M−1)^h) / M^(h−1)

To make that concrete, the worst case for a tree with 5 levels and a branching factor of 32 is an additional 4.69 steps. However, assuming a random distribution of node sizes between M−1 and M, on average we only have to perform 2.49 additional steps, which adds very little overhead, especially since the relevant entries in the size table are small enough to fit in the CPU cache.
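We can sanity-check the worst-case figure with a quick computation:

```typescript
// worst-case extra search steps at the top of a tree of height h:
// (M^h - (M-1)^h) / M^(h-1)
const extraSteps = (m: number, h: number): number =>
  (Math.pow(m, h) - Math.pow(m - 1, h)) / Math.pow(m, h - 1)

console.log(extraSteps(32, 5)) // ≈ 4.697
```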

With the introduction of size tables and the M..M−1 invariant we now have a data structure permitting some flexibility in the number of slots at each node while maintaining efficient lookup. Let’s revisit our earlier example of merging two vectors to see how many replacement nodes we now need to create to ensure a balanced tree.

Merging RRB under M..M-1 invariant

The second leaf node has only 2 slots yet we must ensure each node has at least 3. The most efficient way to redistribute the slots is either to move the last slot of the first node, or the first slot of the third node, into the second node. In both cases we create 2 new replacement nodes rather than the 3 necessitated when using Persistent Vectors. This may not seem like much, since the trees in this example are shallow and we’re using a branching factor of 4 but we no longer need to shuffle every element in the right vector to the left which means we will be able to leave entire subtrees untouched when merging.

Search Step Invariant

However, we can do even better. You may have noticed that we merged 4 slots across 2 nodes into a single node with a total of… 4 slots. Had we simply merged both trees without rebalancing we would still have the same number of additional search steps for each index (at most 1). Could we use a different invariant that permits us to do that? What if rather than a lower bound on the branching factor we instead placed an invariant on the number of additional search steps permitted?

We know that the optimal number of slots to contain P nodes is ⌈P/M⌉ because we can fully fill the first ⌊P/M⌋ slots and then use one additional slot to put the remainder in. Let’s say that we allow some constant, E, of additional extra search steps; then the total number of slots S we may use to contain P nodes is given by the inequality S ≤ ⌈P/M⌉ + E. We’ll call this the Search Step Invariant. Bagwell & Rompf claim that with M = 32, using E = 2 provides a good balance between a small impact on the indexing time and a low concatenation cost. Although we’re only using M = 4 in these examples we’ll use E = 2 too.

Applying this new invariant to our example, we see that no rebalancing is necessary since 4 ≤ ⌈14/4⌉ + 2, or 4 ≤ 6. In this simple example, merging is very efficient since we only need to recreate the parent node to contain the slots from both input trees. No child nodes need replacing since no rebalancing is required. We also know that lookup is efficient since there are, at most, only E = 2 additional search steps required at this level.

Merging RRB under search step invariant with no rebalancing
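The check itself is tiny. Here’s a sketch with M and E as used in the diagrams (the repo’s constant names may differ):

```typescript
const M = 4 // branching factor in the diagrams
const E = 2 // permitted extra search steps

// search step invariant: S <= ceil(P / M) + E, where `s` is the
// number of slots used and `p` the total number of children
const invariantHolds = (s: number, p: number): boolean =>
  s <= Math.ceil(p / M) + E

console.log(invariantHolds(4, 14)) // true - no rebalancing needed
```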

Now let’s look at another example so we can see how the rebalancing algorithm works in detail. Then we’ll finally cover how to merge RRB trees.

Merging RRB under search step invariant with rebalancing

When rebalancing, we consider all S slots together. Since each of the 2 nodes we are merging has at most M slots, S ≤ 2M. To make things easier and reduce the amount of copying done, we first calculate a plan of how many items each slot should contain in the following way:

  1. Check if our invariant, S ≤ ⌈P/M⌉ + E, is met. If not, proceed.
  2. Skip any slots with at least M − E/2 items as these don’t need redistributing. Once we reach the first slot with fewer than M − E/2 items, there will always be enough space in the subsequent slots for us to distribute the items over. (I don’t have a good explanation for why this is the case. The papers include a simple proof if you’re curious.)
  3. Remove the slot that needs redistributing and add as many of its items as possible to the next slot, then as many of the remainder to the next one, and so on.
  4. Repeat.

Let’s apply that to the example step by step:

  1. First, we check the invariant 7 ≤ ⌈16/4⌉ + 2, or 7 ≤ 6, which does not hold - so we need to redistribute some items to reduce the slot count.
  2. We skip slots with 3 items or more (M − E/2), which means we skip the first slot (contains 3) but need to distribute the second slot (contains 2).
  3. Slots can contain at most M items so we add one item to the third slot and carry the remaining item over to the fourth slot.
  4. This has reduced our total slot count by 1 so we check the invariant to see if we need to continue. If we do, we find the next slot with fewer than M − E/2 items and redistribute it; however, since 6 ≤ 6 we are done.

We apply our plan to the original set of nodes, producing a new set conforming to our invariant. Nodes that already contain the desired number of items can be reused so we only need to copy items when creating slots that have been redistributed over. Bagwell & Rompf claim this new invariant reduces the concatenation cost by a factor of 3.

Despite the somewhat lengthy explanation, the implementation is fairly simple:

const E_MAX = 2 // maximum additional search steps permitted (E)

/**
 * Generate a plan of how the items in `node` should be
 * distributed that conforms to the search step invariant.
 */
const createConcatPlan = <T>(node: Branch<T>): number[] => {
  // our initial plan is the current distribution of items
  const plan = node.items.map(item => item.items.length)
  // count the total number of items
  const s = plan.reduce((a, b) => a + b, 0)
  // calculate the optimal number of slots necessary
  const opt = Math.ceil(s / M)

  let i = 0
  let n = plan.length
  // check if our invariant is met
  while (n > opt + E_MAX) {
    // skip slots that don't need redistributing
    while (plan[i] >= M - E_MAX / 2) i++

    // current slot needs distributing over its subsequent siblings

    // track remaining items to distribute, which starts as all
    // the items from the current slot we're going to distribute
    let r = plan[i]
    while (r > 0) {
      // replace the items in the current slot with all the items from the next
      // slot, plus as many remaining items we have to distribute as possible
      plan[i] = Math.min(r + plan[i + 1], M)
      // calculate the items remaining
      r = r + plan[i + 1] - plan[i]
      i += 1
    }

    // slots that were distributed over were shuffled one slot to the left so
    // we need to do the same for any remaining slots
    for (let j = i; j < n - 1; j++) {
      plan[j] = plan[j + 1]
    }

    // account for shuffling slots to the left
    i--
    n--
  }

  return plan.slice(0, n)
}

Merging

Finally, let’s merge some trees!

Merging 2 RRB Trees - Part #1

To reduce clutter, I won’t include the size tables in the subsequent diagrams but in the example implementation we create them as necessary when new nodes are created.

We start by walking down the left tree until we reach the slot pointing to the rightmost leaf node, and the right tree until we reach the slot pointing to the leftmost leaf node. We take these two slots and create a new node from them, thus leaving us with 3 nodes: the original left node minus its last slot, the newly created ‘middle’ node, and the original right node minus its first slot.

Merging 2 RRB Trees - Part #2

We consider the slots of all 3 nodes together and rebalance as necessary, placing the result in a new parent node that will be carried up to the next level and merged in a recursive fashion. Presently, the search step invariant is not met so we redistribute the second slot (which is the first with fewer than M−E/2M - E/2 items) over its subsequent siblings, which is sufficient to rebalance the slots. We then create a parent node along with sufficient children to optimally contain the rebalanced slots. All newly created nodes are indicated in orange so you can see that the merging process is able to reuse a lot of the existing nodes, reducing the amount of work that needs to be done.

Merging 2 RRB Trees - Part #3

Since the parent node we created is 1 level higher than the nodes we merged, we move up a level and merge it with the other nodes at that level. Much like before, we drop the rightmost slot from the left node, and the leftmost slot from the right node, since those subtrees are already contained in our ‘middle’ node, and then consider the slots of all 3 nodes together for rebalancing. This time we have 12 items distributed over 4 slots so the invariant is already met and all we need to do is create a new node to hold them.

Merging 2 RRB Trees - Part #4

This time the new parent node has no siblings to merge with since we’ve reached the top of the tree. As the parent node only contains a single slot the rebalancing process has introduced a redundant level which we simply chop off.

And that’s it! We’ve successfully merged both trees together, ensuring the search step invariant is maintained, while reusing most of the existing nodes.

The implementation is fairly lengthy but there’s nothing especially complex going on:

/**
 * Concatenate two RRB trees into a single balanced RRB tree.
 */
export const concat = <T>(left: Rrb<T>, right: Rrb<T>): Rrb<T> => {
  // create a single, balanced node containing all items from left and right
  const merged = concatNodes(left.root, right.root)
  return {
    count: left.count + right.count,
    root:
      // there may be a redundant extra level so we chop it off if necessary
      merged.items.length === 1 ? merged.items[0] : merged,
  }
}

/**
 * Concatenate two nodes into a single balanced branch.
 *
 * Since we always return a branch, but there may be M or fewer items, the
 * branch may be redundant and can be unwrapped by the caller.
 */
const concatNodes = <T>(
  left: Node<T>,
  right: Node<T>,
  top: boolean = true
): Branch<T> => {
  // first, we handle trees of different heights

  if (left.height > right.height) {
    assert(isBranch(left))
    const middle = concatNodes(array.last(left.items), right, false)
    return rebalance(left, middle, null, top)
  }

  if (left.height < right.height) {
    assert(isBranch(right))
    const middle = concatNodes(left, array.first(right.items), false)
    return rebalance(null, middle, right, top)
  }

  // then, we handle leaf nodes

  if (isLeaf(left) && isLeaf(right)) {
    const total = left.items.length + right.items.length
    if (top && total <= M) {
      return Branch(1, [Leaf([...left.items, ...right.items])])
    } else {
      // this may not be balanced but the outer
      // recursive step will rebalance it later
      return Branch(1, [left, right])
    }
  }

  // finally, we handle branches of equal height

  if (isBranch(left) && isBranch(right)) {
    const middle = concatNodes(
      array.last(left.items),
      array.first(right.items),
      false
    )
    return rebalance(left, middle, right, top)
  }

  throw Error("unreachable")
}

const calcSizes = <T>(items: Node<T>[]): number[] => {
  let prev = 0
  return items.map(item => (prev += sizeOf(item)))
}

/**
 * Create a single, balanced branch containing
 * all items from the input branches.
 */
const rebalance = <T>(
  left: Branch<T> | null,
  middle: Branch<T>,
  right: Branch<T> | null,
  top: boolean
): Branch<T> => {
  // merge into a single, unbalanced node that may contain up to 2M items
  const merged = Branch(middle.height, [
    ...(left ? array.init(left.items) : []),
    ...middle.items,
    ...(right ? array.tail(right.items) : []),
  ])
  // create a plan of how the items should be balanced
  const plan = createConcatPlan(merged)
  // create a single, balanced node that may contain up to 2M items
  const balanced = executeConcatPlan(merged, plan)

  if (plan.length <= M) {
    return top ? balanced : Branch(balanced.height + 1, [balanced])
  } else {
    // distribute the (up to 2M) items across 2 nodes in a new branch
    const left = Branch(balanced.height, balanced.items.slice(0, M))
    const right = Branch(balanced.height, balanced.items.slice(M))
    return Branch(balanced.height + 1, [left, right])
  }
}

/**
 * Distribute the items in `node` according to `plan`. Both the input and
 * output may have more than M items which will be handled in `rebalance`.
 */
const executeConcatPlan = <T>(node: Branch<T>, plan: number[]): Branch<T> => {
  const items: Node<T>[] = []

  let i = 0
  let offset = 0
  plan.forEach(target => {
    if (offset === 0 && node.items[i].items.length === target) {
      items.push(node.items[i])
      i += 1
    } else {
      const current: Node<T>[] | T[] = []
      while (current.length < target) {
        const required = target - current.length
        const size = node.items[i].items.length
        const available = size - offset
        const min = Math.min(required, available)
        current.push(
          ...(node.items[i].items.slice(offset, min + offset) as any)
        )
        if (min === available) {
          offset = 0
          i += 1
        } else {
          offset += min
        }
      }
      items.push(Node(node.height - 1, current))
    }
  })

  return Branch(node.height, items)
}

The example implementation can be found in full at https://github.com/peterhorne/rrb-tree.