1. Introduction
In this tutorial, we’ll take a look at a type of data structure called a B-tree and a variation of it – B+tree. We’ll learn about their features, how they’re created, and how they’re used effectively in Database Management Systems for creating and managing indexes.
Finally, we’ll use the knowledge we’ve gained to compare B-trees and B+trees.
2. What Is a B-tree?
B-trees are a type of self-balancing tree structure designed for storing huge amounts of data for fast query and retrieval. They can be often confused with their close relation – the Binary Search Tree. Although they’re both a type of m-way search tree, the Binary Search Tree is considered to be a special type of B-tree.
B-trees can have multiple key/value pairs in a node, sorted ascendingly by key order. We call this key/value pair a payload. Occasionally we may hear the value part of the payload being referred to as “satellite data” or just “data”.
In the context of databases, a key may be the primary key or indexed column of a row, and the value could be the actual row record itself or a reference to it.
In addition to the payload, nodes also keep references to their children. Since each node typically takes up a disk page, these child references usually refer to the page ID in which the child node lives in secondary storage. As any search tree, B-trees are organized so that the keys of any sub-tree must be larger than the keys of the sub-tree to its left.
2.1. Properties of a B-tree
For a tree to be classified as a B-tree, it must fulfill the following conditions:
The image below shows us an example of a B-tree. The maximum number of children in this tree is 3, so we can conclude that this is a tree of order 3. All the leaves are on the same level, and the root and internal nodes have a minimum of two children fulfilling all the conditions of a B-tree:
2.2. Building a B-tree
Now, let’s build our tree from scratch. To do this, we’ll learn how to insert items into the tree while keeping it balanced.
Since we’re starting with an empty tree, the first item we insert will become the root node of our tree. At this point, the root node has the key/value pair. The key is 1, but the value is depicted as a star to make it easier to represent, and to indicate it is a reference to a record.
Next, if we wanted to insert 3, for us to keep the tree balanced and fulfilling the conditions of a B-tree we need to perform what is called a split operation.
We can determine how to split the node by picking the middle key. When we choose the middle key we need to consider the keys already present in the node and also the key which is to be inserted. In this case, 2 is the middle key used to split the node. This means the values to the left of 2 will go to a left sub-tree and the values to the right of 2 will go to a right sub-tree and 2 itself will get promoted as part of a new root node:
By performing this splitting operation each time we are about to exceed the maximum number of keys in a node, we keep the tree self-balancing.
Next, we try to insert 7. However, since the rightmost tree is now at full capacity we know that we need to do another split operation and promote one of the keys.
But wait! The root node is also at full capacity, which means that it also needs to be split.
So, we end up doing this in two steps. First, we need to split the right nodes 5 and 6 so that 7 will be on the right, 5 will be on the left, and 6 will be promoted.