1. Introduction
Red-Black (RB) trees are a balanced type of binary search tree. In this tutorial, we’ll study some of its most important applications.
2. Motivation for the Use of RB Trees
In a previous tutorial, we studied binary search tree basic operations on a dynamic set in time .
These operations are fast if the height of the tree is small, but if it is large, then their performance becomes comparable to that obtained with a linked list. The RB trees represent one of the possible balanced tree schemes, which allow performing the basic operations in time in the worst case, with the number of elements of the tree.
3. Properties of RB Trees
An RB tree is a binary search tree that contains, in addition to the key and pointers of a standard binary tree, also a binary field called color, which can be RED or BLACK. Through precise rules for coloring the nodes on any path, we obtain that no path in an RB tree is more than double than any other, resulting in an approximately balanced tree.
3.1. Coloring
Each child of a non-existent node is worth NULL and has the following properties:
- Each node is red or black
- Each NULL node is black
- If a node is red, then both children are black
- Each simple path from one node to another contains the same number of black nodes
3.2. Height
The number of black nodes from a node , not included, to a descendant is called the b-height of the node, . The -height of an RB tree is the -height of its root.
The interest of RB trees lies in the fact that, for a tree of nodes, the maximum height is and supposes an improvement to classical binary trees.
The INSERT and DELETE operations take time in RB trees. Since they modify the structure of the tree, it is possible that the resulting tree violates the properties we listed above. In this case, it is necessary to change the color of some nodes and modify the structure of the pointers, a mechanism known as rotation.
4. Applications
RB trees guarantee, in relation to other algorithms, optimal computational times for INSERT, DELETE and SEARCH operations. This fact allows their use in sensitive applications from the point of view of computation time, such as in real-time applications.
However, due to their characteristics, we can also use RB trees as fundamental building blocks in data structures underlying numerous applications.
4.1. AVL Trees
An AVL tree (Adelson-Velsky and Landis tree) was the first self-balancing binary search tree invented. In an AVL tree, the difference between the heights of two child subtrees is at most one. If this condition is not met, a rebalancing is required.
The AVL tree is another structure supporting complexity time for SEARCH, INSERT, and DELETE, both in the average and the worst cases. AVL trees can be colored red-black. Thus they are a subset of RB trees. The worst-case height is 0.720 times the worst-case height of RB trees, so AVL trees are more rigidly balanced.
4.2. Tango Trees
The original description of the tango tree, a type of binary search tree optimized for fast searches, specifically uses RB trees as part of its data structure.
4.3. Functional Programming
RB trees are used in functional programming to construct associative arrays.
In this application, RB trees work in conjunction with 2-4 trees, a self-balancing data structure where every node with children has either two, three, or four child nodes.
For every 2-4 tree, there are corresponding RB trees with data elements in the same order. It’s possible to demonstrate that INSERT and DELETE operations on 2-4 trees are equivalent to color-flipping and rotations in RB trees.
This result is generalized to demonstrate that RB trees can be isometric to 2-3 trees or 2-4 trees, a result due to Guibas and Sedgewick (1978).
4.4. Java
In addition to the previous paragraph, we report some notes specifically concerning the use of RB trees in the programming languages C ++ and Java:
- TreeSet and TreeMap in Java use RB trees for ordering and sorting
- A HashMap also uses RB trees instead of a LinkedList to store different elements with colliding hashcodes. This results in the improvement of the time complexity of searching such an element from to where is the number of elements with colliding hashcodes
4.5. Computational Geometry
The applications of RB trees in computational geometry are numerous. Here we’ll give two interesting examples:
- Multiset class-template in the Cgal library (Computational Geometry Algorithms’ Library) – inspired to the multiset class-template in the standard template library (STL)
- Sweep-Line Algorithm for the Inclusion Hierarchy among Circles – the inclusion hierarchy works in time in the worst-case. The algorithm due to Deok-Soo Kim, Byunghoon Lee, and Kokichi Sugihara use the sweep-line method and an RB tree for efficient computation
4.6. Linux Kernel
The Completely Fair Scheduler (CFS) is the name of a process scheduler available from the 2.6.23 release of the Linux kernel. It manages CPU with the aim of maximizing their average utilization. CFS represents tasks in a tree and finds out which task to run next.
CFS stores each task in an RB tree using its virtual run time (vruntime). The leftmost node in the tree will be one with the least vruntime. When CFS needs to pick the next task to run, it picks the leftmost node.
Another use of RB-trees in the Linux kernel that is worth mentioning concerns memory management. RB-trees keep track of the virtual memory segments for a process, where the start address of the range serves as the key.
4.7. Machine Learning
Potentially, RB trees have ample application space in machine learning and data mining, where they can increase the performance of conventional algorithms.
For example, they are used in the K-mean clustering algorithm for reducing time complexity.
4.8. Database Engines
Data indexing in database engines uses RB trees directly or indirectly.
For example, MySQL uses B+ trees, which can be seen as a type of B tree. An RB tree is similar in structure to a B tree of order 4.
B+ trees have the following characteristics:
- Non-leaf nodes do not store data. They store indexes only (redundancy) – this allows storing multiple indexes
- The leaf node contains all index fields
- The leaf node is connected with a pointer which improves performances
5. Conclusion
In this tutorial, we have made a brief review of some important applications of RB trees. It is worth noting their relationship with other techniques, which makes it possible to enhance the efficiency of sorting and search operations within many implementations.
It is particularly interesting, in our opinion, their use in the Linux kernel. The freely available part of the code relating to RB trees is instructive of how the algorithm can be used within a complex practical problem.