1. Introduction

In this tutorial, we’ll benchmark the Skip List (SL) and the Binary Search Tree (BST) data structures. These data structures are usually the default choices when some ordering of elements is needed. In addition to ordering elements, they also support fast addition, removal, and lookup.

2. BST Overview

BST is probably the most popular data structure when it comes to ordering elements. An ordinary BST performs badly as the tree can easily degrade into a linear structure. Performing most of the operations on a linked list-like structure takes linear time. We can find an overview of BST in Binary Search Trees and Binary Tree vs. Binary Search Tree.

Computer scientists invented a balanced BST to overcome that limitation. A balanced BST guarantees the logarithmic worst-case performance for all the operations. The most popular balanced BST variants are Red-Black Tree (RBT), Adelson-Velsky and Landis Tree (AVL), and a variant of B-Tree called 2-3 tree. We can find details of RBT and AVL comparison in Red-Black Tree vs. AVL Tree.

3. Skip List Overview

SL was proposed as an alternative to BST. Probabilistic Skip Lists (PSL) were the first variant of SL. PSL shows good practical performance comparable with balanced BST variants. But the drawback of the structure is in its probabilistic nature – the logarithmic performance is only expected. In the worst case, PSL may perform as badly as O(n) . A more detailed overview of PSL can be found in The Skip List Data Structure.

Deterministic Skip Lists (DSL) appeared to overcome the limitations of PSL related to worst-case performance. DSL restricts the structure of the nodes, similar to balanced BST variants. By doing so, it makes sure there are O(log(n)) levels. Additionally, we compare no more than a constant number of elements at each level during lookup. In that way, DSL guarantees the logarithmic worst-case performance. An excellent article, Deterministic Skip Lists, by Thomas Papadakis et al. has a detailed insight into DSL.

4. Efficiency Comparison

Let’s now dive into the details of the SL and balanced BST comparison. In this section, we assume there are n elements in the data structures.

4.1. The Complexity of Operations

The table below shows the expected and worst-case time complexities of addition, removal, and lookup operations for SL and balanced BST variants:

Time Complexities

Addition (expected)

Addition (worst-case)

Removal (expected)

Removal (worst-case)

Lookup (expected)

Lookup (worst-case)

Balanced BST

log(n)

log(n)

log(n)

log(n)

log(n)

log(n)

DSL

log(n)

log(n)

log(n)

log(n)

log(n)

log(n)

PSL

log(n)

O(n)

log(n)

O(n)

log(n)

O(n)

As we see, DSL is a kind of synergy between balanced BST and PSL: while guaranteeing the worst-case performance of the former, it has the behavior of the latter.

4.2. Space Usage

Similar to time complexities, balanced BST and DSL have the same space complexity. In contrast, PSL is only expected to have the complexity of its counterparts:

Space Complexities

Expected Space

Worst-case Space

Balanced BST

O(n)

O(n)

DSL

O(n)

O(n)

PSL

O(n)

O(n * log(n)), assuming PSL is restricted to have no more than log(n) levels

4.3. Efficiency in Multithreaded Environments

It turns out PSL is far better than DSL and balanced BST when it comes to multithreading. It is easier to implement PSL for concurrent accesses, specifically:

  • PSL is the best in multithreading environments. Indeed, PSL has only to lock the nodes affected by the addition/removal of a node. So, the remaining nodes can be accessed parallelly by other threads
  • DSL performs worse than PSL and better than balanced BST. DSL isn’t as flexible in its structure as PSL, so a bigger chunk of the structure usually needs to be locked
  • Balanced BST performs the worst. The same issue of locking a big chunk of the structure exists for balanced BST. Recently, new concurrency schemes have been developed for BST, which allow them to have a comparable performance to PSL in multithreading environments. But the schemes are complex to comprehend and implement

The article Concurrent Programming Without Locks by K. Fraser and T. Harris discusses different approaches to locking. In the end, the authors perform a benchmark of several SL and RBT implementations with respect to various locking approaches. The article confirms the supremacy of SL over RBT in concurrent environments.

5. Implementation Comparison

Among the three data structures, PSL is the easiest one to implement. The easiness is achieved by the absence of strict conditions on the nodes’ structure. As a consequence, we get simple logic for operations and straightforward code:

  • Lookup is straightforward to implement – it goes right while the looked-up element is greater than the values on a level and goes down to the level below otherwise until it finds the element or marks that the element is missing
  • Addition and removal are based on lookup. They find the place in the first level to add/remove the element, after which they update the links of the affected nodes on the path of the lookup

It’s subjective to tell which one – DSL or balanced BST – is easier to implement. Indeed, there are several DSL and balanced BST variants, each with its specifics and implementation difficulties. But for sure, all of them are harder to implement than PSL. Because all of them need to preserve balance in addition and removal of elements:

  • RBT and AVL variants of balanced BST need to perform rotations after addition and removal to keep the balance, while 2-3 tree needs to split or join nodes
  • 1-2 DSL has to add/remove levels of some nodes and link/delete the pointers to/from appropriate nodes
  • Lookup for balanced BST is straightforward, and for DSL, it’s similar to PSL

6. Practicality Comparison

In practice, we usually tend to use standard implementations of data structures and abstract data types based on them. It’s time-consuming and error-prone to implement them from scratch. So, it’s important to know which data structures a programming language ecosystem supports. It’s worth mentioning that SL isn’t as popular as balanced BST in that regard. There are several reasons for that:

  • SL is a younger data structure. It was developed in 1989. In contrast, BST is a more mature and researched data structure
  • PSL isn’t considered in languages/libraries having strict requirements on worst-case behavior. DSL is an alternative to balanced BST in this scenario, but there are more kinds of balanced BST and much more literature about them
  • Some languages/libraries had already implemented their data structures using a balanced BST when SL became known. It’s usually not recommended to change anything in the implementation/behavior of a component that is used by many applications worldwide

Examples of using balanced BST in programming languages:

  • TreeSet collection and TreeMap map in Java
  • std::set, std::multiset, std::map and std::multimap containers in C++

In contrast, concurrent sets are implemented using SL. Some variants of concurrent balanced BST have been recently introduced, but they lose to SL in performance in concurrent environments. Examples of using SL are the ConcurrentSkipListSet collection and ConcurrentSkipListMap map in Java.

7. Conclusion

In this article, we’ve compared SL and balanced BST using various criteria. The goal of our benchmark is to give a clear overview of the strengths and weaknesses of each data structure. With that knowledge, we’ll be able to make better practical choices for the discussed data structures.


« 上一篇: 并查集数据结构
» 下一篇: 循环缓冲区