1. Overview
In this short tutorial, we’ll learn about Scala’s Vector data structure and see how it’s different from other alternatives such as List and Array.
If we need to choose an immutable sequence data structure for our program, we can either choose an indexed sequence or a linear sequence. Here, we’re trying to find a general-purpose collection that performs well with most of the algorithms.
2. What Is Vector?
Vector is an immutable data structure in Scala. It was introduced in Scala version 2.8 to address the inefficiency of random access of other existing data structures. Vector* extends AbstractSeq and *IndexedSeq. Therefore, it supports constant-time element access and length computation. Vector is currently the default implementation of immutable IndexedSeq.
3. Vector Implementation in Scala
Vector is implemented as a base-32 integer trie. There are 0 to 32(2^5) nodes at the root level, and there can be another 32 nodes connecting to each node at the root level. Thus, the second level can accommodate a maximum of 32*32(2^10) elements. The third level will have 32*32*32(2^15) elements, so and so forth.
Thus, a Vector of 5 levels can hold up to 2^30 elements. Hence, accessing elements from such a collection of massive sizes takes traversing just 5 levels of nodes, making it an “effectively constant time” (eC) operation.
The vector elements exist together in memory as batches of 32, thus providing it a very high locality.
4. Performance Characteristics
A linear sequence collection will have a marginal improvement over an indexed sequence for head access and iteration algorithms. For everything else, an indexed sequence will perform much faster, especially in huge collections. Common operations on a collection can be grouped into four categories. Let’s take a look at these.
4.1. Prepend and Append
Vector is a persistent data structure using structural sharing. This technique makes the append/prepend operation almost as fast as any mutable data structure. When we add a new element by appending or prepending, it modifies the structure using structure sharing and returns the result as a new Vector.
Let’s see the actual time difference by running few experiments:
def appendPrependSeq(seq:Seq[Int], f: (Seq[Int], Int) => Seq[Int], it:Int): (Seq[Int], Double) ={
val begin = System.currentTimeMillis
var modifiedSeq = seq
for (j <- 0 until it) {
modifiedSeq = f(modifiedSeq, j)
}
val elapsedTime = System.currentTimeMillis - begin
return (modifiedSeq, elapsedTime)
}
val vec:Vector[Int] = Vector()
val lst:List[Int] = List()
val numElements = 10000
val appendTimeRatio = appendPrependSeq(lst, (a,b) => a:+b, numElements)._2/appendPrependSeq(vec, (a,b) => a:+b, numElements)._2
val prependTimeRatio = appendPrependSeq(vec, (a,b) => b+:a, numElements)._2/appendPrependSeq(lst, (a,b) => b+:a, numElements)._2
println("Append test with %s elements, Vector is ~ %s times faster than List".format(numElements, appendTimeRatio))
println("Prepend test with %s elements, List is ~ %s times faster than Vector".format(numElements, prependTimeRatio))
The test results clearly indicate that the prepend operation is comparable, but the append is significantly faster-using Vector:
Append test with 10000 elements, Vector is ~ 67.7 times faster than List
Prepend test with 10000 elements, List is ~ 3.0 times faster than Vector
4.2. Random Access and Random Updates
This is where the Vector shines really well compared to any other immutable collections. Thanks to its implementation using a trie data structure, accessing an element at any position requires traversing a few levels and branches in the tree.
Let’s run some experiments to see the actual performance benefits in action:
def randomAccessSeq(seq:Seq[Int], it:Int): (Double) ={
val begin = System.currentTimeMillis
for (j <- 0 until it) {
val idx = Random.nextInt(it)
val elem = seq(idx)
}
val elapsedTime = System.currentTimeMillis - begin
return (elapsedTime)
}
val numElements = 10000
val vec:Vector[Int] = (1 to numElements).toVector
val lst:List[Int] = (1 to numElements).toList
val randomAccessTimeRatio = randomAccessSeq(lst, numElements)/randomAccessSeq(vec, numElements)
println("Random access test with %s elements, Vector is ~ %s times faster than List".format(numElements, randomAccessTimeRatio))
Running the test with just 10,000 elements itself shows a significant difference in execution time:
Random access test with 10000 elements, Vector is ~ 36.0 times faster than List
4.3. Head/Tail Access
Another widespread operation is head/tail element access. As we can guess, accessing the first and last elements from an indexed tree data structure are both high-speed operations.
Let’s see another example to see how Vector performs in such scenarios:
def headTailAccessSeq(seq:Seq[Int], f: (Seq[Int]) => Int): (Int, Double) ={
val begin = System.currentTimeMillis
val headOrTail = f(seq)
val elapsedTime = System.currentTimeMillis - begin
return (headOrTail, elapsedTime)
}
val numElements = 1000000
val vec:Vector[Int] = (1 to numElements).toVector
val lst:List[Int] = (1 to numElements).toList
println(" Vector of %s elements took %s for head access".format(numElements, headTailAccessSeq(vec, (seq) => seq.head)._2 ))
println(" List of %s elements took %s for head access".format(numElements, headTailAccessSeq(lst, (seq) => seq.head)._2 ))
println(" Vector of %s elements took %s for tail access".format(numElements, headTailAccessSeq(vec, (seq) => seq.last)._2 ))
println(" List of %s elements took %s for tail access".format(numElements, headTailAccessSeq(lst, (seq) => seq.last)._2 ))
The output we get indicates an equal performance for head access and better performance in the case of tail access:
Vector of 1000000 elements took 0.0 for head access
List of 1000000 elements took 0.0 for head access
Vector of 1000000 elements took 0.0 for tail access
List of 1000000 elements took 15.0 for tail access
4.4. Iteration
Some algorithms require iterating through all elements in the entire collection. Although traversing all elements in a Vector is slightly more complicated than a linked list, the performance does not show any significant difference.
Once again, let’s compare the performance of Vector and List in an iterative algorithm:
def calculateSumSeq(seq:Seq[Int]): (Int, Double) ={
val begin = System.currentTimeMillis
var total = 0
for (elem <- seq) {
total = total + elem
}
val elapsedTime = System.currentTimeMillis - begin
return (total, elapsedTime)
}
val numElements = 10000000
val vec:Vector[Int] = (1 to numElements).toVector
val lst:List[Int] = (1 to numElements).toList
println(" Vector iteration of %s elements took %s milliseconds".format(numElements, calculateSumSeq(vec)._2))
println(" List iteration of %s elements took %s milliseconds".format(numElements, calculateSumSeq(lst)._2))
We’ll get an output indicating a similar iteration performance for both List and Vector:
Vector iteration of 10000000 elements took 63.0 milliseconds
List iteration of 10000000 elements took 53.0 milliseconds
5. Conclusion
In this tutorial, we’ve learned the Vector data structure in Scala, its implementation details, and performance characteristics in various situations. Vector performs much better in random access, especially in huge collections. It gives a very competent performance for all other common operations compared to any other immutable collection in Scala.
As usual, the full source code can be found over on GitHub.