1. Introduction
Scala has a very powerful and extensive Collections API in the standard library. These APIs make it easy for users to apply methods to single collections or seamlessly combine and perform operations on multiple collections.
In this tutorial, we’ll explore two methods for combining multiple collections: zip() and lazyZip().
2. The zip() Method
We can use the zip() method to combine two collections into a single collection of tuples. It pairs the elements found at the same position in each collection, producing one two-element tuple per position. Let’s look at an example:
val list1 = List(1, 2, 3)
val list2 = List("a", "b", "c")
val zipped = list1.zip(list2)
zipped shouldBe List((1, "a"), (2, "b"), (3, "c"))
We can observe that applying the zip() operation to two lists generated a new list of tuples.
The zip() method is implemented in the IterableOps trait of the Scala Collections library. As a result, it is available on any collection that implements this trait. However, the pairing might not be predictable on unordered collections such as Set and Map, because their iteration order isn't guaranteed.
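To illustrate the point about ordering, here is a minimal sketch that zips a Vector and then two Sets. The value names are made up for this example, and plain assert calls are used instead of ScalaTest matchers so that the snippet stands alone. For the Sets, only the size of the result is checked, since which elements end up paired depends on iteration order:

```scala
// An ordered collection: the pairing follows the element positions
val fromVector = Vector(1, 2, 3).zip(List("a", "b", "c"))
assert(fromVector == Vector((1, "a"), (2, "b"), (3, "c")))

// Unordered collections: we still get three pairs, but we should not
// rely on which number is paired with which letter
val fromSets: Set[(Int, String)] = Set(1, 2, 3).zip(Set("a", "b", "c"))
assert(fromSets.size == 3)
```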
When the zip() operation is used with collections of different sizes, the result has as many elements as the shorter of the two input collections. The extra elements of the longer collection are dropped.
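The behavior with different sizes can be sketched as follows. The value names are illustrative, and the padding defaults passed to zipAll() (0 and "?") are arbitrary choices for this example:

```scala
val numbers = List(1, 2, 3, 4, 5)
val letters = List("a", "b", "c")

// zip stops at the end of the shorter collection: 4 and 5 have no partner
val truncated = numbers.zip(letters)
assert(truncated == List((1, "a"), (2, "b"), (3, "c")))

// zipAll pads the shorter side with a default value instead of truncating
val padded = numbers.zipAll(letters, 0, "?")
assert(padded == List((1, "a"), (2, "b"), (3, "c"), (4, "?"), (5, "?")))
```

If dropping elements silently is not acceptable, zipAll() is one way to keep every element of the longer collection.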
We can also chain zip operations on multiple collections, creating a nested tuple structure:
val list = List(1, 2, 3)
val res: List[((Int, Int), Int)] = list.zip(list).zip(list)
It generates a nested tuple by executing successive zip() operations on the initial result.
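When we need a flat tuple after chaining zip(), one option is to unpack the nested pairs with a pattern match. The sketch below uses made-up value names (ids, names, flags) and plain assert calls:

```scala
val ids = List(1, 2, 3)
val names = List("a", "b", "c")
val flags = List(true, false, true)

// Chaining zip nests the first pair inside the second one
val nested: List[((Int, String), Boolean)] = ids.zip(names).zip(flags)

// A pattern match unpacks the nested pair and rebuilds a flat triple
val flat: List[(Int, String, Boolean)] =
  nested.map { case ((id, name), flag) => (id, name, flag) }

assert(flat == List((1, "a", true), (2, "b", false), (3, "c", true)))
```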
3. The lazyZip() Method
Scala provides an alternative form of the zip() method, lazyZip(). Unlike the zip() method, which performs eager evaluation, lazyZip() adopts lazy evaluation, postponing computation until the elements are accessed.
Let’s look at an example:
val list1 = List(1, 2, 3)
val list2 = List("a", "b", "c")
val zipped = list1.lazyZip(list2)
zipped.toList shouldBe List((1, "a"), (2, "b"), (3, "c"))
The variable zipped holds a lazy result (a LazyZip2) rather than a list of tuples. No tuples are created until we call toList, which builds the resulting List.
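Besides converting the lazy result to a List, we can also transform it directly. The following sketch assumes two small lists with illustrative names and shows that map() and forall() on the lazy result accept a two-argument function, so the pairs never have to be materialized as a separate collection of tuples:

```scala
val counts = List(1, 2, 3)
val symbols = List("a", "b", "c")

// map receives both elements as separate arguments
val repeated: List[String] = counts.lazyZip(symbols).map((n, s) => s * n)
assert(repeated == List("a", "bb", "ccc"))

// forall works the same way and stops at the first pair that fails the check
val allShort: Boolean = counts.lazyZip(symbols).forall((n, s) => s.length <= n)
assert(allShort)
```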
This lazy evaluation is beneficial when working with large or potentially infinite collections. Let’s consider a scenario where we use lazyZip() on infinite collections:
val infiniteNumbers: LazyList[Int] = LazyList.from(1)
val infiniteStrings: LazyList[String] = LazyList.iterate("a")(_ + "a")
val result = infiniteNumbers.lazyZip(infiniteStrings)
result.take(3).toList shouldBe List((1, "a"), (2, "aa"), (3, "aaa"))
Here, we have two infinite collections built with LazyList. Even though we apply lazyZip() to infinite collections, nothing is evaluated immediately. Zipping happens only when elements are actually requested, which in this case is when take(3).toList asks for the first three pairs.
Unlike the zip() method, lazyZip() automatically flattens chained operations for up to four collections; beyond that, it creates a nested structure:
val list = List(1, 2, 3)
val level4Res = list.lazyZip(list).lazyZip(list).lazyZip(list).toList
level4Res shouldBe List((1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3))
val level5Res =
  list.lazyZip(list).lazyZip(list).lazyZip(list).lazyZip(list).toList
level5Res shouldBe List(
  ((1, 1, 1, 1), 1),
  ((2, 2, 2, 2), 2),
  ((3, 3, 3, 3), 3)
)
We can observe that adding a fifth collection to the chain produces a nested structure, with the flat four-element tuple as its first component.
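The flattening also applies to the functions we pass to the lazy result. As a small sketch with illustrative value names, a chain of three collections accepts a three-argument function in map(), whereas the equivalent zip() chain needs a nested pattern match:

```scala
val xs = List(1, 2, 3)
val ys = List(10, 20, 30)
val zs = List(100, 200, 300)

// lazyZip: the three elements arrive as three separate arguments
val sums: List[Int] = xs.lazyZip(ys).lazyZip(zs).map(_ + _ + _)
assert(sums == List(111, 222, 333))

// zip: the same computation has to unpack a nested tuple
val sumsViaZip: List[Int] =
  xs.zip(ys).zip(zs).map { case ((x, y), z) => x + y + z }
assert(sumsViaZip == sums)
```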
The lazyZip() method was introduced in Scala 2.13 to replace the zipped method, which was called on a tuple of collections in Scala 2.12 and earlier versions and has been deprecated since 2.13.
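For code being migrated from older versions, the change can be sketched like this. The value names are made up for the example, and the old form appears only as a comment because it is deprecated:

```scala
val prices = List(10, 20, 30)
val quantities = List(1, 2, 3)

// Scala 2.12 style, deprecated since 2.13 (shown for comparison only):
//   val totals = (prices, quantities).zipped.map(_ * _)

// Scala 2.13+ style
val totals: List[Int] = prices.lazyZip(quantities).map(_ * _)
assert(totals == List(10, 40, 90))
```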
4. Simple Performance Comparison
In this section, we’ll explore a basic method to compare the time taken by zip() and lazyZip() operations:
def timed[T](f: => T): T = {
  val startTime = System.nanoTime()
  val res = f
  val endTime = System.nanoTime()
  println(s"Time taken for operation: ${(endTime - startTime) / 1000000} milliseconds")
  res
}

@main
def main(): Unit = {
  val largeList = (1 to 10000000).toList
  println("--- zip ---")
  timed(largeList.zip(largeList).take(100)) // eager evaluation
  println("--- lazyZip without eval ---")
  timed(largeList.lazyZip(largeList)) // lazy evaluation of lazyZip
  println("--- lazyZip with partial eval ---")
  timed(largeList.lazyZip(largeList).take(100).toList) // force partial evaluation of lazyZip
  println("--- lazyZip with full eval ---")
  timed(largeList.lazyZip(largeList).toList) // force full evaluation of lazyZip
}
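Here, we perform different operations using the zip() and lazyZip() methods and print the time taken for each.
When we run it, we can observe that the partial evaluation with lazyZip() takes very little time despite the collection’s size, because only the first 100 pairs are ever built. In contrast, zip() builds all ten million tuples before take(100) is applied.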
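A single timed run on the JVM can be noisy, since JIT compilation and garbage collection affect the first executions. As a rough sketch only, and not a substitute for a proper benchmarking tool, the hypothetical helper below (bestOfMillis is a name invented for this example, as are its warmups and runs parameters) discards a few warm-up runs and reports the fastest of several measured runs:

```scala
// Hypothetical helper, not part of the original example:
// runs f a few times to warm up, then returns the fastest measured run in milliseconds
def bestOfMillis[T](warmups: Int, runs: Int)(f: => T): Double = {
  (1 to warmups).foreach(_ => f) // warm-up runs, results discarded
  val samples = (1 to runs).map { _ =>
    val start = System.nanoTime()
    f
    (System.nanoTime() - start) / 1e6
  }
  samples.min
}

// A smaller list than in the example above, to keep the sketch quick to run
val data = (1 to 100000).toList
val zipMs = bestOfMillis(3, 5)(data.zip(data).take(100))
val lazyZipMs = bestOfMillis(3, 5)(data.lazyZip(data).take(100).toList)
println(f"zip: $zipMs%.3f ms, lazyZip: $lazyZipMs%.3f ms")
```

The absolute numbers will vary between machines and runs, so they are best read as a relative comparison.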
5. Conclusion
In this article, we looked at zip() and lazyZip() methods on Scala Collections.
While zip() is simple and may perform well on smaller datasets, where eagerly building every tuple costs little, lazyZip() is beneficial for very large collections because it defers computation and avoids creating tuples that are never used. Additionally, we observed that lazyZip() offers the added benefit of automatically flattening the result when chaining up to four collections. Depending on the scenario, we should choose between zip() and lazyZip() accordingly.
As always, the sample code used in this tutorial is available over on GitHub.