1. Introduction
In this tutorial, we’ll showcase the immutable BitSet and its usage in Scala.
BitSet is a data structure that enables us to perform bit operations on many bits quickly and efficiently. For example, we use BitSets for 2D collision detection in video games and instead of boolean arrays in bloom filter implementations. Although BitSets are not commonly used, they are often used in libraries to enable compactness and speed.
2. Under the Hood
Internally, a BitSet stores its bits in a Long Array. In this way, the first 64 bits are stored in the first Long of the Array; then the next 64 bits are stored in the second Long of the Array, etc.:
Internally, the Long Array is memory efficient as long as the numbers are relatively small. Indeed, using Long, a 64-bit data type, the internal Array size is as small as possible. In contrast, if the internal Array stored integers, the array size would be multiplied by two since Int is a 32-bit data type. This is illustrated by the diagram above, where the second element is added to the Array when the number 64 is added in the BitSet. We refer to the elements of this internal array as words.
Another integral part of a BitSet is its bit mask. A bit mask refers to the Long numbers that hold the internal representation of the bits representing the underlying set.
Not unexpectedly, bit masks containing the Long number representation are easier to read than the binary representation.
3. Common Operations With BitSet
Now that we better understand BitSet and how it works, it’s time to demonstrate their use.
3.1. Initialization
First of all, let’s create an empty BitSet instance:
"create an empty instance" in {
assert(BitSet() === BitSet.empty)
}
Additionally, non-empty BitSets are initialized via the apply method, by using the SetOps trait operators, by using the incl function:
"create an non-empty instance when arguments are provided" in {
assert(BitSet(2) === BitSet() + 2)
assert(BitSet(2) === BitSet().incl(2))
}
3.2. Contains
To verify if an element is present in a BitSet, we use the method contains:
"BitSet.contains" should {
"return true for existing elements" in {
assert(BitSet(3, 4).contains(3))
}
"return false for non-existing elements" in {
assert(!BitSet(3, 4).contains(12))
}
}
To make our examples easier to understand, we first need to describe in simple terms what BitSet(3, 4) means. When we use the BitSet(x,y,z) or BitSet.apply(x,y,z), the bit at each specified index is set to 1 – so in our case of 3, 4, the 3rd and 4th bit are the only bits set to 1, and every other bit is set to 0.
As expected, the first assertion asserts that contains(3) returns true since 3 is included in the BitSet, while the second assertion asserts that contains(12) returns false because 12 is not in the set.
3.3. Include and Exclude
As the names imply, incl and excl functions include or exclude an element from the set.
Let’s write a test case to demonstrate incl and excl behavior:
"BitSet.excl" should {
"remove an element if present" in {
assert(BitSet(3, 4).excl(3) === BitSet(4))
assert(BitSet(4).excl(3) === BitSet(4))
}
}
"BitSet.incl" should {
"include an element if not present" in {
assert(BitSet(3).incl(4) === BitSet(3, 4))
assert(BitSet(3, 4).incl(3) === BitSet(3, 4))
}
}
3.4. Difference, Concatenate, and Union
The diff function returns a new BitSet representing the difference between two BitSets. Having said that, for two given BitSets, b1 and b2, the call to diff will return only elements that are present in b1 and not in b2:
"BitSet.diff" should {
"return the difference of two BitSet instances" in {
assert(BitSet(3, 4).diff(BitSet(2, 3)) == BitSet(4))
}
"return an empty BitSet for equal sets" in {
assert(BitSet(3, 4).diff(BitSet(3, 4)) == BitSet.empty)
}
}
On the other hand, the concat method returns a new BitSet, including every element from two BitSets:
"BitSet.concat" should {
"return a BitSet containing the elements of both sets" in {
assert(BitSet(3, 4).concat(BitSet(2, 3)) == BitSet(2, 3, 4))
}
}
Like the concat function, the union function also returns a new BitSet containing every element of two BitSets:
"BitSet.union" should {
"return a BitSet containing the elements of both sets" in {
assert(BitSet(3, 4).union(BitSet(2, 3)) == BitSet(2, 3, 4))
assert(
BitSet(3, 4).union(BitSet(2, 3)) == BitSet(3, 4).concat(BitSet(2, 3))
)
}
}
3.6. Intersect and Xor
In terms of set theory, the intersection of sets A and B is a set that contains elements that are present at A and B. As expected, the intersect function implements the functionality we just described:
"BitSet.intersect" should {
"return a BitSet containing only elements existing in both sets" in {
assert(BitSet(3, 4).intersect(BitSet(2, 3)) == BitSet(3))
}
}
On the contrary, elements that aren’t present in both sets are obtainable via the xor function:
"BitSet.xor" should {
"return a BitSet containing only elements existing in either set1 or set2" in {
assert(BitSet(3, 4).xor(BitSet(2, 3)) == BitSet(2, 4))
}
}
3.7. Bit Masks
As already described, a bit mask is a less verbose way to describe a BitSet. To obtain a bit mask, we use the toBitMask method:
"BitSet.toBitMask" should {
"return an array containing the words that are stored internally" in {
assert(BitSet(3, 4).toBitMask.sameElements(List(24)))
}
}
Admittedly, the above snippet isn’t self-explanatory, so let’s elaborate. A BitSet(3, 4) translates to an underlying binary representation of three zeros and two ones followed by 59 zeros. Also, the binary representation we just described equals the decimal 24 (2^3 + 2^4); hence, the toBitMask function returns an array with the number 24.
Conversely, the fromBitMask method yields a BitSet with an internal array of words containing the given argument:
"BitSet.fromBitMask" should {
"return a BitSet containing internally as words the given arguments" in {
assert(BitSet.fromBitMaskNoCopy(Array(25L)) == BitSet(0, 3, 4))
}
}
Again, the above test case succeeds since 2^0 + 2^3 + 2^4 equals 25.
4. Conclusion
In this article, we presented the BitSet data structure, its uses, and its applications. While most of the applications we develop won’t benefit from BitSets, it’s important to remember BitSets and how they work, especially when it comes to performing caching or other performance-critical operations.
As always, the code for this article is available over on GitHub.