1. Introduction

In this tutorial, we’ll study and understand the XOR swap of two variables.

2. XOR Operation

XOR is a bit operation. It accepts two operands and compares their values. XOR evaluates to 0 (or False) if the operands are the same. If they differ, it outputs outputs 1 (or True). Here’s its truth table:

x_1

x_2

\mathrm{XOR}(x_1, x_2)

1

1

0

1

0

1

0

1

1

0

0

0

Usually, we denote XOR with \oplus.

2.1. Properties

XOR is commutative:

    [a \oplus b = b \oplus a]

It’s also associative:

    [a \oplus (b \oplus c) = b \oplus (a \oplus c) = c \oplus (b \oplus a)]

Its neutral element is zero:

    [a \oplus 0 = a]

 XORing any \boldsymbol{a} with itself, we get \boldsymbol{0} as the result:

    [a \oplus a = 0]

So, each element is its own inverse with respect to XOR.

2.2. XOR and Numbers

We can define XOR for numbers by applying the bitwise XOR to the pairs of corresponding bits. Let’s take a=17 and b=12 as an example. In the binary notation, we have a=10001 and b=01100. Now:

    [a \oplus b = 10001 \oplus 01100 = 11101]

We get that the result of the XOR operation on a and b is 29 (binary 11101).

2.3. Why Do We Use XOR for Swapping?

In the old days of single-core CPU systems, memory access was a costly operation and CPU registers were very precious resources. So, we used XOR swapping to avoid costly memory dereferences and loading CPU register values with the stack as in the ordinary swap with a temporary variable.

3. XOR Swap Pseudocode

The most common use of XOR is to swap two variables without any auxiliary storage:

algorithm SwapUsingXOR(a, b):
    // INPUT
    //   a, b = Two variables to be swapped
    // OUTPUT
    //   a and b swap their values using XOR operation

    // Swap using the XOR bit operation
    a <- a XOR b
    b <- b XOR a
    a <- a XOR b

The following figure shows this operation as a circuit gate operation:

XOR Swap

As evident from the circuit diagram above, we feed inputs a and b to the first XOR gate in the middle. Then, we feed the output of the first XOR gate and input a to the second XOR gate on the top. This gives b as the output. Lastly, we pass the output of the first XOR gate and input b to the third XOR gate (the one at the bottom). This gives us a as the result. So, we find the values are swapped.

3.1. Explanation

Now, we are ready for a detailed explanation of three swapping steps from the pseudocode.

In the first instruction, we apply XOR on a and b and store the result back in a. Let:

    [  tmp_{1} =a \oplus b]

Now, in the next step, we do the XOR on b and the result of the first step (tmp_{1}). Let the result be tmp_{2}:

    [  tmp_{2} = b \oplus tmp_{1} = b \oplus (a \oplus b)]

Now, using the, associativity, inverse, and neutral-element properties, we get:

    [  tmp_{2} = a \oplus  (b \oplus b)= a \oplus 0 = a]

Thus, at the end of step 2, tmp_{2} is original value of a. Since we assigned it to b, we get that b is the original a.

Going further, we move to step 3. Here, we apply XOR to the result of steps 1 and 2 (tmp_{1} and tmp_{2}):

    [  tmp_{3} = tmp_{1} \oplus tmp_{2}= a \oplus (b \oplus a)]

Now, using the properties of XOR, we get:

    [ tmp_{3} = (a \oplus a) \oplus b= 0 \oplus b = b]

Hence, at the end we have a=tmp_{3}=b. So the swap went down successfully.

4. Problems

In the present days of massive pipelines and quad-core CPU systems, XOR swapping can slow down the whole system. Why? Because XOR swapping holds intermediate values and has the massive dependency of one step on another.

This is because the system can execute each step on different CPU cores. Thus, the system can store the result of each step in different CPU registers. However, this dependency on reading from a CPU register and writing to different CPU register could cause the whole pipeline to stall. As a result, a stalled pipeline puts the code on a slower path.

Almost all modern-day compilers perform swaps using temporary storage in a single instruction, so XOR swaps don’t benefit from compiler-time optimization.

5. Conclusion

In this article, we have gone through in detail how we swap two variables without using any auxiliary storage variable. We find XOR swap useful in the old days of single-core CPU systems but it slows down the execution in multicore CPU systems.