1. Introduction

In this tutorial, we’ll explain random variables.

2. Background

Let’s say that \Omega is the set of all possible outcomes of a random process we’re analyzing. We call \Omega the sample space. For instance, when tossing a coin, there are two outcomes: heads (H) and tails (T), so \Omega = \{H, T\}. Similarly, when flipping a coin 4 times in a row, there are 2^4 = 16 outcomes in the sample space:

Sample space: 4 coin flips

An event is any subset of \Omega. For example:

Example of an event

If we define a probability P telling us how likely each event is, we get a probability space. More precisely, P maps the events defined over \Omega to [0, 1]. That’s where random variables come into play.

2.1. Random Variables

Usually, we’re interested in the numerical values the events represent or can be assigned. For example, if we toss a coin 100 times, we may be interested only in the number of heads and not the exact sequence of the Hs and Ts.

Intuitively speaking, random variables are numerical interpretations of events. Those numerical values aren’t arbitrary. They represent precisely those quantities we’re interested in.

So, a random variable \boldsymbol{X} maps the outcomes in \Omega to numbers in a set \mathcal{B}. Using the probability P of the underlying events, we derive the probability P_X with which X takes values from \mathcal{B}.

For example, if our coin is fair, each outcome in \Omega is equally likely. If we define X as the number of heads in four flips, we get this P_X:

    [\begin{pmatrix} 0 & 1 & 2 & 3 & 4 \\ \frac{1}{16} & \frac{1}{4} & \frac{3}{8} & \frac{1}{4} & \frac{1}{16} \end{pmatrix}]
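To make this concrete, here’s a minimal Python sketch of our own (not part of the original derivation) that enumerates all 16 equally likely outcomes of four fair flips, counts the heads in each, and recovers the same P_X:

    from collections import Counter
    from fractions import Fraction
    from itertools import product

    # Enumerate all 2^4 equally likely outcomes of four fair coin flips.
    outcomes = list(product("HT", repeat=4))

    # X maps each outcome to the number of heads it contains.
    head_counts = Counter(outcome.count("H") for outcome in outcomes)

    # P_X(x) = (number of outcomes with x heads) / (total number of outcomes).
    p_X = {x: Fraction(count, len(outcomes))
           for x, count in sorted(head_counts.items())}

    for x, p in p_X.items():
        print(x, p)  # 0 1/16, 1 1/4, 2 3/8, 3 1/4, 4 1/16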

There are two main types of random variables: discrete and continuous.

3. Discrete Variables

We say that \boldsymbol{X} is discrete if the set of values it can take with a non-zero probability is countable.

For instance, if \mathcal{B} is finite, X is a discrete random variable. However, a variable can take infinitely many values and still be discrete.

3.1. Countability

Let’s say we’re flipping a coin until we get two heads in a row. We can get HH in the first two flips, but there may be a sequence of 100 Ts before we get the first H. In fact, we may never get two Hs one after another since there’s always a non-zero chance to get a T after an H.

So, if our random variable X represents the number of tosses until getting two Hs in a row, the set of its values \mathcal{B} will be infinite:

    [\mathcal{B} = \{2, 3, 4, \ldots\}]

However, it’s still countable! That means we can arrange its elements in a sequence. Since each value is possible with a non-zero probability, we say that X is discrete.
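As a quick illustration (our own sketch, assuming a fair coin; the function name flips_until_two_heads is ours), we can simulate this variable and see that every sampled value is a finite integer of at least 2, even though no upper bound exists:

    import random

    def flips_until_two_heads(p_heads=0.5):
        """Simulate X: the number of flips until we see two heads in a row."""
        flips, streak = 0, 0
        while streak < 2:
            flips += 1
            streak = streak + 1 if random.random() < p_heads else 0
        return flips

    # The support {2, 3, 4, ...} is countably infinite: any number of flips can
    # occur, but each individual sample is a finite integer of at least 2.
    samples = [flips_until_two_heads() for _ in range(10_000)]
    print(min(samples), max(samples))  # the minimum is 2; the maximum varies by run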

3.2. The Probability Mass Function

Mathematically speaking, the probability P_X is defined over subsets of \mathcal{B} just as the probability P is defined over subsets of \Omega. The function mapping individual values \boldsymbol{x \in \mathcal{B}} of \boldsymbol{X} to their probabilities is known as the probability mass function (PMF) \boldsymbol{p_X}.

The distinction is technical for the most part since we can define one using the other. Here’s how we get the PMF from P_X:

    [p_X(x) = P_X(\{ x\}) \quad  (\forall x \in \mathcal{B})]

and vice versa:

    [P_X(E) = \sum_{x \in E}p_X(x) \quad (\forall E \subseteq \mathcal{B})]
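For instance, sticking with the four-flip example above, here’s a short Python sketch (our own illustration) showing how P_X of an event follows by summing the PMF over its values:

    from fractions import Fraction

    # PMF of X = number of heads in four fair flips (from the table above).
    p_X = {0: Fraction(1, 16), 1: Fraction(1, 4), 2: Fraction(3, 8),
           3: Fraction(1, 4), 4: Fraction(1, 16)}

    def P_X(event):
        """P_X(E) = sum of p_X(x) over the values x in the event E."""
        return sum(p_X[x] for x in event)

    print(P_X({0, 1}))     # probability of at most one head: 5/16
    print(P_X({2, 3, 4}))  # probability of at least two heads: 11/16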

3.3. The Cumulative Distribution Function

Let x be any value X can take. The cumulative distribution function (CDF) of X is defined as:

    [\mathrm{CDF}_X(x) = P_X(X \leq x)]

For discrete variables, we calculate the CDF by summing individual probabilities:

    [\mathrm{CDF}_X(x) = \sum_{z \in \mathcal{B} \mid z \leq x} p_X(z)]

In our example with 4 tosses and with X denoting the number of heads, \mathrm{CDF}_X(x) shows us the probability of getting x or fewer heads:

CDF - 4 flips

As we can see, plotting \mathrm{CDF}_X against the sorted values of \mathcal{B} reveals a non-decreasing staircase function. The probability that X takes a value between a and b follows from the difference between the CDF’s values at those two points.

We calculate it as follows:

    [P_X(a < X \leq b) = \mathrm{CDF}_X(b) - \mathrm{CDF}_X(a)]
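Continuing with the four-flip example, here’s a small Python sketch (our illustration) that builds the CDF as a running sum of the PMF and uses it to get an interval probability:

    from fractions import Fraction
    from itertools import accumulate

    # PMF of X = number of heads in four fair flips.
    values = [0, 1, 2, 3, 4]
    pmf = [Fraction(1, 16), Fraction(1, 4), Fraction(3, 8),
           Fraction(1, 4), Fraction(1, 16)]

    # CDF_X(x) is the running sum of the PMF up to and including x.
    cdf = dict(zip(values, accumulate(pmf)))

    print(cdf[2])           # P(X <= 2) = 11/16
    print(cdf[3] - cdf[1])  # P(1 < X <= 3) = CDF_X(3) - CDF_X(1) = 5/8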

3.4. Examples

We differentiate between various types of variables depending on the shapes of their CDFs.

For instance, a uniform discrete variable X assigns equal probabilities to each value in \mathcal{B}:

Uniform discrete

If there are only two values X can take, which we usually denote as 0 and 1, we have a Bernoulli random variable:

Bernoulli variable
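For a quick hands-on feel (a sketch of our own, with a six-sided die as an arbitrary stand-in for a uniform discrete variable), we can sample both kinds with Python’s random module:

    import random

    # Uniform discrete variable on {1, ..., 6}: each value has probability 1/6.
    die_roll = random.randint(1, 6)

    # Bernoulli variable with parameter p: 1 with probability p, 0 otherwise.
    p = 0.3
    bernoulli = 1 if random.random() < p else 0

    print(die_roll, bernoulli)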

4. Continuous Variables

Let X model the time (in minutes) we spend waiting for an order in a restaurant. Let’s also say that the restaurant guarantees the waiting time is 15 minutes at most. So, the set of values X can take is \mathcal{B} = [0, 15]. In what ways is this X different from the count of Hs discussed above?

First, there are uncountably many different values it can take: 10, 11, 10.5, 10.55, 10.555 minutes, and so on. But that’s not the most important difference.

The probability \boldsymbol{P_X} is spread over the uncountably many values in the range [0, 15]. Since all those values are possible, we need to allocate some probability to each one. However, because there are uncountably many of them and the total probability is finite (=1), the amount allocated to any single value shrinks to zero. So, if we single out an individual value x \in [0, 15], the probability of its realization P_X(x) is zero.

That’s the definition of continuous variables. A random variable \boldsymbol{X} is continuous if \boldsymbol{P_X(x)=0} for every value \boldsymbol{x} it can take.
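We can see this empirically with a small simulation (our own sketch, assuming the waiting time is uniform over [0, 15] purely for illustration): an exact value is essentially never hit, while an interval around it gets a positive share of the probability:

    import random

    # Waiting time modeled as uniform over [0, 15] minutes (illustrative assumption).
    samples = [random.uniform(0, 15) for _ in range(1_000_000)]

    # The chance of hitting one exact value, e.g. x = 10, is zero;
    # only intervals around it carry positive probability.
    print(sum(1 for s in samples if s == 10.0) / len(samples))         # ~0.0
    print(sum(1 for s in samples if 9.5 <= s <= 10.5) / len(samples))  # ~1/15 ≈ 0.067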

4.1. Continuous CDF

The CDF of a continuous random variable is continuous everywhere:

Continuous CDF

The jumps and the staircase shape of a discrete variable’s CDF happen at the points at which P_X(x) > 0. Since P_X(x)=0 for each x \in \mathcal{B} if X is continuous, there can be no jumps in the CDF plot. By definition, that means the corresponding CDF is continuous.

4.2. Probability Density Function

If \mathrm{CDF}_X has a derivative f_X, it holds that:

    [\mathrm{CDF}_X(x) = \int_{-\infty}^{x}f_X(u)du]

We call such a function f_X the probability density function (PDF) of X. It’s zero outside the variable’s support, and the integral over it must be equal to 1:

    [\int_{-\infty}^{\infty}f_X(u)du = 1]

Otherwise, it isn’t a proper density since we’d get a total probability greater or smaller than 100%, which doesn’t make sense.

The PDF of a continuous variable is analogous to the PMF of a discrete one. Both functions take the variable’s individual values as arguments. However, while the PMF reveals their probabilities, the PDF only shows us how likely the values are relative to one another.
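To illustrate the difference, here’s a sketch of our own using an exponential density with a rate of 2 (an arbitrary choice): a density value can exceed 1, so it can’t be a probability, yet its total integral is still 1:

    import math

    # Exponential density with rate lambda_ = 2 (chosen only for illustration).
    lambda_ = 2.0
    def f(u):
        return lambda_ * math.exp(-lambda_ * u) if u >= 0 else 0.0

    # A density value can exceed 1, so it isn't a probability on its own.
    print(f(0.0))  # 2.0

    # Riemann-sum approximation of the integral over [0, 20]: close to 1.
    du = 1e-4
    total = sum(f(i * du) * du for i in range(200_000))
    print(round(total, 3))  # ~1.0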

4.3. Examples

A continuous uniform variable’s PDF is constant over the range where it’s defined:

    [f_X(u) = \frac{1}{b-a} \quad \text{ if } \mathcal{B} = [a, b]]

So the corresponding CDF is:

    [\mathrm{CDF}_X(x) = \frac{x - a}{b - a} \quad \text{ if } x \in [a, b]]
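As a small Python sketch (our own illustration, reusing the [0, 15]-minute waiting time from above), the uniform PDF and CDF look like this:

    def uniform_pdf(u, a=0.0, b=15.0):
        """Constant density 1/(b - a) on [a, b], zero outside."""
        return 1.0 / (b - a) if a <= u <= b else 0.0

    def uniform_cdf(x, a=0.0, b=15.0):
        """The CDF grows linearly from 0 at a to 1 at b."""
        if x < a:
            return 0.0
        if x > b:
            return 1.0
        return (x - a) / (b - a)

    # Probability of waiting between 5 and 10 minutes:
    print(uniform_cdf(10) - uniform_cdf(5))  # ~1/3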

Another example is the class of exponential variables. Their densities drop exponentially, so small values are more likely than larger ones. The rate at which the density decreases is controlled by a parameter we usually denote as \lambda:

Various exponential densities
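The following sketch (ours, with a few arbitrary rates) evaluates the exponential density \lambda e^{-\lambda u} at several points and shows how a larger \lambda means a higher start and a faster drop:

    import math

    def exp_pdf(u, lam):
        """Exponential density: lam * e^(-lam * u) for u >= 0, zero otherwise."""
        return lam * math.exp(-lam * u) if u >= 0 else 0.0

    # A larger rate makes the density start higher and decay faster.
    for lam in (0.5, 1.0, 2.0):
        print(lam, [round(exp_pdf(u, lam), 3) for u in (0, 1, 2, 4)])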

5. Discrete vs. Continuous Variables

Here’s a summary of the differences between discrete and continuous random variables:

Discrete | Continuous
Countably many values with positive probabilities | No values with positive probabilities
Non-continuous, step-like CDF | Continuous CDF
Usually denote counts | Usually represent measurements

6. Determinism and Randomness

In a non-probabilistic context, a mathematical variable holds an unknown but fixed value. So, chance plays no part in determining it.

In programming, we can update a variable:

    [x \leftarrow x + 1]

But at every point during our code’s execution, x always holds one and only one value. So, each time we use a deterministic x, it evaluates to a single value. Unless we update it, that value stays the same.

In contrast, a random variable models a random process or an event. It doesn’t hold values but samples them according to the underlying probability. Each time we “use” a random variable, it can generate a different value due to randomness in the process or phenomenon.
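A tiny Python sketch (our own illustration, modeling a random variable as a sampling function) makes the contrast visible:

    import random

    # A deterministic variable holds exactly one value until we update it.
    x = 5
    print(x, x, x)  # the same value every time we use it

    x = x + 1       # after the update, it again holds a single value
    print(x)

    # A random variable, modeled here as a sampling function: each use draws a
    # possibly different value according to the underlying probability (a fair die).
    def X():
        return random.randint(1, 6)

    print(X(), X(), X())  # typically different values from call to call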

6.1. The Nature of Randomness

There are two main interpretations of randomness.

In the frequentist school of thought, randomness is a property of physical reality. From this viewpoint, some natural (and even human-driven) processes are governed by inherently random laws. Those laws determine the long-term frequencies of the processes’ possible outcomes through probability functions. In other words, a random law doesn’t define the outcomes but the chances they’ll materialize. The laws’ true analytical forms are unknown, and the goal of statistics and science is to uncover or approximate them.

In the subjectivist (or Bayesian) tradition, probabilities quantify and represent our uncertainty about the world. They aren’t laws of nature or human society but mathematical tools we use to formalize our belief states. Hence, probabilities don’t exist independently from us and aren’t unique. Each conscious being can develop its own beliefs about a process or an event and express them using a functional form different from those others choose. Therefore, randomness originates from our inability to understand the world completely and reflects the limitations of our knowledge.

7. Mixed and Multivariate Variables

Apart from continuous and discrete variables, there are also mixed ones. A mixed variable’s CDF consists of a step-like and a continuous part:

    [\mathrm{CDF}_X(x) = \int_{-\infty}^{x}f_X(u)du + \sum_{z_i \leq x}p_X(z_i)]

where the z_i are those values at which P_X is positive.
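For instance, here’s a minimal sketch (our own, hypothetical example) of a mixed waiting time: with probability 0.2 the order is ready immediately (a point mass at 0), and otherwise the waiting time is uniform over (0, 15]:

    def mixed_cdf(x, p0=0.2, b=15.0):
        """CDF of a mixed variable: a point mass p0 at 0 plus a uniform
        continuous part on (0, b] carrying the remaining probability."""
        if x < 0:
            return 0.0
        discrete_part = p0                          # the jump of size p0 at 0
        continuous_part = (1 - p0) * min(x, b) / b  # integral of the density up to x
        return discrete_part + continuous_part

    print(mixed_cdf(-1))   # 0.0: no probability below 0
    print(mixed_cdf(0))    # 0.2: the jump at 0
    print(mixed_cdf(7.5))  # ~0.6
    print(mixed_cdf(15))   # 1.0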

All the variables we discussed were univariate (one-dimensional). However, a random variable can have more than one dimension. In that case, we call it multivariate. We consider each dimension a univariate variable, so a multivariate variable denotes an array of one-dimensional ones.

8. Conclusion

In this tutorial, we explained random variables. We use them to quantify our belief states or the outcomes of random processes and events.