1. Overview
In this tutorial, we’ll study how to draw charts and plots in LaTeX documents.
We’ll start by discussing the use of LaTeX as a visualization tool in computer science.
Then, we’ll see a guided example that builds a plot for the comparison of non-linear activation functions. We’ll also learn how to draw box plots and bar charts for comparing multiple distributions.
At the end of this tutorial, we’ll know how to draw basic charts in LaTeX.
2. Drawings in LaTeX
2.1. LaTeX, Charts, and Computer Science
LaTeX is a powerful programming and markup language for the creation of customizable documents. It’s commonly used in the literature on machine learning because it facilitates the drafting of papers that describe datasets, the architecture of models, and their algorithmic optimization.
It also facilitates the creation of plots and charts and their full customization. Because the core LaTeX drawing facilities are fairly limited, this is done through dedicated packages, as we’ll see in the next section.
The drawing of charts is a typical task in computer science and machine learning, and the branch of visualization, in particular, is dedicated to it. In scientific literature, LaTeX plays a central role because it outputs full-fledged documents rather than just the charts. As a consequence, most major conferences provide LaTeX templates and expect the figures to be embedded in them.
This makes it particularly important to know how to draw charts in LaTeX, which is what we’ll learn here.
2.2. LaTeX Packages for Charts
Most of the drawing in LaTeX is done through dedicated packages. The two most important ones are:
- PGF/TikZ, commonly called just TikZ, which is a general-purpose drawing package
- PGFPlots, which builds upon TikZ and further extends its functionality
The usage of these packages is sufficient for most tasks related to drawing both 2D and 3D charts. In this article, we specifically focus on drawing 2D plots for distributions and functions. However, the packages also allow the drawing of other objects, such as graphs and finite-state machines, if we need to do that.
3. Plotting Functions in TikZ
3.1. Drawing the Cartesian Plane
We can now study how to handle the task of plotting a logistic function, such as the one for logistic regression. We’ll also plot other sigmoidal functions alongside it, to show how their shapes differ.
The first task when plotting a function is to draw the Cartesian axes. We can do this inside the tikzpicture environment by using the \draw[->] command twice:
\documentclass{article}
\usepackage{tikz}
\begin{document}
\begin{tikzpicture}
\draw[->] (-3.1,0) -- (3.1,0) node[right] {$x$};
\draw[->] (0,-2.1) -- (0,2.1) node[above] {$y$};
\end{tikzpicture}
\end{document}
This is the output:
3.2. Scaling and Gridlines
The plot now shows two axes with the appropriate labels. We’ve made the horizontal axis longer than the vertical one because the codomain of the sigmoid functions is very narrow, while their domain isn’t. The chart appears small on the page, though, which is something we can fix by passing a scale parameter to the tikzpicture environment:
...
\begin{tikzpicture}[scale=2]
...
\end{tikzpicture}
...
This doubles the size of the chart:
Now the size is appropriate, though the plot still appears to be empty. Because the functions we’re plotting take values in a finite interval, we can use gridlines to make those values easier to read against the axes. The simplest way to draw gridlines is with the grid operation inside a \draw[dotted] command:
\draw[dotted] (-3.1,-2.1) grid (3.1,2.1);
Now the chart has a dotted grid:
3.3. Drawing the Function
The plane itself is now complete. We’re ready to add the first function, which we can do by using the plot command inside \draw. This is its syntax:
\draw[parametersOfDraw] plot[parametersOfPlot] function{functionDefinition};
As a general rule, we should always pass an id parameter to plot, so that the intermediate files that gnuplot produces for each plot get unique names. Regarding our task, we want to draw the logistic function in blue:
\draw[color=blue] plot[id=logistic] function{1/(1+exp(-x))};
The command produces this chart:
Notice how the curve extends well beyond the ends of our horizontal axis, so we have to cut it at its extremes. We can do so, for all present and future plots in our picture, by specifying the domain parameter of the tikzpicture environment:
\begin{tikzpicture}[scale=2, domain=-3:3]
...
\end{tikzpicture}
This limits the plot to the interval $[-3, 3]$:
The curve now fits into the chart. We also want to add a label to it, which we can do by slightly modifying the \draw command that created the curve in the first place:
\draw[color=blue] plot[id=logistic] function{1/(1+exp(-x))} node[right] {$f_1(x) = \frac{1}{1+e^{-x}}$};
This places a label to the right of the rightmost point of the plot:
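The function{...} syntax hands the evaluation of the expression to the external program gnuplot, so it only produces a curve if gnuplot is installed and the document is compiled with shell escape enabled (for example, pdflatex -shell-escape). Where that isn’t possible, a rough alternative is TikZ’s built-in math engine. The sketch below assumes the same axes and domain as above and redraws only the logistic curve; the smooth option and the extra parentheses around \x are our additions, not part of the original example:

```latex
\documentclass{article}
\usepackage{tikz}
\begin{document}
\begin{tikzpicture}[scale=2, domain=-3:3]
  \draw[->] (-3.1,0) -- (3.1,0) node[right] {$x$};
  \draw[->] (0,-2.1) -- (0,2.1) node[above] {$y$};
  % Built-in math engine: the plot variable is \x and the y expression is wrapped in braces.
  % The parentheses around \x keep the unary minus unambiguous for negative values of \x.
  \draw[color=blue, smooth] plot (\x, {1/(1+exp(-(\x)))})
    node[right] {$f_1(x) = \frac{1}{1+e^{-x}}$};
\end{tikzpicture}
\end{document}
```

In this form the plot variable is \x rather than x. The built-in trigonometric functions work in degrees by default, so a gnuplot expression such as atan(x) can’t be copied over unchanged.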
3.4. Plotting More Sigmoidal Functions
Because we want to show a comparison between the logistic function and other sigmoidal functions, we can now repeat the same procedure to draw these three additional functions:
- the hyperbolic tangent function, $f_2(x) = \tanh(x)$
- an algebraic function, $f_3(x) = \frac{x}{\sqrt{1+x^2}}$
- the arctangent function, $f_4(x) = \arctan(x)$
We can do so by repeating the \draw command, once per function:
\draw[color=red] plot[id=hypertan] function{tanh(x)} node[above right] {$f_2(x) = \tanh(x)$};
\draw[color=orange] plot[id=algebraic] function{x/sqrt(1+x*x)} node[below right] {$f_3(x) = \frac{x}{\sqrt{1+x^2}}$};
\draw[color=brown] plot[id=arctangent] function{atan(x)} node[above right] {$f_4(x) = \arctan(x)$};
This produces three extra plots inside the chart:
3.5. Finishing Touches
The labels of two of the functions almost overlap now, though. We can spread the labels further apart by assigning an offset of 0.15cm to the positioning options below right, right, and above right.
We can also note that the curves themselves aren’t very sharp or visible. To highlight them, we can pass the very thick option to the relevant \draw commands.
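Applied to the first two curves, both adjustments might look like the fragment below, which belongs inside the existing tikzpicture; the other two \draw commands change in the same way:

```latex
% Thicker curves, with each label pushed 0.15cm away from the end of its curve
\draw[very thick, color=blue] plot[id=logistic] function{1/(1+exp(-x))}
  node[right=0.15cm] {$f_1(x) = \frac{1}{1+e^{-x}}$};
\draw[very thick, color=red] plot[id=hypertan] function{tanh(x)}
  node[above right=0.15cm] {$f_2(x) = \tanh(x)$};
```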
Finally, we can add a title to our picture by using the \node command, with this syntax:
\node[align=center, font=\bfseries, yshift=2em, xshift=-4.5em] (title) at (current bounding box.north) {Sigmoidal functions};
This places a node without a visible border, containing the title, above the top of the chart’s current bounding box:
3.6. Full Code
This concludes the development of our chart for comparing the sigmoidal functions. And this is its full code:
\documentclass{article}
\usepackage{tikz}
\begin{document}
\begin{tikzpicture}[scale=2, domain=-3:3]
\draw[->] (-3.1,0) -- (3.1,0) node[right] {$x$};
\draw[->] (0,-2.1) -- (0,2.1) node[above] {$y$};
\draw[dotted] (-3.1,-2.1) grid (3.1,2.1);
\draw[very thick, color=blue] plot[id=logistic] function{1/(1+exp(-x))} node[right=0.15cm] {$f_1(x) = \frac{1}{1+e^{-x}}$};
\draw[very thick, color=red] plot[id=hypertan] function{tanh(x)} node[above right=0.15cm] {$f_2(x) = \tanh(x)$};
\draw[very thick, color=orange] plot[id=algebraic] function{x/sqrt(1+x*x)} node[below right=0.15cm] {$f_3(x) = \frac{x}{\sqrt{1+x^2}}$};
\draw[very thick, color=brown] plot[id=arctangent] function{atan(x)} node[above right=0.15cm] {$f_4(x) = \arctan(x)$};
\node[align=center, font=\bfseries, yshift=2em, xshift=-4.5em] (title) at (current bounding box.north) {Sigmoidal functions};
\end{tikzpicture}
\end{document}
4. Drawing Box Plots in PGFPlots
4.1. PGFPlots for Statistical Analysis
To draw charts, we can also use the package PGFPlots, which is built upon TikZ but further extends its capabilities. PGFPlots simplifies the drawing of common structures and slightly shortens the time required to develop simple charts. Rather than another function plot, we’re now going to build a different type of chart by using commands that are available in PGFPlots but not in TikZ.
In this guided tutorial, we imagine that we want to compare the performances of students from four separate classes who have attended our exam, which we scored in percentage points. We labeled the four distributions of scores according to the general level of attention that the students in a given class pay during the lectures, and we want to verify through statistical analysis whether our expectations are correct.
The chart we use for this task is a box plot for the representation of the main statistics in univariate distributions. Box plots are particularly useful in preliminary data analysis when we study the structure of a dataset on which we’ll train a machine learning algorithm.
In PGFPlots, box plots are included in the library statistics, which we, therefore, have to include in the document preamble:
\documentclass{article}
\usepackage{pgfplots}
\pgfplotsset{compat=1.16}
\usepgfplotslibrary{statistics}
\begin{document}
\begin{tikzpicture}
% Our box plot will go here
\end{tikzpicture}
\end{document}
Notice that we usually have to specify a \pgfplotsset command and indicate, with the compat parameter, the version of PGFPlots whose behavior we want to rely on. PGFPlots is somewhat tricky in this regard, and backward compatibility isn’t guaranteed.
Inside tikzpicture, we can now set an axis environment. This environment accepts, upon initialization, multiple parameters that control most of the macroscopic characteristics of our plot. We can, for example, define its title and the labels for the axes:
...
\begin{axis}[xlabel={$x$}, ylabel={$y$}, title=Box Plot]
% Plot goes here
\end{axis}
...
This is our plot, currently empty:
4.2. Adding the Box Plot
We can now add the actual box plot to our chart. Box plots themselves are non-parametric; a box plot chart, though, is defined in PGFPlots according to these parameters:
- a median for the distribution
- a lower and an upper quartile, which correspond, respectively, to the 25th and 75th percentiles of the sorted distribution
- and a lower and an upper whisker, which typically mark the most extreme values that aren’t treated as outliers, conventionally those within 1.5 times the interquartile range from the quartiles
If we have precomputed these parameters for a given distribution, we can then use the \addplot command inside the axis environment, to add a boxplot prepared:
...
\addplot+ [boxplot prepared={lower whisker=25, lower quartile=37, median=65, upper quartile=72, upper whisker=81},]
table[row sep=\\,y index=0] {1\\ 92\\ 95\\};
...
This is the resulting boxplot:
The table in the code above lists the outliers for this particular box plot. If we haven’t precomputed the parameters for a boxplot prepared, we can let PGFPlots calculate them by passing the raw samples of a univariate distribution to the boxplot handler instead.
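As a minimal sketch of that second route, the document below passes raw samples to the boxplot handler and leaves the computation of the median, quartiles, and whiskers to PGFPlots. The scores are made up for illustration and aren’t the data behind the box plots in this tutorial; the first row, data, is just a column name:

```latex
\documentclass{article}
\usepackage{pgfplots}
\pgfplotsset{compat=1.16}
\usepgfplotslibrary{statistics}
\begin{document}
\begin{tikzpicture}
\begin{axis}[xlabel={$x$}, ylabel={$y$}, title=Box Plot]
  % Illustrative samples only: PGFPlots derives the box plot statistics from them
  \addplot+ [boxplot]
    table[row sep=\\, y index=0] {
      data\\
      25\\ 37\\ 58\\ 65\\ 66\\ 70\\ 72\\ 81\\ 95\\
    };
\end{axis}
\end{tikzpicture}
\end{document}
```

This form is convenient when the raw scores are at hand; boxplot prepared remains the better fit when only the summary statistics are available.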
4.3. Comparison Between Multiple Box Plots
We can now repeat the \addplot+ command to insert additional box plots into our chart:
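For the remaining three classes, the extra commands could look like the fragment below, which goes inside the same axis environment after the first \addplot+. The statistics are the ones used in the full listing at the end of this section; writing the class without outliers as an empty coordinates list is our choice for this sketch:

```latex
% Second class: one outlier at 72
\addplot+ [boxplot prepared={lower whisker=12, lower quartile=17, median=25,
  upper quartile=52, upper whisker=61},] table[row sep=\\,y index=0] {72\\};
% Third class: no outliers, so the list of extra points is empty
\addplot+ [boxplot prepared={lower whisker=12, lower quartile=25, median=50,
  upper quartile=75, upper whisker=87},] coordinates {};
% Fourth class: three low outliers
\addplot+ [boxplot prepared={lower whisker=62, lower quartile=64, median=70,
  upper quartile=72, upper whisker=81},] table[row sep=\\,y index=0] {5\\9\\13\\};
```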
Notice how \addplot+ automatically changes both the color and the marker for the outliers, with each successive box plot that it adds.
4.4. Adding Labels to the Chart
The box plots are now drawn correctly, but the labels on the vertical axis aren’t informative. We can add our labels by specifying the ytick and yticklabels parameters of the axis environment:
\begin{axis}[xlabel={Score}, title={Exam results}, ytick={1,2,3,4},
yticklabels={Attentive students, Inattentive students, Normal students,
Highly participating students},]
...
\end{axis}
Notice that we also replaced the text of the axes and the title of the box plot chart to reflect the nature of our task better. This is the output:
4.5. Full Code
It appears that the expectations we held were correct and that the labels we assigned to each class were generally representative of their results at the exam. This, in turn, concludes the task of building the box plot for the analysis of the four distributions.
This code builds the final version of our chart:
\documentclass{article}
\usepackage{pgfplots}
\pgfplotsset{compat=1.16}
\usepgfplotslibrary{statistics}
\begin{document}
\begin{tikzpicture}
\begin{axis}[xlabel={Score}, title={Exam results}, ytick={1,2,3,4},
yticklabels={Attentive students, Inattentive students, Normal students, Highly participating students},]
\addplot+ [boxplot prepared={lower whisker=25, lower quartile=37, median=65,
upper quartile=72, upper whisker=81},] table[row sep=\\,y index=0] {1\\ 92\\ 95\\};
\addplot+ [boxplot prepared={lower whisker=12, lower quartile=17, median=25,
upper quartile=52, upper whisker=61},] table[row sep=\\,y index=0] {72\\};
\addplot+ [boxplot prepared={lower whisker=12, lower quartile=25, median=50,
upper quartile=75, upper whisker=87},] coordinates {};
\addplot+ [boxplot prepared={lower whisker=62, lower quartile=64, median=70,
upper quartile=72, upper whisker=81},] table[row sep=\\,y index=0] {5\\9\\13\\};
\end{axis}
\end{tikzpicture}
\end{document}
5. Drawing Bar Charts in PGFPlots
5.1. Defining the Environment
We can also use PGFPlots to draw other types of charts, such as bar charts, for comparing multiple distributions. The bar chart is particularly suitable when we want to quickly visualize the difference between distributions that relate to the same type of measurement. A common example is the comparison of financial data across multiple organizations.
For this reason, in this guided example, we’ll use a bar chart to compare expenses for three companies, including ours, during the last financial year. We can start by defining an axis environment, to which we pass a title and ybar as its options:
\begin{axis}[ybar, title={Company expenses in 2019}]
...
\end{axis}
This is the empty plot:
If we use xbar instead, we end up plotting horizontal bars; but since financial data is usually plotted vertically, we follow the same convention here. The horizontal ticks, then, should represent the four sections of our budget: salaries, capital, loans, and taxes. Because we’re not interested in treating $x$ as a number, but rather as a categorical value, we can tell the axis environment to use those four symbols as values for $x$. To do so, we specify the symbolic x coords parameter:
\begin{axis}[ybar, title={Company expenses in 2019}, symbolic x coords={Salaries, Capital, Loans, Taxes}]
This changes the horizontal ticks:
This automatically takes care of removing extra ticks and lets us use the string value associated with each budget section as the coordinate for any data point. It also makes the labels overlap, but we’ll fix that shortly.
5.2. Adding the Bars
We can first plot the data on the chart by using the \addplot command, as we did in the previous exercise:
\addplot+ coordinates {(Salaries, 150) (Capital, 158) (Loans, 142) (Taxes, 164)};
This plots the corresponding bars:
With coordinates, we specify that the data comes from the tuples that we list inline, as opposed to, say, a CSV file containing them. The data is now in place, but it’s time to remove the overlap between the labels. We can do so with the enlarge x limits option for the axis environment:
\begin{axis}[..., enlarge x limits=0.2]
Now the labels are displayed correctly:
We can then repeat the same procedure and add data for the other two companies:
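A sketch of those two extra series, using the figures from the full listing at the end of this section, could look like this; both commands go inside the same axis environment, after the first \addplot+:

```latex
% One \addplot+ per company; with ybar, PGFPlots places the bars side by side in each category
\addplot+ coordinates {(Salaries, 143) (Capital, 146) (Loans, 169) (Taxes, 182)};
\addplot+ coordinates {(Salaries, 162) (Capital, 156) (Loans, 149) (Taxes, 165)};
```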
5.3. Finishing Touches
The graph is almost finished. We can now add a legend and print the value of each bar near its top.
For the legend, we can use the \legend command. We also pass legend pos to the axis environment to avoid overlapping. For the values of the bars, we can draw them with the nodes near coords option for the same environment. Finally, we can also remove the $y$-axis, now redundant, and the top portion of the $x$-axis:
\begin{axis}[ ... ,
legend pos=north west, nodes near coords, axis y line=none, axis x line=bottom]
...
\legend{Our company, Competitor 1, Competitor 2}
\end{axis}
Now the chart is neat and clean:
5.4. Full Code
Our bar chart is now complete. This is the full code that replicates it:
\documentclass{article}
\usepackage{pgfplots}
\pgfplotsset{compat=1.16}
\begin{document}
\begin{tikzpicture}
\begin{axis}[ybar, title={Company expenses in 2019}, symbolic x coords={Salaries, Capital, Loans, Taxes},
legend pos = north west, axis y line=none, axis x line=bottom, nodes near coords, enlarge x limits=0.2, ]
\addplot+ coordinates {(Salaries, 150) (Capital, 158) (Loans, 142) (Taxes, 164)};
\addplot+ coordinates {(Salaries, 143) (Capital, 146) (Loans, 169) (Taxes, 182)};
\addplot+ coordinates {(Salaries, 162) (Capital, 156) (Loans, 149) (Taxes, 165)};
\legend{Our company, Competitor 1, Competitor 2}
\end{axis}
\end{tikzpicture}
\end{document}
6. Conclusion
In this article, we studied how to draw basic charts in LaTeX.
Specifically, we learned how to draw the plots for functions with the TikZ package, and box plots and bar charts with PGFPlots.