Choosing a Suitable Linear Scale for the Y-Axis of a Chart

In this tutorial, we’ll study how to choose a linear scale for a chart that represents a distribution.

2. Representing Distributions

In our article on drawing charts in LaTeX, we studied the general techniques for representing distributions. In our tutorial on the auto-layout of graphs, we also discussed the problem of generating representations that help our readers understand the data. In that context, we noted that not all representations are equal and that some of them work better than others.

In this article, we focus instead on determining a suitable linear scale for a chart. Specifically, we study a procedure for identifying the lower and upper bounds of the axis, as well as the positions of its ticks.

We’ll first look at an example that clarifies the nature of the problem, and then discuss its solution.

3. A Wrong Representation

Let’s start by representing a distribution, and by assigning five ticks to the axis:

[Figure: scatterplot of the distribution, with five ticks on the $\textbf{y}$ axis]

Intuitively, we can tell that there’s something wrong with the scale used for the $\textbf{y}$ axis. We’d expect the scatterplot to cover most of the plane, not just the top portion.

We can also notice that the ticks are denser in an area of the chart that has no observations. On the other hand, in an area dense with observations, such as the interval $10 \leq y \leq 16$, there are barely any ticks.

Further, one tick is evidently different from the others: it’s written with two decimal places, whereas the rest are rounded to the nearest integer.

From these considerations, we can conclude that the criterion used for selecting the scale of $\textbf{y}$ is probably wrong.

3. A Better Representation

Let’s compare the previous chart with a new one that contains the same observations and the same number of ticks, but uses a different scale:

[Figure: scatterplot of the same distribution, with five ticks on a different $\textbf{y}$ scale]

This looks much better. At a glance, we can tell which observations are higher than others, and by approximately how much. The ticks are nicely rounded, and all hold integer values that are uniformly distributed between the minimum and maximum values of the distribution.

4. Criteria for Choosing a Linear Scale

It therefore appears that there are criteria for finding a good linear scale for a chart, given the distribution and a number of ticks.

These criteria are:

the axis should extend from slightly below the lower bound, to slightly above the upper bound
the ticks should be uniformly distributed between the lower bound and the upper bound
preference should be given to round values for the ticks

We can see that the first chart above doesn’t follow any of these criteria, while the second one does. As a consequence, it looks better.

5. Procedure for Identifying Scale and Ticks

We can formalize the procedure for assigning ticks to the axis of a distribution as a series of steps.

First, we take the lower and the upper bound of the distribution and compute its range:

$\text{range} = \text{upper} - \text{lower}$

Then, we divide the range of the distribution by the desired number of ticks, and obtain the tick range, that is, the distance between two consecutive ticks:

$\text{tick range} = \frac {\text{range}} {\text{N. ticks}}$
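
As a quick check of these two formulas, here’s a minimal Python sketch. The function name `raw_tick_range` and the sample bounds (3 and 36.5, with 5 ticks) are hypothetical, chosen only for illustration:

```python
def raw_tick_range(lower, upper, n_ticks):
    """Return the unrounded tick range: (upper - lower) / n_ticks."""
    if n_ticks <= 0:
        raise ValueError("n_ticks must be positive")
    if upper < lower:
        raise ValueError("upper must not be smaller than lower")
    return (upper - lower) / n_ticks


# Hypothetical distribution spanning 3 to 36.5, drawn with 5 ticks
sample = raw_tick_range(3, 36.5, 5)
```

For these sample values, the range is 33.5 and the tick range works out to 6.7.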

If the tick range is an awkward value, say 6.7, we can round it up to the nearest nice value. What counts as “nice” is, of course, largely subjective. As a general rule, nice values are multiples of 25, 10, 5, 2, or 1, in this order of preference.
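
One possible way to turn this rule into code is sketched below. It assumes that a nice value is 1, 2, 2.5, 5, or 10 times a power of ten, which is only one interpretation of the rule; the names `NICE_FACTORS` and `round_up_to_nice` are hypothetical:

```python
import math

# Assumed set of "nice" factors, each to be scaled by a power of ten
NICE_FACTORS = (1.0, 2.0, 2.5, 5.0, 10.0)


def round_up_to_nice(value):
    """Round a positive value up to the nearest nice factor times 10^k."""
    if value <= 0:
        raise ValueError("value must be positive")
    # Split the value into a power of ten and a fraction in [1, 10)
    magnitude = 10.0 ** math.floor(math.log10(value))
    fraction = value / magnitude
    # Pick the smallest nice factor that is not below the fraction
    for factor in NICE_FACTORS:
        if fraction <= factor:
            return factor * magnitude
    # Only reachable through floating-point error near a power of ten
    return 10.0 * magnitude
```

Under this assumption, a tick range of 6.7 is rounded up to 10. A different set of factors would give different results, so the tuple is the place to encode our own preference.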

Next, we identify new lower and upper bounds that are multiples of the rounded tick range we’ve just calculated. We round the lower bound down, so that the axis starts at or below the smallest observation:

$\text{new lower} = \text{rounded tick range} \times \text{floor}( \frac {\text{old lower}} {\text{rounded tick range}} )$

We can also compute the upper bound in an analogous manner:

$\text{new upper} = \text{rounded tick range} \times \text{ceiling}(1+ \frac {\text{old upper}} {\text{rounded tick range}} )$

Notice that we add 1 inside the ceiling operator, in order to avoid the edge case in which the lower and upper bounds coincide.
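
The two bounds can be computed with a few lines of Python. This is a sketch under stated assumptions: the name `nice_bounds` is hypothetical, and the lower bound is rounded down with `floor`, so that the axis starts at or below the smallest observation, as the first criterion asks. The upper bound follows the formula above, including the added 1:

```python
import math


def nice_bounds(old_lower, old_upper, step):
    """Snap the bounds of a distribution to multiples of `step`."""
    if step <= 0:
        raise ValueError("step must be positive")
    # Round the lower bound down to a multiple of the rounded tick range
    new_lower = step * math.floor(old_lower / step)
    # Round the upper bound up, adding 1 inside the ceiling as in the text
    new_upper = step * math.ceil(1 + old_upper / step)
    return new_lower, new_upper
```

With the hypothetical bounds 3 and 36.5 and a rounded tick range of 10, this gives a lower bound of 0 and an upper bound of 50.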

Finally, we can calculate the position of each tick. We do this by starting from the lower bound, and then iteratively adding $\text{rounded tick range}$ to it until we reach the upper bound.
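
This last step could look as follows in Python. The name `tick_positions` is hypothetical, and the sketch assumes that both bounds are already multiples of the rounded tick range. Instead of adding repeatedly, it multiplies the tick range by the tick index, which gives the same positions while avoiding accumulated floating-point error:

```python
def tick_positions(lower, upper, step):
    """List the ticks from `lower` to `upper`, spaced `step` apart."""
    if step <= 0:
        raise ValueError("step must be positive")
    # Count the intervals first, then multiply, so that floating-point
    # errors don't accumulate as they would with repeated addition
    n_intervals = round((upper - lower) / step)
    return [lower + i * step for i in range(n_intervals + 1)]
```

For a lower bound of 0, an upper bound of 50, and a tick range of 10, the ticks fall at 0, 10, 20, 30, 40, and 50. Because the bounds were widened in the previous step, the number of ticks we obtain can differ from the number we initially asked for.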

6. Conclusion

In this tutorial, we studied how to determine a nice scale for the axis in a chart.
