1. Introduction

In this tutorial, we’ll show how to count words in LaTeX documents.

We’ll present four approaches to this problem. The first is the detex utility, available on most Linux installations. Then, we’ll look at two Perl scripts, latexcount.pl and texcount.pl, both available on the web. Finally, we’ll use the shell script wordcount.sh, also available on the web.

2. A LaTeX Running Example

We’ll assume we have a LaTeX file example_Latex_document.tex as follows:

\documentclass{article}

\title{Example \LaTeX\ document}
\author{Gonzo T. Clown}
\date{\today}

\begin{document}
\maketitle
\thispagestyle{empty}

\section{The First Section}

This is an example of a \LaTeX\ source file. We can
write ordinary English as well as in-line mathematics,
such as $s=ut+ 1/2 at^2$.

\section{The Second Section}

In addition, we can also use arrays of equations.

\begin{eqnarray}
 v &=& u+at\\ 
 e &=& mc^2\\
 P_1V_1 &=& P_2V_2 
\end{eqnarray}

\section{The Third Section}

We can also present material in a tabular format.

\begin{tabular}{|c|c|}\hline
Type & Characteristics\\\hline
Mammals & Warm-blooded\\
Birds & Can fly\\
Reptiles & Cold-blooded\\\hline
\end{tabular}

\end{document} 

When we run this through pdflatex or a similar command, we get the file example_Latex_document.pdf:

[Image: the rendered example_Latex_document.pdf]

3. Word Count Using detex

We can use detex to strip out all LaTeX commands from a document. Here is how we apply detex to the previous example:

$ detex example_Latex_document.tex

This is the output we get:

Example  document
Gonzo T. Clown

The First Section

This is an example of a  source file. We can
write ordinary English as well as in-line mathematics,
such as .

The Second Section

In addition we can also use arrays of equations.

The Third Section

We can also present material in a tabular format.

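We see that detex strips out the LaTeX markup. The \section commands are gone but their titles remain. The inline math, the eqnarray environment and the entire tabular environment, contents included, are removed.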

So, to obtain the word count, we feed the output to wc -w:

$ detex example_Latex_document.tex | wc -w
53
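If we count words this way often, we can wrap the pipeline in a small shell function. The following is a minimal sketch that assumes detex is on the PATH. detex_word_count is a hypothetical name of our own and is not part of detex:

```shell
#!/bin/sh
# Count the words in each LaTeX file: strip the markup with detex, then
# count whitespace-separated tokens with wc -w.
# Assumes detex is on the PATH; detex_word_count is a hypothetical helper name.
detex_word_count() {
  for file in "$@"; do
    # tr drops the leading blanks some wc implementations print before the number
    count=$(detex "$file" | wc -w | tr -d ' ')
    printf '%s: %s\n' "$file" "$count"
  done
}
```

After we source it, detex_word_count example_Latex_document.tex should report the same 53 words as the pipeline above, prefixed with the file name. The function also accepts several files at once.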

After detex strips the inline math, a lone period preceded by a blank remains, and wc counts it as a word. We can filter it out with sed:

$ detex example_Latex_document.tex | sed 's/ \.//g' | wc -w
52

We escape the period because an unescaped period in a sed regular expression matches any character.
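To see why the lone period counts as a word and why the escape matters, we can apply both commands to the leftover line on its own. This sketch uses only sed and wc. count_words and drop_lone_periods are hypothetical helper names:

```shell
#!/bin/sh
# Count whitespace-separated tokens, as wc -w does; tr removes the leading
# blanks that some wc implementations print before the number.
count_words() {
  wc -w | tr -d ' '
}

# Delete every blank that is followed by a literal period.
drop_lone_periods() {
  sed 's/ \.//g'
}

# What detex leaves of "such as $s=ut+ 1/2 at^2$."
line='such as .'

printf '%s\n' "$line" | count_words                      # prints 3
printf '%s\n' "$line" | drop_lone_periods | count_words  # prints 2

# Without the backslash, "." matches any character, so " a" is deleted too
printf '%s\n' "$line" | sed 's/ .//g'                    # prints suchs
```

The last command shows what can go wrong. The unescaped pattern also matches the blank and the letter a in "such as", so it mangles ordinary words as well as removing the period.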

4. Using the Perl Script latexcount.pl

We can run latexcount.pl like this:

$ perl latexcount.pl example_Latex_document.tex

79 words in the main text
 in the footnotes
79 total

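The result is 79 words, whereas the detex approach counted only 52. The difference arises because detex discarded some material entirely. As its output above shows, the table contents and all the mathematics are missing, although the section headings are kept. latexcount.pl evidently counts at least part of the material that detex drops.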
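We can roughly check how much of this gap the table accounts for by counting the words between \begin{tabular} and \end{tabular}. This is a crude text filter rather than a LaTeX parser. It assumes the layout of our example file, with the environment delimiters on their own lines, cells separated by & and rows ending in \\. tabular_words is a hypothetical helper name:

```shell
#!/bin/sh
# Rough count of the words inside tabular environments.
# Assumptions: \begin{tabular} and \end{tabular} sit on their own lines with
# no cell data, cells are separated by &, and rows end in \\.
# This is a crude filter, not a LaTeX parser; tabular_words is hypothetical.
tabular_words() {
  sed -n '/\\begin{tabular}/,/\\end{tabular}/p' "$1" |
    sed -e '/\\begin{tabular}/d' \
        -e '/\\end{tabular}/d' \
        -e 's/\\hline//g' \
        -e 's/\\\\//g' \
        -e 's/&/ /g' |
    wc -w | tr -d ' '
}
```

On example_Latex_document.tex, tabular_words should print 9. These are the table words that never reach wc in the detex pipeline.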

5. Using the Perl Script texcount.pl

The texcount.pl script gives us more detailed information:

$ perl texcount.pl example_Latex_document.tex
File: example_Latex_document.tex
Encoding: ascii
Words in text: 39
Words in headers: 12
Words outside text (captions, etc.): 0
Number of headers: 4
Number of floats/tables/figures: 0
Number of math inlines: 1
Number of math displayed: 1
Subcounts:
  text+headers+captions (#headers/#floats/#inlines/#displayed)
  0+3+0 (1/0/0/0) _top_
  21+3+0 (1/0/1/0) Section: The First Section
  9+3+0 (1/0/0/1) Section: The Second Section
  9+3+0 (1/0/0/0) Section: The Third Section

We can obtain a brief output with the -brief option:

$ perl texcount.pl -brief example_Latex_document.tex
39+12+0 (4/0/1/1) File: example_Latex_document.tex

The total of text and header words (39 + 12 = 51) is close to the 52 words we got with detex.
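If we want a single number from the brief output, we can use awk to add up the three counts in its first field. This sketch assumes the text+headers+captions layout shown above. brief_total is a hypothetical helper name:

```shell
#!/bin/sh
# Add up the text, header and caption counts from one line of
# texcount.pl -brief output, such as
#   39+12+0 (4/0/1/1) File: example_Latex_document.tex
# Assumes the first field has the form text+headers+captions;
# brief_total is a hypothetical helper name.
brief_total() {
  awk '{ split($1, n, "[+]"); print n[1] + n[2] + n[3] }'
}

echo '39+12+0 (4/0/1/1) File: example_Latex_document.tex' | brief_total   # prints 51
```

With a single input file, perl texcount.pl -brief example_Latex_document.tex | brief_total should therefore print 51.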

6. Using the wordcount.sh Script

To run wordcount.sh, we must place the script and its companion file wordcount.tex in the same directory as our LaTeX file. This approach works on Unix/Linux systems.

After making the script executable, we can run it as follows:

$ ./wordcount.sh example_Latex_document.tex

The count will be on the last line of the output:

example_Latex_document.tex contains 437 characters and 73 words.

To keep only the last line of the output, we can pipe it to tail:

$ ./wordcount.sh example_Latex_document.tex | tail -n1
example_Latex_document.tex contains 437 characters and 73 words.

The result, 73 words, is close to the 79 words reported by latexcount.pl.
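To use the count in another script, we can extract just the number from the summary line. Here is a minimal sketch, assuming the summary keeps the "contains … characters and … words." wording shown above. summary_words is a hypothetical helper name:

```shell
#!/bin/sh
# Extract N from the final "... contains C characters and N words." line
# that wordcount.sh prints. Assumes that exact wording;
# summary_words is a hypothetical helper name.
summary_words() {
  tail -n 1 | sed -n 's/.* \([0-9][0-9]*\) words\.$/\1/p'
}

echo 'example_Latex_document.tex contains 437 characters and 73 words.' | summary_words   # prints 73
```

Because the function calls tail itself, we can pipe ./wordcount.sh example_Latex_document.tex straight into summary_words, which should print 73.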

7. A Comparison of the Approaches

We found that detex and texcount.pl gave similar results, which was also the case with latexcount.pl and wordcount.sh:

tool            plain text   inline math   displayed math   tabular material
detex           69           47            47               27
latexcount.pl   71           62            68               53
texcount.pl     69           47            47               27
wordcount.sh    69           50            48               37

The counts vary widely across the tools. They nearly agree on plain text (69 to 71 words) but diverge on tabular material (27 to 53 words). This is because each tool strips out different classes of LaTeX commands and defines a word differently.
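To quantify the variation, we can compute the spread (largest minus smallest count) of each column of the table. The sketch below copies the table's numbers into a here-document, with underscores in the column names so that awk can split on whitespace. column_ranges is a hypothetical helper name:

```shell
#!/bin/sh
# Spread (max - min) of each numeric column of a whitespace-separated table.
# The first row holds column names; the first column holds row labels.
# column_ranges is a hypothetical helper, not part of any of the tools.
column_ranges() {
  awk '
    NR == 1 { ncol = NF; for (i = 2; i <= NF; i++) name[i] = $i; next }
    {
      for (i = 2; i <= ncol; i++) {
        # The first data row (NR == 2) initialises the running max and min
        if (NR == 2 || $i + 0 > max[i]) max[i] = $i + 0
        if (NR == 2 || $i + 0 < min[i]) min[i] = $i + 0
      }
    }
    END { for (i = 2; i <= ncol; i++) printf "%s: %d\n", name[i], max[i] - min[i] }
  '
}

# Word counts copied from the comparison table
column_ranges <<'EOF'
tool plain_text inline_math displayed_math tabular_material
detex 69 47 47 27
latexcount.pl 71 62 68 53
texcount.pl 69 47 47 27
wordcount.sh 69 50 48 37
EOF
```

For these numbers, the spreads come out as 2, 15, 21 and 26 words for plain text, inline math, displayed math and tabular material, respectively.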

8. Conclusion

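In this article, we described four different methods to count the words in a LaTeX document. Their estimates can differ considerably, especially for mathematics and tables, so the count from any one tool is best treated as an approximation.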