1. Introduction
Many Linux tools convert text to speech or audio files from the command line, improving accessibility. Some of them also come with features like multiple voice options, multiple languages, pitch adjustment, and word gap control.
In this tutorial, we’ll discuss four commands for getting speech output from command line text.
2. espeak
espeak is a speech synthesizer that supports various languages. It can convert standard input, text files, and text passed as an argument to speech or WAV files. It also offers pitch, word gap, and amplitude control, plus many other features.
2.1. Installation
We can install espeak on a Debian machine:
$ sudo apt install espeak
We can also install it on Red Hat-based distributions such as Fedora:
$ sudo dnf install espeak
2.2. Converting Text to Speech
espeak can convert text passed as argument to speech:
$ espeak "Welcome to Baeldung"
When passing more than one word to the espeak command, quotation marks are vital. Without them, the command will only read the first word.
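To see why the quotation marks matter, here's a minimal sketch of how the shell splits words before any command sees them. The count_args function is a hypothetical helper written for this illustration, not part of espeak:

```shell
# Hypothetical helper: print how many arguments the shell passed in.
count_args() {
    echo "$#"
}

count_args Welcome to Baeldung     # prints 3: three separate arguments
count_args "Welcome to Baeldung"   # prints 1: one argument containing spaces
```

With quotes, the whole sentence arrives as a single argument, which is what espeak expects.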
espeak can also convert standard input to speech:
$ espeak --stdin
Welcome to Baeldung
After entering the text, we must press Ctrl + D to signal the end of standard input. After that, espeak will give us the speech output.
When we use the -f flag, espeak will produce speech from text files:
$ espeak -f Baeldung.txt
We can also get our speech output as a WAV file using the -w flag in any of the following ways:
$ espeak "Welcome to Baeldung" -w welcome
$ espeak --stdin -w welcome
Welcome to Baeldung
$ espeak -f Baeldung.txt -w welcome
Each of the three commands above creates a WAV file named welcome. Playing that file produces the speech “Welcome to Baeldung”.
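In scripts, we usually pipe text in rather than type it at a prompt. As a rough sketch, the hypothetical text_to_wav function below wraps the --stdin and -w flags shown above and bails out if espeak isn't installed. The function name and the .wav extension are our own choices:

```shell
# Hypothetical helper: text_to_wav TEXT OUTFILE
# Pipes TEXT into espeak and writes the speech to OUTFILE as a WAV file.
text_to_wav() {
    if ! command -v espeak >/dev/null 2>&1; then
        echo "espeak not found" >&2
        return 127
    fi
    # --stdin reads the text from the pipe; -w writes a WAV file instead of playing audio
    printf '%s\n' "$1" | espeak --stdin -w "$2"
}
```

For example, text_to_wav "Welcome to Baeldung" welcome.wav should leave a welcome.wav file in the current directory.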
2.3. Changing Pitch
We can set the pitch of the espeak speech output with the -p flag, which accepts values from 0 (lowest) to 99 (highest).
The default pitch is 50, so with that in mind, we have a sense of how high or low we need to go.
Let’s raise the pitch to 70:
$ espeak -f Baeldung.txt -p 70
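2.4. Changing Word Gap
espeak sets the pause between words using the -g option, which works in units of 10ms (milliseconds) at the default speed. A longer gap makes the speech sound slower overall, although the words themselves are spoken at the same rate. The rate itself is controlled separately by the -s option, in words per minute.
So, if we want it to leave a gap of 1 second (1000ms) between each word, we’ll pass 100 to the -g flag: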
$ espeak -f Baeldung.txt -g 100
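Since -g counts in units of 10ms, a small conversion helper can save us the mental arithmetic. This is only a sketch, and ms_to_gap is a hypothetical name of our own:

```shell
# Hypothetical helper: convert a gap in milliseconds to espeak's -g units (10 ms each).
ms_to_gap() {
    echo $(( $1 / 10 ))
}

ms_to_gap 1000   # prints 100
ms_to_gap 250    # prints 25
```

We could then write espeak -f Baeldung.txt -g "$(ms_to_gap 1000)" to get the same one-second gap.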
2.5. Changing Amplitude (Volume)
espeak works with a default amplitude (volume) of 100. But we can make it go as high as 200 and as low as 0 by passing our desired value to the -a flag.
So, let’s raise the amplitude to 150:
$ espeak -f Baeldung.txt -a 150
We can pass values above 200 to the -a flag. But it’s better to stay within the recommended limits.
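If the values come from user input or another script, we may want to keep them inside the documented ranges before handing them to espeak. Here's one possible sketch using a hypothetical clamp function; the limits are the ones mentioned above (0 to 200 for -a, 0 to 99 for -p):

```shell
# Hypothetical helper: clamp VALUE MIN MAX
# Prints VALUE limited to the inclusive range MIN..MAX (integers only).
clamp() {
    value=$1 min=$2 max=$3
    if [ "$value" -lt "$min" ]; then
        echo "$min"
    elif [ "$value" -gt "$max" ]; then
        echo "$max"
    else
        echo "$value"
    fi
}

clamp 250 0 200   # prints 200 (amplitude capped at the upper limit)
clamp 70 0 99     # prints 70 (pitch already within range)
```

For instance, espeak -f Baeldung.txt -a "$(clamp 250 0 200)" would run with an amplitude of 200.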
3. say
say is one of the tools from the GNUstep toolset. It is a lightweight text-to-speech tool that works in two ways: converting text arguments to speech and converting text files to speech.
3.1. Installation
We can install say on a Debian machine by installing the GNUstep GUI Runtime:
$ sudo apt install gnustep-gui-runtime
3.2. Converting Text to Speech
say can convert text to speech by passing the text as an argument:
$ say Welcome to Baeldung
say can also convert text files to speech:
$ say -f Baeldung.txt
4. Google Speech
google_speech is a CLI text-to-speech tool based on Google Translate TTS. Like say, it is lightweight, but the two differ in what they support: google_speech can convert text to audio files, which say can’t do.
On the other hand, say can produce speech from text files, while google_speech can’t.
4.1. Installation
To install google_speech, we’ll run the following commands:
$ sudo apt-get install libsox-fmt-all
$ sudo apt-get install sox
$ sudo pip install sox
$ sudo pip install google_speech
The first three commands install sox along with its format libraries and Python package. The last command installs google_speech, which needs sox to work.
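After installing, it can help to confirm that both commands are actually on the PATH. The require_cmd function below is a hypothetical helper sketched for this check, built on the shell's standard command -v:

```shell
# Hypothetical helper: report whether a command is available on the PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing" >&2
        return 1
    fi
}

require_cmd sox || echo "install sox before using google_speech" >&2
require_cmd google_speech || echo "google_speech is not installed yet" >&2
```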
4.2. Converting Text to Speech
Let’s make google_speech say “Welcome to Baeldung”:
$ google_speech "Welcome to Baeldung"
The quotation marks are important. Without them, the command may throw an error.
google_speech also converts text to audio files using -o:
$ google_speech -o welcome.mp3 "Welcome to Baeldung"
When specifying an output filename for google_speech, we should end it with one of the mp3, flac, or ogg extensions. This prevents a format error.
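A script could check the extension before calling google_speech. The following is a minimal sketch, assuming only the three extensions named above; has_supported_ext is a hypothetical helper:

```shell
# Hypothetical helper: succeed only if the filename ends in .mp3, .flac, or .ogg.
has_supported_ext() {
    case "$1" in
        *.mp3|*.flac|*.ogg) return 0 ;;
        *) return 1 ;;
    esac
}

has_supported_ext welcome.mp3 && echo "ok"                 # prints ok
has_supported_ext welcome || echo "add a file extension"   # prints add a file extension
```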
4.3. Using the Language Option
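We can make google_speech produce speech output in one of 75 languages using the -l flag. To see the supported language codes, we can run google_speech -l without a value. The command fails because -l expects an argument, but the usage message it prints lists every valid code: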
$ google_speech -l
usage: google_speech [-h]
[-l {af,ar,bn,bs,ca,...,pl,pt,pt-br,pt-pt,ro,ru,si,sk,sq,sr,su,sv,sw,ta,te,th,tl,tr,uk,vi,zh-cn,zh-tw}]
[-e SOX_EFFECTS [SOX_EFFECTS ...]] [-v {warning,normal,debug}] [-o OUTPUT]
speech
google_speech: error: argument -l/--lang: expected one argument
Now, let’s make google_speech say “Welcome to Baeldung” using a French accent:
$ google_speech -l fr "Welcome to Baeldung"
5. gTTS
gTTS is not as lightweight as google_speech and say. But unlike them, it only produces an audio file from the text passed to it.
5.1. Installation
We can install gTTS using pip:
$ sudo pip install gTTS
5.2. Converting Text to Speech
We can convert text to an mp3 file using gtts-cli:
$ gtts-cli 'Welcome to Baeldung' --output welcome.mp3
Next, we’ll convert a text file to an mp3 file:
$ gtts-cli -f Baeldung.txt --output welcome.mp3
5.3. Using the Language Option
Like google_speech, gtts-cli has a language option, -l.
We can run gtts-cli --all to see all supported languages:
$ gtts-cli --all
af: Afrikaans
ar: Arabic
bg: Bulgarian
...truncated...
en: English
es: Spanish
...truncated...
zh-TW: Chinese (Mandarin/Taiwan)
zh: Chinese (Mandarin)
Then when we run the following command, gtts-cli will use a Spanish accent:
$ gtts-cli "Welcome to Baeldung" -l es --output welcome.mp3
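If we need the same phrase in several languages, a short loop can reuse the command above. This is only a sketch: lang_outfile and speak_in_languages are hypothetical helpers, and the welcome-LANG.mp3 naming scheme is our own assumption:

```shell
# Hypothetical helper: derive an output filename from a language code.
lang_outfile() {
    printf 'welcome-%s.mp3\n' "$1"
}

# Hypothetical helper: speak_in_languages TEXT LANG...
# Writes one MP3 per language code using the same gtts-cli flags shown above.
speak_in_languages() {
    text=$1
    shift
    if ! command -v gtts-cli >/dev/null 2>&1; then
        echo "gtts-cli not found" >&2
        return 127
    fi
    for lang in "$@"; do
        gtts-cli "$text" -l "$lang" --output "$(lang_outfile "$lang")"
    done
}
```

Calling speak_in_languages "Welcome to Baeldung" es fr would aim to produce welcome-es.mp3 and welcome-fr.mp3.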
5.4. Reducing Speech Rate
While gtts-cli can also reduce speech rate, it does not have the same precision as espeak.
To make speech slower with gtts-cli, we simply add the -s flag:
$ gtts-cli -s 'Welcome to Baeldung' --output welcome.mp3
6. Conclusion
In this article, we saw four ways to convert text to speech from the command line. Of the four tools discussed, espeak is the most robust, as it is the only one that allows pitch, amplitude, and word gap adjustment.
While gtts-cli does not offer direct speech output, it accepts text files and has a slow-speech option, both of which google_speech lacks. Overall, say offers the fewest features.