Getting Speech Output for Input Text From the Command Line

1. Introduction

Many Linux tools convert text to speech or audio files from the command line, improving accessibility. Some of them also come with features like multiple voice options, multiple languages, pitch adjustment, and word gap control.

In this tutorial, we’ll discuss four commands for getting speech output from command line text.

2. espeak

espeak is a speech synthesizer that supports various languages. It can convert standard input, text files, and text passed as an argument to speech or WAV files. It also offers pitch, word gap, and amplitude control, plus many other features.

2.1. Installation

We can install espeak on a Debian machine:

$ sudo apt install espeak

We can also install it on a Red Hat-based distribution:

$ sudo dnf install espeak

2.2. Converting Text to Speech

espeak can convert text passed as an argument to speech:

$ espeak "Welcome to Baeldung"

When passing more than one word to the espeak command, quotation marks are vital. Without them, the command will only read the first word.

espeak can also convert standard input to speech:

$ espeak --stdin
Welcome to Baeldung

After entering the text at the prompt, we must press Ctrl + D to signal the end of standard input. Only then does espeak give us the speech output.

When we use the -f flag, espeak will produce speech from text files:

$ espeak -f Baeldung.txt

We can also get our speech output as a WAV file using the -w flag in any of the following ways:

$ espeak "Welcome to Baeldung" -w welcome

$ espeak --stdin -w welcome
Welcome to Baeldung

$ espeak -f Baeldung.txt -w welcome

Running any of the three commands above creates a WAV file named welcome. When we play that file, it says “Welcome to Baeldung”.
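Because the output file above has no extension, it can be handy to confirm that it really is a WAV file. The following is a minimal sketch, assuming a head that supports -c (as the GNU and BSD versions do). is_wav_file is a hypothetical helper name, not part of espeak:

```shell
# Hypothetical helper: succeeds if the file starts with a RIFF/WAVE header.
is_wav_file() {
    # WAV files are RIFF containers: bytes 0-3 are "RIFF", bytes 8-11 are "WAVE".
    [ "$(head -c 4 "$1" 2>/dev/null)" = "RIFF" ] &&
        [ "$(head -c 12 "$1" 2>/dev/null | tail -c 4)" = "WAVE" ]
}

# Example (not executed here):
# is_wav_file welcome && echo "welcome is a WAV file"
```

The check only looks at the header, so it says nothing about whether the audio itself is valid.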

2.3. Changing Pitch

We can vary the pitch of the espeak speech output between 0 and 99 using the -p flag, where 0 is the lowest pitch and 99 is the highest.

The default pitch is 50, so with that in mind, we have a sense of how high or low we need to go.

Let’s raise the pitch to 70:

$ espeak -f Baeldung.txt -p 70

2.4. Changing Word Gap

espeak sets the pause between words using the -g option, which works in units of 10ms (milliseconds). This is separate from the speaking rate itself, which espeak controls through the -s option in words per minute.

So, if we want it to leave a gap of 1 second (1000ms) between each word, we’ll pass 100 to the -g flag:

$ espeak -f Baeldung.txt -g 100
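Since -g counts in units of 10ms, a small conversion helper can save some mental arithmetic. The sketch below assumes whole-millisecond input and a POSIX shell. ms_to_gap_units is a hypothetical name used only for illustration:

```shell
# Hypothetical helper: convert a pause in milliseconds to the 10 ms units -g expects.
ms_to_gap_units() {
    # Integer division, so anything below 10 ms rounds down to 0.
    echo $(( $1 / 10 ))
}

# Example (not executed here): a 250 ms pause between words
# espeak -f Baeldung.txt -g "$(ms_to_gap_units 250)"
```

With this helper, the 1-second gap from the example above corresponds to ms_to_gap_units 1000.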

2.5. Changing Amplitude (Volume)

espeak works with a default amplitude (volume) of 100. But we can make it go as high as 200 and as low as 0 by passing our desired value to the -a flag.

So, let’s raise the amplitude to 150:

$ espeak -f Baeldung.txt -a 150

We can pass values above 200 to the -a flag. But it’s better to stay within the recommended limits.
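If the amplitude comes from a variable or user input, one way to stay within the 0 to 200 range is to clamp the value before handing it to -a. This is only a sketch under the assumption that the input is an integer. clamp_amplitude is a hypothetical helper, not an espeak feature:

```shell
# Hypothetical helper: keep an integer amplitude within the 0-200 range.
clamp_amplitude() {
    value=$1
    if [ "$value" -lt 0 ]; then
        value=0
    elif [ "$value" -gt 200 ]; then
        value=200
    fi
    echo "$value"
}

# Example (not executed here): a request for 250 is lowered to 200
# espeak -f Baeldung.txt -a "$(clamp_amplitude 250)"
```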

3. say

say is one of the tools from the GNUstep toolset. It's a lightweight text-to-speech tool that works in two ways: converting text arguments to speech and converting text files to speech.

3.1. Installation

We can install say on a Debian machine by installing the GNUstep GUI Runtime:

$ sudo apt install gnustep-gui-runtime

3.2. Converting Text to Speech

say converts text to speech when we pass the text as an argument:

$ say Welcome to Baeldung

say can also convert text files to speech:

$ say -f Baeldung.txt

4. Google Speech

google_speech is a CLI text-to-speech tool based on Google Translate TTS. Like say, it's lightweight. Unlike say, it can also convert text to audio files.

On the other hand, while say can produce speech from text files, google_speech can’t.

4.1. Installation

To install google_speech, we’ll run the following commands:

$ sudo apt-get install libsox-fmt-all
$ sudo apt-get install sox
$ sudo pip install sox
$ sudo pip install google_speech

The first three commands install sox and some of its dependencies, and the last command installs google_speech. We need the extra installations because google_speech relies on sox to work.
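After the installation, we may want to confirm that the required commands are actually on our PATH. The sketch below assumes the executables are named sox and google_speech. missing_tools is a hypothetical helper that prints whichever of the given commands can't be found:

```shell
# Hypothetical helper: print the names of the given commands that are not on PATH.
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Trim the leading space before printing.
    echo "${missing# }"
}

# Example (not executed here): empty output means both commands were found
# missing_tools sox google_speech
```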

4.2. Converting Text to Speech

Let’s make google_speech say “Welcome to Baeldung”:

$ google_speech "Welcome to Baeldung"

The quotation marks are important. Without them, the command may throw an error.

google_speech also converts text to audio files using -o:

$ google_speech -o welcome.mp3 "Welcome to Baeldung"

When specifying an output filename for google_speech, adding mp3, flac, or ogg as the file extension prevents a format error.
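In a script, we could check the extension before calling google_speech. The following sketch relies only on the three extensions named above. has_supported_extension is a hypothetical helper, not part of google_speech:

```shell
# Hypothetical helper: succeeds only for filenames ending in .mp3, .flac, or .ogg.
has_supported_extension() {
    case "$1" in
        *.mp3|*.flac|*.ogg) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (not executed here):
# has_supported_extension welcome.mp3 && google_speech -o welcome.mp3 "Welcome to Baeldung"
```

The match is case-sensitive, so a name like welcome.MP3 is rejected by this sketch.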

4.3. Using the Language Option

We can make google_speech produce speech output in one of 75 languages using the -l flag. But first, we’ll get a list of all supported languages by running google_speech -l:

$ google_speech -l
usage: google_speech [-h]
                     [-l {af,ar,bn,bs,ca,...,pl,pt,pt-br,pt-pt,ro,ru,si,sk,sq,sr,su,sv,sw,ta,te,th,tl,tr,uk,vi,zh-cn,zh-tw}]
                     [-e SOX_EFFECTS [SOX_EFFECTS ...]] [-v {warning,normal,debug}] [-o OUTPUT]
                     speech
google_speech: error: argument -l/--lang: expected one argument

Now, let’s make google_speech say “Welcome to Baeldung” using a French accent:

$ google_speech -l fr "Welcome to Baeldung"

5. gTTS

gTTS is not as lightweight as google_speech and say. But unlike them, it only produces an audio file from the text passed to it.

5.1. Installation

We can install gTTS using pip:

$ sudo pip install gTTS

5.2. Converting Text to Speech

We can convert text to an mp3 file using gtts-cli:

$ gtts-cli 'Welcome to Baeldung' --output welcome.mp3

Next, we’ll convert a text file to an mp3 file:

$ gtts-cli -f Baeldung.txt --output welcome.mp3

5.3. Using the Language Option

Like google_speech, gtts-cli has a language option, -l.

We can run gtts-cli --all to see all supported languages:

$ gtts-cli --all
  af: Afrikaans
  ar: Arabic
  bg: Bulgarian
...truncated...
  en: English
  es: Spanish
...truncated...
  zh-TW: Chinese (Mandarin/Taiwan)
  zh: Chinese (Mandarin)

Then when we run the following command, gtts-cli will use a Spanish accent:

$ gtts-cli "Welcome to Baeldung" -l es --output welcome.mp3

5.4. Reducing Speech Rate

While gtts-cli can also reduce speech rate, it does not have the same precision as espeak.

To make speech slower with gtts-cli, we simply add the -s flag:

$ gtts-cli -s 'Welcome to Baeldung' --output welcome.mp3

6. Conclusion

In this article, we saw four ways to convert text to speech from the command line. Of the four tools discussed, espeak is the most feature-rich, as it allows pitch, amplitude, and word gap variation, unlike the other three.

While gtts-cli does not offer direct speech output, it offers more than google_speech. But overall, say offers the fewest features.
