Production, Technology

Creating a Natural Voice using Text to Speech

Author: Sara / November 8, 2019

If you’ve used one of the newer text to speech (TTS) services, you’ve witnessed the huge improvements this industry has seen in the past decade. The voices we have today are much more lifelike than those most people associate with “text to speech.” When you’re working with TTS, you can produce even better-quality files by following these few simple steps.

Work sentence by sentence

Most high-quality TTS editors can generate several sentences at once, but if you’re determined to get the best sound, try creating one sentence at a time. Often you’ll hear a huge improvement in both intonation and pausing when you work through each sentence individually. Plus, you can add silences between sentences more easily by working with your clips in post-production (more on this below).
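If your script lives in a single text file, a few lines of Python can break it into sentences for you before you paste each one into your editor. This is a minimal sketch, not a feature of any TTS product: the function name `split_sentences` is my own, and the splitting rule is deliberately naive, so it will trip on abbreviations like “Dr.” that end in a period.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    # Naive rule (an assumption for illustration): any sentence-ending mark
    # followed by whitespace is a boundary, so "Dr. Smith" would be split.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


script = "Text to speech has improved. The voices sound lifelike! Have you tried one?"

# Print one numbered sentence per line, ready to render one clip at a time.
for number, sentence in enumerate(split_sentences(script), start=1):
    print(number, sentence)
```

Numbering the sentences makes it easy to name the resulting clips in order when you assemble them later.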

Add silences

Silences between words and sentences create rhythmic, natural-sounding speech. As living, breathing beings, voice actors take natural pauses to inhale. In your TTS editor, you can cue the artificial intelligence (AI) to replicate these pauses by adding commas, periods, dashes, and ellipses. Think of these punctuation marks as percussive notes, not as grammatical tools, and you’ll be well on your way to generating natural AI voice recordings.

Let me give a brief example. For this first clip, I entered the following text into WellSaid Studio, using punctuation in a grammatically minded way:

Text to speech is a scalable alternative to traditional voice acting.

Created using WellSaid Studio

Now, listen to the same sentence with percussive punctuation marks added to create an appealing rhythm. Notice how the sentence, while grammatically incorrect, has a natural-sounding cadence to it:

Text to speech, is a scalable alternative, to traditional voice acting.

Created using WellSaid Studio
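If you want to try several pause placements quickly, you can script the punctuation instead of retyping the sentence. The sketch below is only an illustration: `add_pauses` is a hypothetical helper of mine, and it simply appends a mark after the words you name. Which words deserve a pause is still a judgment call you make by ear.

```python
def add_pauses(sentence: str, pause_after: set[str], mark: str = ",") -> str:
    """Append a pause mark after each listed word (matched case-insensitively)."""
    marked = []
    for word in sentence.split():
        # Skip words that already end in punctuation, such as "acting."
        if word.lower() in pause_after and word[-1].isalnum():
            word += mark
        marked.append(word)
    return " ".join(marked)


plain = "Text to speech is a scalable alternative to traditional voice acting."
print(add_pauses(plain, {"speech", "alternative"}))
```

Passing `mark="..."` or `mark=" -"` lets you compare how other punctuation marks change the rhythm.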

Use inventive spelling

Modern TTS services are built on neural networks. As a result, they predict pronunciation from context, and this means they sometimes mispronounce words. Often this happens with words that are spelled the same but pronounced differently. Think about the homographs “read,” as in “I can read!” and “read,” as in “I haven’t read this book yet.” Abbreviations like “CEO” or “USC” are also frequently mispronounced: a neural AI voice may read them as funny short words rather than pronouncing the letters.

To get the right results, spell phonetically. You’ll sometimes need to be explicit with the text to speech editor about how you want a word pronounced, just as you would with a voice actor. “Read” might need to be entered as “reed,” and “CEO” as “see eeh oh.”
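Once you’ve found respellings that work, it helps to keep them in one place and apply them consistently. Here’s a minimal sketch of that idea. The `RESPELLINGS` table and the `respell` function are my own illustrative names, and the two entries come from the examples above. Matching is case-sensitive and whole-word only, so “already” is left alone. Because a word like “read” has two pronunciations, apply a table like this sentence by sentence rather than to a whole script.

```python
import re

# Hypothetical respelling table; build your own by ear.
RESPELLINGS = {"CEO": "see eeh oh", "read": "reed"}


def respell(text: str, respellings: dict[str, str]) -> str:
    """Replace whole words in text with their phonetic respellings."""
    if not respellings:
        return text
    # Longest keys first so a longer entry wins over a shorter one it contains.
    keys = sorted(respellings, key=len, reverse=True)
    pattern = r"\b(" + "|".join(re.escape(key) for key in keys) + r")\b"
    return re.sub(pattern, lambda match: respellings[match.group(0)], text)


print(respell("Our CEO can read.", RESPELLINGS))
```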

Play with intonation

Punctuation marks not only add pausing, they also change intonation and play an important role when building a voiceover track for an eLearning course. If you want a specific word emphasized, try putting it in quotation marks. If you want a different intonation than the one you’re hearing, try seLECTive caps or ALL caps. You can also insert commas and periods before or after the word you want emphasized, as long as the resulting pause is acceptable.

Using the same example sentence I showed you above, I added some intonation marks to achieve a more lively rendering. “Scalable” is unusual enough that the editor needs a little help, so I entered “scaelable” to prompt the right phonemes.

Here’s the sentence and the audio result:

Text to speech, is a scaelable alternative, to “traditional” VOIce acting.

Created using WellSaid Studio
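Finding the right emphasis usually takes a few tries, so it can be handy to generate the candidate spellings up front and audition each one. The sketch below is an illustration under my own assumptions: `emphasis_variants` is a hypothetical helper, and capitalizing the first three letters is just one way to do selective caps. How any given TTS voice responds to these marks is something you’ll need to check by listening.

```python
import re


def emphasis_variants(sentence: str, word: str, caps_letters: int = 3) -> dict[str, str]:
    """Return copies of sentence with the first whole-word match of word marked up."""
    styles = {
        "quotes": "\u201c" + word + "\u201d",  # curly quotation marks
        "all_caps": word.upper(),
        "selective_caps": word[:caps_letters].upper() + word[caps_letters:],
    }
    pattern = r"\b" + re.escape(word) + r"\b"
    variants = {}
    for name, marked in styles.items():
        # A function replacement keeps backslashes in the text from being
        # treated as escapes; count=1 changes only the first occurrence.
        variants[name] = re.sub(pattern, lambda match, m=marked: m, sentence, count=1)
    return variants


for name, text in emphasis_variants("a scalable alternative to traditional voice acting.", "voice").items():
    print(name, "->", text)
```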

Edit post-production

You don’t need to be an expert to put the final polish on your WAV files with a sound editor. Many basic, inexpensive audio editing apps let you add pauses in post-production. Add some silence at the start of your clips to mimic a voice actor’s inhale. Add a small amount of silence between your clips as well, and you’ve got quality, human-sounding audio production on your hands.
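If you’d rather script this step than open a sound editor, Python’s built-in `wave` module can pad an uncompressed PCM WAV file with silence. This is a minimal sketch under a few assumptions: the function name `add_silence`, the file names in the usage note, and the default pause lengths are all mine, and the right amount of silence is something to tune by ear.

```python
import wave


def add_silence(src_path: str, dst_path: str,
                lead_seconds: float = 0.3, tail_seconds: float = 0.0) -> None:
    """Copy a PCM WAV file, adding silence before and after the audio."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())

    bytes_per_frame = params.sampwidth * params.nchannels
    # 8-bit WAV samples are unsigned, so silence is 0x80; wider samples
    # are signed, so silence is 0x00.
    silent_byte = b"\x80" if params.sampwidth == 1 else b"\x00"

    def silence(seconds: float) -> bytes:
        return silent_byte * (int(seconds * params.framerate) * bytes_per_frame)

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        # The wave module corrects the frame count in the header on close.
        dst.writeframes(silence(lead_seconds) + frames + silence(tail_seconds))
```

For example, `add_silence("sentence_01.wav", "sentence_01_padded.wav", lead_seconds=0.3, tail_seconds=0.2)` would write a new clip with a short “inhale” at the start and a small gap at the end, so the padded clips can be placed back to back.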

Credits

Photo by palesa on Unsplash
Music by purple-planet


