New Voices For The Voiceless: Synthetic Speech Gets An Upgrade

1:23pm March 13, 2013

by Alix Spiegel

Samantha Grimaldo was born with a rare disorder, Perisylvian syndrome, and has never been able to speak. Ellen Webber for NPR

Samantha uses a text-to-speech iPhone app to help her communicate. Here she shows the app interface.

Joseph, Alexandra and Samantha Grimaldo sit around the kitchen counter in the family's home in Marlborough, Mass., playing with Samantha's voice app, though they mostly use sign language at home.

When Samantha was younger, she carried this device with her to help her communicate.

Samantha speaks with her mom, Ruane, about going to the movies with a friend.

Samantha watches her brother Nicholas play piano. Their mother says that a new customized voice, created by researcher Rupal Patel from a voice sample Samantha gave when she was younger, sounds happy and has a sweetly familiar quality. "My son — my son Nicholas — I could hear some of his voice in it," she says.

Ever since she was a small child, Samantha Grimaldo has had to carry her voice with her.

Grimaldo was born with a rare disorder, Perisylvian syndrome, which means that though she's physically capable in many ways, she's never been able to speak. Instead, she's used a device to speak. She types in what she wants to say, and the device says those words out loud. Her mother, Ruane Grimaldo, says that when Samantha was very young, the voice she used came in a heavy gray box.

The text-to-speech iPhone app that Samantha Grimaldo uses has three voice options for her to choose from.

Ellen Webber for NPR

"She used to have to carry this device around that was at least 4 or 5 pounds," Ruane says, "and she was only, like, 70 pounds herself. The poor thing had to carry this back and forth to school every day on the bus." It was miserable having to lug her voice around that way — a clunky box sitting on the seat next to her.

Today, fortunately, Samantha's voice takes up much less space. She types into a special program on an iPhone or iPad, and a synthesized voice in the program says the words aloud. The voice, one of several types on the market, is called "Heather." That's a nice enough name — easygoing and accessible — but Grimaldo doesn't like to use the voice if she can help it.

Her mother has noticed that when the family goes out to restaurants, Samantha prefers to write out her menu choices. Apparently, as she explains to her mother, this is because Samantha has some reservations about the voice itself — the cold metal sound of it.

"Because [it's] weird," Samantha says of the mechanical voice — speaking in the voice itself.

It's not just that the voice is artificial and disjointed. It sounds, Samantha says, "older." Samantha is only 17, and the sound of the voice — deep, methodical, mature — doesn't exactly align with her sense of herself. Like any teenager, she feels self-conscious about it.

"I don't want [people to] hear," she says.

The Voice For The Voiceless

If you don't have a voice, who speaks for you? Today there are more than 60 different options for people who need to use synthetic voices to communicate, but for the majority of people who use them, there is a single answer to that question: "Perfect Paul."

Rupal Patel, a speech scientist at Northeastern University, estimates that between 50 and 60 percent of the people who use synthetic voices use the same one — the Perfect Paul voice. If you have ever heard Stephen Hawking speak, or listened to the weather radio, you have heard the voice of Perfect Paul.

Perfect Paul is used so widely because some studies have shown that his voice is easiest to understand in a variety of situations, including classrooms and public outdoor spaces. Still, some in the community of people who rely on synthetic voices have found the Perfect Paul version frustrating — not because it's a bad voice, but because it's limiting.

In fact, it was through confronting the clear limits of Perfect Paul that speech scientist Patel came to the conclusion that people like Samantha Grimaldo needed new options.

It happened around 10 years ago when Patel was at a conference for the makers and users of synthetic voices.

Rupal Patel is a speech scientist at Northeastern University.

Courtesy of Mary Knox Merrill/Northeastern University

"I was watching a demonstration of a new technology, and someone came up and said something in their synthesized voice, and then someone else came up," Patel says.

Both spoke in the same voice — Perfect Paul's. Then a third person arrived, and another.

"It was the same voice saying different things," says Patel. "And sometimes they were saying the same phrase, but off by a few seconds ... so it felt like it was this echo going on. It was just a strange thing."

Standing there, in the middle of all these radically different people with the exact same voice, Patel had an idea: Isn't there something we can do to make these voices more individuated?

So, around seven years ago, Patel started working to change synthetic voices. When a person speaks, two things are happening. First, the source of speech comes from the voice box, which vibrates to produce sound. Then, the mouth shapes those sounds into speech.

In many people who have speech disorders, it's mainly the second part of the system that doesn't work. "In people with speech disorders, the source is pretty preserved," Patel says. "I thought, 'That's where the melody is — that's where someone's identity is, in terms of their vocal identity.' "

So Patel decided to capture the melody of a voice. She primarily works with kids, and so she asked kids with speech disorders who can still make some sounds to come into her lab and do something really simple. "We just need them to say a sustained sound, like ahhhhh," she says.

Patel can take that sound, run it through a computer and find out all kinds of things about how that person would sound if that person could speak words. "We can determine their pitch, the loudness, the breathiness of their voice, the changes in clarity," she says.
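As a rough illustration of how a computer can pull pitch and loudness out of a sustained sound, here is a minimal sketch. It is not Patel's method, which the story does not describe; it uses autocorrelation, a common textbook approach to pitch estimation, and root-mean-square amplitude as a simple stand-in for loudness. The function names, the 8,000-samples-per-second rate and the synthetic 250 Hz test tone are all assumptions made for the example, and measures such as breathiness are left out.

```python
import math


def synth_vowel(freq_hz, sample_rate=8000, duration_s=0.1, amplitude=0.5):
    """Hypothetical stand-in for a recorded 'ahhhhh': a fundamental plus two weaker harmonics."""
    n = int(sample_rate * duration_s)
    return [
        amplitude * (
            math.sin(2 * math.pi * freq_hz * t / sample_rate)
            + 0.5 * math.sin(2 * math.pi * 2 * freq_hz * t / sample_rate)
            + 0.25 * math.sin(2 * math.pi * 3 * freq_hz * t / sample_rate)
        )
        for t in range(n)
    ]


def estimate_pitch_hz(samples, sample_rate=8000, min_hz=75.0, max_hz=500.0):
    """Estimate pitch as the sample rate divided by the lag with the strongest autocorrelation."""
    min_lag = int(sample_rate / max_hz)  # shortest period considered
    max_lag = int(sample_rate / min_hz)  # longest period considered
    best_lag = min_lag
    best_score = float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Multiply the signal by a copy of itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def rms_loudness(samples):
    """Root-mean-square amplitude, a simple proxy for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


if __name__ == "__main__":
    recording = synth_vowel(250.0)  # assumed test tone, not real data
    print("estimated pitch (Hz):", estimate_pitch_hz(recording))
    print("RMS loudness:", rms_loudness(recording))
```

A real recording would be noisier than this test tone, so a practical analysis would likely add windowing, normalization and averaging over many short frames.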

She then takes a recording of the voice of what she calls a "healthy donor" — for example, the voice of a child who is roughly the same age as the child she's trying to help — and gets them to say a large number of words. So she ends up with samples of the sounds they produce when they talk. She then combines that voice with the pitch, breathiness and other characteristics of the child with the voice disorder.

Patel played me examples of two different voices she's created. If you listen, you can clearly hear different pitch and clarity in the different voices.

These voices Patel can make are unique for each individual. Which brings us back to Samantha Grimaldo.

'You Need A Voice'

When Patel was getting started, Samantha was one of the first kids with a voice disorder who came to her lab to give a voice sample. At the time, Patel wasn't at the stage where she was actually constructing voices. But she's since figured it out, and recently, she created a new voice using Samantha's ahhhhh sample.

Last week, she gave the personalized voice to Ruane and Samantha so they could hear it. The voice was constructed from a sample taken when Samantha was much younger. For a current version of Samantha's voice, you'd need to take a new sample. Still, it was the first time that Samantha and her mother had heard anything close to Samantha's voice.

Ruane had listened earlier in the day, when Samantha was still at school, and was clearly deeply moved by the experience. It made her realize in a fresh way, she says, how difficult it had been for her to never hear her daughter's voice.

"When I heard it, I thought, 'Yeah! This could be it!' " Ruane says through tears. To her ear, the voice had a sweetly familiar quality. "My son — my son Nicholas — I could hear some of his voice in it," she says.

And so, when Samantha got home from school that afternoon, they sat down together to listen. Samantha's young voice, it turns out, is clear and light.

Ruane told me that when Samantha heard the voice, her eyes lit up and a smile broke out on her face. Both thought that the voice sounded happy.

Personalized voices like these aren't yet available to everyone. Patel has figured out how to do it, but not how to make it work on all of the different electronic devices that people use to play a synthetic voice. But Ruane Grimaldo hopes that voices like these will be available one day, very soon.

"You need a voice," she says. "You need a voice."

Transcript

STEVE INSKEEP, HOST:

On a Monday, it's MORNING EDITION from NPR News. I'm Steve Inskeep.

RENEE MONTAGNE, HOST:

And I'm Renee Montagne. Today in "Your Health," we hear from a teenager who cannot speak for herself, and from the researcher trying to give that teenager her voice back. NPR's Alix Spiegel has been learning about the synthetic voices that are often used by people who are physically unable to talk, and how they're evolving.

ALIX SPIEGEL, BYLINE: Let's begin with an introduction.

SAMANTHA GRIMALDO: My name is Samantha Grimaldo.

SPIEGEL: Samantha Grimaldo, the 17-year-old girl you just heard introduce herself, was born with a rare disorder - Perisylvian syndrome. This means that while Samantha is physically capable in many ways, she's never been able to speak. And so ever since she was a small child, Grimaldo has had to carry her voice with her. Her mother, Ruane Grimaldo, says that when she was young, the voice she used came in a heavy, gray box.

RUANE GRIMALDO: She used to have to carry this device around that was at least 4 or 5 pounds, and she was only like, 70 pounds herself. And the poor thing had to carry this back and forth to school every day on the school bus.

SPIEGEL: It was miserable having to lug her voice around that way, this clunky box sitting on the seat next to her. Fortunately today, Samantha's voice takes up much less space. She types what she wants to say into a special program on an iPad, and a voice in the program says her words out loud. Still, Grimaldo doesn't like to use this voice, if she can help it. Her mother has noticed that when the family goes out to restaurants, Samantha much prefers to write her orders out.

RUANE GRIMALDO: Why don't you like to use it all the time?

SAMANTHA GRIMALDO: Because that weird.

RUANE GRIMALDO: Because it's weird, she said.

SPIEGEL: This voice - the one that you just heard describe itself as weird - is one of just a small number of voices available to people who cannot speak for themselves. Like the other voices, this voice has a name: Heather. It's a nice enough name; easygoing, accessible. But Samantha doesn't seem impressed.

SAMANTHA GRIMALDO: I don't like Heather voice.

RUANE GRIMALDO: Why don't you like it?

SAMANTHA GRIMALDO: Older.

RUANE GRIMALDO: Oh. She said it sounds older.

SAMANTHA GRIMALDO: Yes.

RUANE GRIMALDO: Yeah.

SPIEGEL: Samantha Grimaldo, as I said, is 17 years old. And so the sound of Heather's voice - deep, methodical, mature - doesn't exactly align with her image of herself. And like any teenager, she feels self-conscious.

SAMANTHA GRIMALDO: I don't want the people hear.

SPIEGEL: If you don't have a voice, who speaks for you? Today, there are around 60 different options - voices like Heather. But really, for the majority of people who use a synthetic voice, there's a single answer to that question: Perfect Paul. Perfect Paul is the voice that speaks for you. Here's Perfect Paul describing the weather.

PERFECT PAUL: The eastern United States-Canada area, including the Eastern United States...

RUPAL PATEL: Sort of this robotic, adult male voice; it's what you think of when you think of a computer talking.

SPIEGEL: This is a speech scientist named Rupal Patel, who's on the faculty of Northeastern University. Patel estimates that between 50 and 60 percent of the people who use synthetic voices use Perfect Paul because the voice of Perfect Paul is seen as easier to understand than other synthesized voices. So for years - whether you were a man or woman, 4 or 40 - you used Perfect Paul; which actually, is how Rupal Patel first got the idea that it was time for people like 17-year-old Samantha to have a different kind of voice. You see, she was at a conference for the makers and users of synthetic voices.

PATEL: I was watching a demonstration of a new technology. And someone came up and said something in their synthesized voice, and then someone else came up.

SPIEGEL: Both were using Perfect Paul. Then a third person arrived - Perfect Paul; then another.

PATEL: It was the same voice, saying different things. And sometimes they were saying the same phrase but off by a few seconds. And so it felt like it was this like, echo that was going on. And it was just a strange feeling.

SPIEGEL: Standing there in the middle of these radically different people with the exact same voice, Patel had an idea.

PATEL: Isn't there something we can do to make these voices more individuated?

SPIEGEL: So around seven years ago, Rupal Patel started working to change synthetic voices. Now, to produce speech, there are two things involved. The source of speech comes from the voice box, which vibrates to produce sound; then, the mouth shapes those vibrations into speech. And in many people who have disorders, it's mainly the second part of the system that doesn't work.

PATEL: In people with speech disorder, the source is pretty preserved. And I thought, well, that's where the melody is; that's where someone's identity is, in terms of their vocal identity.

SPIEGEL: So Patel decided to capture the melody of a voice by asking kids with speech disorders - and she mostly works with kids - to come into her lab and do something really simple.

PATEL: We need them to say a sustained sound; like, they say ahhhh.

SPIEGEL: Patel can then take that sound, run it through a computer, and find out all kinds of things about how that person would sound if that person could talk.

PATEL: Can determine their pitch, the loudness, the breathiness of their voice, the changes in clarity.

SPIEGEL: She then takes the voice of what she calls a healthy donor - for example, the voice of a child roughly the same age as the disordered child she's trying to help - and gets them to say a large number of words so that she can sample the sounds they produce. She then combines that voice with the pitch, breathiness, etc., of the disordered child. Patel plays me examples of different voices she's created. And if you listen, you can clearly hear different pitch and clarity.

CREATED VOICE #1: Rice is often served in round bowls.

CREATED VOICE #2: Rice is often served in round bowls.

SPIEGEL: Voices individuated. Unique. Which brings us back to Samantha Grimaldo.

SAMANTHA GRIMALDO: I don't want the people hear.

SPIEGEL: When Patel was getting started, Samantha was one of the first voice-disordered kids who came to her lab to give a voice sample.

SAMANTHA GRIMALDO: Ahhhh.

SPIEGEL: Now, at the time, Patel wasn't at the stage where she was actually constructing voices. But she's since - obviously - figured it out and recently, she created a new voice with Samantha's sample. Last week, she gave the voice to Ruane and Samantha so they could hear it. Now, this voice was constructed from a sample taken when Samantha was much younger. For a current version of Samantha's voice, you would need to take another sample. Still, it was the first time that Samantha and her mother had heard anything close to Samantha's voice.

Ruane had listened early in the day, when Samantha was still at school. And the experience clearly moved her; made her realize, in a fresh way, how difficult it was for her that she had never heard her daughter's voice.

RUANE GRIMALDO: When I heard it, I thought yeah, that could be it. 'Cause I could hear - like my son, Nicholas, I could hear some of his voice in it.

SPIEGEL: And so in the afternoon, when Samantha got home from school, they sat down together.

RUANE GRIMALDO: Do you want to hear the voice?

SAMANTHA GRIMALDO: Yes.

SPIEGEL: The pitch of Samantha's voice, it turns out, is even higher than the other child voices you heard earlier and much, much higher than Heather's voice. It's clear and light.

SAMANTHA GRIMALDO: Rice is often served in round bowls.

SPIEGEL: Ruane told me over the phone that when Samantha heard the voice, her eyes lit up and a smile broke out on her face.

RUANE GRIMALDO: What do you think about that?

SAMANTHA GRIMALDO: Thoughts like me, my voice.

RUANE GRIMALDO: I think it sounds really happy? Don't you think?

SAMANTHA GRIMALDO: Like happy. Yes.

SPIEGEL: Now, individuated voices like these are not yet available to everyone. Patel has figured out how to do it - but not how to distribute it on all of the different devices people use. But Samantha's mother, Ruane, hopes that it will be available one day very soon.

RUANE GRIMALDO: You need a voice. You need a voice.

SPIEGEL: Alix Spiegel, NPR News, Washington.

(SOUNDBITE OF MUSIC)

MONTAGNE: And if you'd like to see photos of Samantha Grimaldo, go to npr.org. Transcript provided by NPR, Copyright NPR.
