New Voices For The Voiceless: Synthetic Speech Gets An Upgrade
STEVE INSKEEP, HOST:
On a Monday, it's MORNING EDITION from NPR News. I'm Steve Inskeep.
RENEE MONTAGNE, HOST:
And I'm Renee Montagne. Today in "Your Health," we hear from a teenager who cannot speak for herself, and from the researcher trying to give that teenager her voice back. NPR's Alix Spiegel has been learning about the synthetic voices that are often used by people who are physically unable to talk, and how they're evolving.
ALIX SPIEGEL, BYLINE: Let's begin with an introduction.
SAMANTHA GRIMALDO: My name is Samantha Grimaldo.
SPIEGEL: Samantha Grimaldo, the 17-year-old girl you just heard introduce herself, was born with a rare disorder - Perisylvian syndrome. This means that while Samantha is physically capable in many ways, she's never been able to speak. And so ever since she was a small child, Grimaldo has had to carry her voice with her. Her mother, Ruane Grimaldo, says that when she was young, the voice she used came in a heavy, gray box.
RUANE GRIMALDO: She used to have to carry this device around that was at least 4 or 5 pounds, and she was only like, 70 pounds herself. And the poor thing had to carry this back and forth to school every day on the school bus.
SPIEGEL: It was miserable having to lug her voice around that way, this clunky box sitting on the seat next to her. Fortunately today, Samantha's voice takes up much less space. She types what she wants to say into a special program on an iPad, and a voice in the program says her words out loud. Still, Grimaldo doesn't like to use this voice, if she can help it. Her mother has noticed that when the family goes out to restaurants, Samantha much prefers to write her orders out.
RUANE GRIMALDO: Why don't you like to use it all the time?
SAMANTHA GRIMALDO: Because that weird.
RUANE GRIMALDO: Because it's weird, she said.
SPIEGEL: This voice - the one that you just heard describe itself as weird - is one of just a small number of voices available to people who cannot speak for themselves. Like the other voices, this voice has a name: Heather. It's a nice enough name; easygoing, accessible. But Samantha doesn't seem impressed.
SAMANTHA GRIMALDO: I don't like Heather voice.
RUANE GRIMALDO: Why don't you like it?
SAMANTHA GRIMALDO: Older.
RUANE GRIMALDO: Oh. She said it sounds older.
SAMANTHA GRIMALDO: Yes.
RUANE GRIMALDO: Yeah.
SPIEGEL: Samantha Grimaldo, as I said, is 17 years old. And so the sound of Heather's voice - deep, methodical, mature - doesn't exactly align with her image of herself. And like any teenager, she feels self-conscious.
SAMANTHA GRIMALDO: I don't want the people hear.
SPIEGEL: If you don't have a voice, who speaks for you? Today, there are around 60 different options - voices like Heather. But really, for the majority of people who use a synthetic voice, there's a single answer to that question: Perfect Paul. Perfect Paul is the voice that speaks for you. Here's Perfect Paul describing the weather.
PERFECT PAUL: The eastern United States-Canada area, including the Eastern United States...
RUPAL PATEL: Sort of this robotic, adult male voice; it's what you think of when you think of a computer talking.
SPIEGEL: This is a speech scientist named Rupal Patel, who's on the faculty of Northeastern University. Patel estimates that between 50 and 60 percent of the people who use synthetic voices use Perfect Paul because the voice of Perfect Paul is seen as easier to understand than other synthesized voices. So for years - whether you were a man or woman, 4 or 40 - you used Perfect Paul; which actually, is how Rupal Patel first got the idea that it was time for people like 17-year-old Samantha to have a different kind of voice. You see, she was at a conference for the makers and users of synthetic voices.
PATEL: I was watching a demonstration of a new technology. And someone came up and said something in their synthesized voice, and then someone else came up.
SPIEGEL: Both were using Perfect Paul. Then a third person arrived - Perfect Paul; then another.
PATEL: It was the same voice, saying different things. And sometimes they were saying the same phrase but off by a few seconds. And so it felt like it was this like, echo that was going on. And it was just a strange feeling.
SPIEGEL: Standing there in the middle of these radically different people with the exact same voice, Patel had an idea.
PATEL: Isn't there something we can do to make these voices more individuated?
SPIEGEL: So around seven years ago, Rupal Patel started working to change synthetic voices. Now, to produce speech, there are two things involved. The source of speech comes from the voice box, which vibrates to produce sound; then, the mouth shapes those vibrations into speech. And in many people who have disorders, it's mainly the second part of the system that doesn't work.
PATEL: In people with speech disorder, the source is pretty preserved. And I thought, well, that's where the melody is; that's where someone's identity is, in terms of their vocal identity.
SPIEGEL: So Patel decided to capture the melody of a voice by asking kids with speech disorders - and she mostly works with kids - to come into her lab and do something really simple.
PATEL: We need them to say a sustained sound; like, they say ahhhh.
SPIEGEL: Patel can then take that sound, run it through a computer, and find out all kinds of things about how that person would sound if that person could talk.
PATEL: Can determine their pitch, the loudness, the breathiness of their voice, the changes in clarity.
SPIEGEL: She then takes the voice of what she calls a healthy donor - for example, the voice of a child roughly the same age as the disordered child she's trying to help - and gets them to say a large number of words so that she can sample the sounds they produce. She then combines that voice with the pitch, breathiness, etc., of the disordered child. Patel plays me examples of different voices she's created. And if you listen, you can clearly hear different pitch and clarity.
CREATED VOICE #1: Rice is often served in round bowls.
CREATED VOICE #2: Rice is often served in round bowls.
SPIEGEL: Voices individuated. Unique. Which brings us back to Samantha Grimaldo.
SAMANTHA GRIMALDO: I don't want the people hear.
SPIEGEL: When Patel was getting started, Samantha was one of the first voice-disordered kids who came to her lab to give a voice sample.
SAMANTHA GRIMALDO: Ahhhh.
SPIEGEL: Now, at the time, Patel wasn't at the stage where she was actually constructing voices. But she's since - obviously - figured it out and recently, she created a new voice with Samantha's sample. Last week, she gave the voice to Ruane and Samantha so they could hear it. Now, this voice was constructed from a sample taken when Samantha was much younger. For a current version of Samantha's voice, you would need to take another sample. Still, it was the first time that Samantha and her mother had heard anything close to Samantha's voice.
Ruane had listened early in the day, when Samantha was still at school. And the experience clearly moved her; made her realize, in a fresh way, how difficult it was for her that she had never heard her daughter's voice.
RUANE GRIMALDO: When I heard it, I thought yeah, that could be it. 'Cause I could hear - like my son, Nicholas, I could hear some of his voice in it.
SPIEGEL: And so in the afternoon, when Samantha got home from school, they sat down together.
RUANE GRIMALDO: Do you want to hear the voice?
SAMANTHA GRIMALDO: Yes.
SPIEGEL: The pitch of Samantha's voice, it turns out, is even higher than the other child voices you heard earlier and much, much higher than Heather's voice. It's clear and light.
SAMANTHA GRIMALDO: Rice is often served in round bowls.
SPIEGEL: Ruane told me over the phone that when Samantha heard the voice, her eyes lit up and a smile broke out on her face.
RUANE GRIMALDO: What do you think about that?
SAMANTHA GRIMALDO: Thoughts like me, my voice.
RUANE GRIMALDO: I think it sounds really happy? Don't you think?
SAMANTHA GRIMALDO: Like happy. Yes.
SPIEGEL: Now, individuated voices like these are not yet available to everyone. Patel has figured out how to do it - but not how to distribute it on all of the different devices people use. But Samantha's mother, Ruane, hopes that it will be available one day very soon.
PATEL: You need a voice. You need a voice.
SPIEGEL: Alix Spiegel, NPR News, Washington.
(SOUNDBITE OF MUSIC)
MONTAGNE: And if you'd like to see photos of Samantha Grimaldo, go to npr.org. Transcript provided by NPR, Copyright NPR.