VideoAge International, October 2023

Can the science and technology behind space exploration improve the PA system in the New York City subway system? Not just yet, unfortunately, but what might actually be able to fix the garbled, scratchy, echoing sound is a technology called AI-driven synthetic neural voice services, developed by, among other companies, Speech Morphing, an 11-year-old firm based in San José, California. Over the past six years, Speech Morphing has developed, among other services, a technology that recreates voice audio that has been damaged or lost.
VideoAge had a lot of questions for the firm: Can AI systems enable automatic speech translation for dubbing, going beyond mere accuracy to achieve real expressiveness? Can damaged or lost voice tracks be seamlessly patched or recreated from scratch? Can AI voices spare human talent the drudgery of looping during post-processing? Can they help visually challenged audiences follow the on-screen action? And can any of this tantalizing potential be realized while protecting talent rights?
The answers from Speech Morphing were complex and written in Silicon Valley lingo. Making sense of them took four drafts, a series of e-mail clarifications, the involvement of Nadya Patrick, Speech Morphing’s CEO, and Mark Seligman, the company’s chief linguist, and the patience of Ettore Botta, president of Burbank, California-based SpaceWow and a media consultant for Speech Morphing, who was tasked with translating hardware lingo into software-comprehensible language.
The first challenge concerned the term “neural”, as in “synthetic neural voice.” Seligman explained it this way:
“Most people in technical circles have heard of neural networks, even if they don’t know what they are or how they work.” He then explained that the term “refers to networks that are utilized for machine-learning, the technology that allows computers to learn from examples.”
Next came the question of what exactly “speech synthesis” is. Seligman said that it “is basically artificial voices, also known as text-to-speech (TTS). The process consists of giving a text segment to the computer and receiving an artificial speech segment in return.”
With these notions in hand, we could explore what Speech Morphing Inc. (SMI) actually does, and for that we turned to Patrick. “Using an actor’s voice”, she began, “SMI can dub a movie, and in the actor’s voice, enable them to speak another language. This includes removing the delays and maintaining sync with the actor’s lips”, she said, adding: “When you play a movie or a television show performed in the language originally used, you have the options of playing it in your native language and/or adding subtitles. However, as many, I suspect, have experienced, their lips are moving, and a delay occurs, or their lips don’t quite match the words we are hearing from our television set. This is no longer true. Artificial intelligence has changed the way we now entertain ourselves.”
We were told that SMI has developed software for text-to-speech applications, dubbing, and voice synthesis, with the ability to model both emotion and voice. The company’s work ranges from entertainment and educational television to speech restoration and reconstruction, audio descriptions for the blind and partially blind, and voice cloning for people with speech impairments or disabilities.
An inspiring use case for speech synthesis is audio description for sight-impaired audiences. Since these viewers can’t see the onscreen action, they can benefit from audio descriptions of it.
These spoken descriptions, mandated by law in some countries, can be automatically generated. SMI can also reconstruct damaged audio: iconic speeches, movies, and interviews can be restored to their original sound, without the scratchy noise in the background.
Seligman then introduced the human element: “An especially pressing concern about simulation tech relates to labor and compensation issues”, he said. “Till now, all of the voice needs have been met by human talents. If speech synthesis is used instead, much cost saving, flexibility, and convenience can be gained. However, all the related talent must be fully compensated. For example, when actors’ voices are used for dubbing in any language, fair compensation should be in the contracts.”
Seligman also explained, “Overall, as progress in artificial intelligence charges ahead, traditional practices will need to adjust, and artificial voice development will be no exception.” Meanwhile, he added, “talent rights can be protected in two ways: contractual and technical. For the former, talents should insist on including compensation for use of their voices for artificial voice production. For the latter, technology exists to determine whether a voice segment is artificial or not, and, if it is artificial, to determine facts about its production, including information about the original voice.”
As for the costs and timeframes of SMI services, Patrick reported that a 60-minute narration involving a single voice can be completed in a couple of days. For a documentary with one voiceover and two interview voices (one female and one male), the total cost would be U.S.$1,800. Feature films with sync voice dubbing start at $7,000, and voice modeling starts at $5,000.
Finally, VideoAge asked if SMI would be willing to dub a program on spec (to be used as a sales pitch) with the understanding that, if the show is sold, SMI would dub the whole series for a pre-set fee. “Yes, we would”, answered Patrick. “The requirement to do on spec should not be a showstopper.” She then described the model used for an Australia-based video distributor, in which a video’s English narration was replaced with a synchronized Spanish narration in the original voice. “Spanish phrases were inserted into the Spanish voice track so that they matched the time spans of the original English phrases, with adjustments for differences of length”, she concluded.
Money Talks
Decoding How AI Helps Dubbing And Protects Talent
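For readers who want a concrete picture of the phrase-by-phrase fitting Patrick describes for the Spanish narration, here is a minimal sketch in Python. It is not SMI's method, which the article does not detail. It only illustrates one plausible approach: each dubbed phrase is placed at the start of the original phrase's time span and, if it runs long, sped up by a capped factor. The names `Placement`, `fit_phrases`, and `max_rate`, as well as the 1.25 cap, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    start: float     # seconds: where the dubbed phrase begins in the track
    duration: float  # seconds: phrase length after any speed adjustment
    rate: float      # playback-rate factor applied (1.0 means unchanged)


def fit_phrases(source_spans, dubbed_durations, max_rate=1.25):
    """Fit dubbed phrases into the time spans of the original phrases.

    source_spans: list of (start, end) times of the original phrases, in seconds.
    dubbed_durations: natural length of each dubbed phrase, in seconds.
    max_rate: hypothetical cap on speed-up, so speech is not rushed too much.
    """
    placements = []
    for (start, end), duration in zip(source_spans, dubbed_durations):
        slot = end - start
        if slot <= 0:
            raise ValueError("each source span must have a positive length")
        # Speed up only when the dubbed phrase overruns its slot,
        # and never by more than max_rate.
        rate = min(max(duration / slot, 1.0), max_rate)
        placements.append(Placement(start, duration / rate, rate))
    return placements


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the article.
    english_spans = [(0.0, 2.0), (3.0, 5.0)]
    spanish_durations = [1.5, 2.4]
    for placement in fit_phrases(english_spans, spanish_durations):
        print(placement)
```

In this sketch a phrase that is shorter than its slot is left at natural speed, and a phrase that still overruns at the capped rate is simply allowed to spill over; a production system would need a policy for both cases.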
