Behind the Scenes Part IV
Audio link: https://www.youtube.com/watch?v=4UN-K8OIMCs
Ever since we began the video project on Interpreters and Music, we’ve felt a lot of energy from the interpreters who have been kind enough to join us in our initiative to explore the relationship between AI and interpreters, including such questions as: What will the future look like for these two? Will interpreters become a conduit for AI?
This all ultimately led us to compare AI and interpreters. We are on a journey to highlight professional interpreters’ strengths over AI, and human voices are one of the areas we’d like to compare.
We have invited quite a few interpreters to participate in the “Interpreters and Voices” project. After hearing only the premise and description of the project, many interpreters described it as “interesting”, “great fun”, “wonderful”, “something new and exciting”, “fascinating”, and “riveting”. We thank all of the interpreters we’ve reached out to for genuinely sharing their thoughts, and we have included some of their ideas in this blog post.
Many of you may agree with us that AI recordings are usually boring, if not unsettling. The natural inflections of speech are missing, and the robotic, monotonous tone drives us crazy! Even so, the threat of AI recordings looms large as a dark cloud on the horizon. We don’t enjoy AI recordings at all and really wouldn’t want to see the world develop in a way where AI voices are used in place of human ones. It would undoubtedly be a tremendous loss to humanity, and something we truly hope won’t occur. It would take away the joy and emotion that can only be experienced through the vehicle of human voices. By the way, if you know of any interesting AI recording samples, please do feel free to share them with us for our learning purposes.
AI is a wonderful tool of convenience and efficiency in modern times, but there are still certain things that AI simply cannot produce. Even though AI may be able to replicate the breathing, pauses, and quivering of human voices, these elements always sound timed and programmed. If spoken or read by a human, the same words would be conveyed in different styles, with emotion, and at different speeds. Take the diversity and richness audio file as an example: it was read by three interpreters in three main styles, professional, expressive, and artistic, and the reading speed shifts between normal, fast, and slow. Yet the combination, mix, and interaction of these elements creates a charming, attractive, even sensational atmosphere that entices us to listen. Honestly, we’ve listened to it more than a handful of times! The more we listen, the more we feel the pleasure and ambience, as if we were transported to another world. We are totally immersed.
The distinctness of each speaker is something else that AI cannot replicate. In the recordings, we hear a richness of styles, all of which communicate the individuality of the reader and storyteller. These styles range from personal, professional, emotional, poetic, artistic, expressive, and explanatory to upbeat, engaging without overemphasizing, measured, modulated, and varied in pacing. We are also delighted to hear little off-script moments like singing instead of purely reading. Yeah, a little surprise is the spice that keeps the audience interested and entertained, and it’s that exact element that keeps blowing us away!
For this project on human voices, the only instruction we gave the interpreters was to give the recording their best shot and then leave the rest to us. The recordings by these interpreters sound different to the ear not only because of accent, gender, and tone or pitch, but also because of the background each interpreter comes from. The interpreters are long-time devotees of language and communication, and they are also people with interests in poetry, storytelling, and cultural performances. Each applies their own unique interpretation to reflect how they perceive the text: one interpreter might make a certain sentence or phrase sound more important or expressive, whereas another might read it in a lighter sense. We all parse and deliver messages differently despite general similarities.
Interpreters use their voices to produce work every day, so it’s wonderful to aggregate their voices in a collage. We have rotated different voices and even deliberately paired up contrasting ones: male and female, low and high, fast and slow, personal and professional. The different takes on the recordings are where the teamwork gets upgraded to exciting and magical levels. AI is limited in the sense that it lacks the kind of creativity and artistic expression that comes to humans so naturally. Humans are best at creating and putting our own spin on things. Having different interpreters read the same blog and then combining them all into one recording is a great way to showcase how individual we sound as humans, even when we’re doing the same activity.
The participating interpreters took time out of their busy schedules to do the work pro bono. The common goal is to demonstrate that professional interpreters perform far better in recordings than AI does. Hearing the interpreters’ recordings really helps drive home the difference between human individuality and the monotonous nature of AI. We think this project is special because it helps show something that everyone, not just people in our industry, needs to see. Please feel welcome to share this with everyone, but certainly not for the purposes of training AI! Last but not least, a final, big shout-out to the participating interpreters. Thank you so much for the wonderful and abundant voices; they are truly music to our ears. Working together, we have demonstrated the power and strength of professional interpreters’ voices over AI, and the journey must go on!
Originally we were thinking about comparing AI narration to human narration by placing them one after another, but we realized that the contrast is not as significant as we initially believed. This is because most people can only endure AI narration for no more than a few paragraphs; after that, they recognize its mechanical, robotic patterns and decide to stop listening. Please feel welcome to let us know how much of the AI recording you listened to before you gave up! We think it will be a very interesting thing to explore!
Leave Your Comments Below