An estimated 30,000 people in Japan have lost their voices due to laryngeal cancer or similar conditions. The impact of no longer being able to speak with one's own voice is profound; quality of life decreases because of both difficulties in communication and the psychological pain of losing one's identity. In 2021, Nagoya University Hospital launched the "Save the Voice Project" to help address these issues. The project collects audio recordings from patients, taken before they have their vocal cords removed, and uses these data to recreate their voices.
Lose Your Voice or Lose Your Life: A Heart-Wrenching Decision
Associate Professor Naoki Nishio of the Nagoya University Graduate School of Medicine leads the project. As an otolaryngologist at Nagoya University Hospital, he serves as the attending physician for patients with cancers of the mouth and throat. Dr. Nishio has repeatedly witnessed situations in which a patient is forced to make the devastating choice between saving their voice and saving their life.
Although alternative vocalization methods are available after vocal cord removal, the shock of losing one's own voice is immeasurable for many patients. One such method is the electrolarynx, a device that allows patients to speak without additional surgery. However, its sound is mechanical and lacks inflection. Many patients are hesitant to use it in public, and this self-consciousness about how others perceive them may lead to social withdrawal.
Encountering Voice Conversion Technology
Dr. Nishio had performed surgeries while reassuring himself with the notion that "it is necessary to save lives," but he never stopped wondering if there could be a better way. One day, he was watching a TV program in which a person with amyotrophic lateral sclerosis (ALS) communicated using a technology that converts text entered by eye gaze into speech. Dr. Nishio was struck by inspiration: "If we were to record someone speaking before surgery and convert it, could they speak again in their own voice?"
He immediately began searching for a researcher knowledgeable in voice conversion and discovered that Professor Tomoki Toda of the Nagoya University Information Technology Center had been working on vocal support research for many years. Professor Toda had previously worked on converting electrolarynx voices into more human-like speech, so the project took off quickly.
Professor Toda, who led the development of the voice conversion technology at the core of the vocal support system, said, "Collaborating with Dr. Nishio and other medical researchers, and drawing on their expertise, brought a strong sense of fulfillment in tackling the shared challenge of voice support."
Associate Professor Nishio graduated from Nagoya University School of Medicine. He remarks, "Nagoya University has a calm atmosphere, making it an ideal environment for research."
Recording the Voice Before Surgery and Reproducing It Through Speech Synthesis
This project focuses on developing technology that converts the mechanical sound of the electrolarynx into the person's own voice. Patients scheduled to undergo vocal cord removal record approximately 500 example sentences in advance, allowing their voices to be preserved. After surgery, they also record speech produced using the electrolarynx. Machine learning is then applied to these pre- and post-surgery voice samples to analyze each person's distinctive speech patterns and intonation, so that electrolarynx speech can be converted into a personalized reproduction of the patient's voice and played back through a mobile application. Because it is essential to record the voice before surgery, an easy-to-use voice recording application was also developed so that patients can record anywhere.
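As a rough illustration of what learning a conversion from paired recordings can mean, the sketch below fits a simple linear map, one per feature dimension, from electrolarynx features to natural-voice features. It is a toy stand-in, not the project's model, since the article does not describe the actual method. The function names `fit_affine`, `fit_converter`, and `convert`, the numeric feature frames, and the assumption that the two recordings are already time-aligned frame by frame are all hypothetical.

```python
from statistics import mean


def fit_affine(source, target):
    """Least-squares fit of target = a * source + b for one feature dimension."""
    mx, my = mean(source), mean(target)
    var = sum((x - mx) ** 2 for x in source)
    if var == 0:
        # A constant source carries no information, so always predict the target mean.
        return 0.0, my
    cov = sum((x - mx) * (y - my) for x, y in zip(source, target))
    a = cov / var
    return a, my - a * mx


def fit_converter(src_frames, tgt_frames):
    """Fit one affine map per feature dimension from time-aligned frame pairs."""
    if not src_frames or len(src_frames) != len(tgt_frames):
        raise ValueError("need a non-empty, one-to-one aligned set of frames")
    dims = len(src_frames[0])
    return [
        fit_affine([f[d] for f in src_frames], [f[d] for f in tgt_frames])
        for d in range(dims)
    ]


def convert(frame, params):
    """Apply the fitted per-dimension maps to one source frame."""
    return [a * x + b for x, (a, b) in zip(frame, params)]


if __name__ == "__main__":
    # Made-up two-dimensional feature frames (hypothetical values, not real audio).
    electrolarynx = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    natural_voice = [[3.0, 5.0], [5.0, 10.0], [7.0, 15.0]]
    params = fit_converter(electrolarynx, natural_voice)
    print(convert([4.0, 40.0], params))  # prints [9.0, 20.0]
```

Practical voice conversion systems typically rely on far richer acoustic features and models than this. The example only shows the shape of the problem: paired samples go in, and a mapping that can be applied to new electrolarynx speech comes out.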
To date, a prototype of the device that converts electrolarynx speech into the patient's voice has been completed. Several people have already tested it and responded positively, noting that the output closely resembles their own voices.
To capture higher-quality audio, recordings are conducted in a studio. The example sentences being read aloud were developed by the Toda Laboratory and include the pronunciation elements necessary for voice conversion.
Challenges to Implementation
As the project has progressed, several challenges have emerged. One is the time lag that voice conversion introduces in actual conversation. When speech is converted only after a phrase has been fully spoken, the result closely resembles the patient's original voice and contains fewer errors. In real-time conversion, however, where minimal time lag is essential, spoken words are frequently converted inaccurately.
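The trade-off between delay and available context can be made concrete with a little arithmetic. The sketch below estimates how much audio a converter must receive before it can emit its first output, ignoring computation time. The function name `first_output_delay_ms`, the chunk and lookahead parameters, and all durations are hypothetical and are not taken from the project.

```python
def first_output_delay_ms(phrase_ms, chunk_ms=None, lookahead_ms=0):
    """Audio (in ms) that must arrive before the first converted output.

    chunk_ms=None models whole-phrase conversion: the converter waits for
    the entire phrase. Otherwise it models streaming conversion, which waits
    for one chunk plus a short lookahead window. Compute time is ignored.
    """
    if phrase_ms <= 0:
        raise ValueError("phrase_ms must be positive")
    if chunk_ms is None:
        return phrase_ms
    if chunk_ms <= 0 or lookahead_ms < 0:
        raise ValueError("chunk_ms must be positive and lookahead_ms non-negative")
    # The converter never needs to wait for more audio than the phrase contains.
    return min(phrase_ms, chunk_ms + lookahead_ms)


if __name__ == "__main__":
    phrase = 3000  # a hypothetical three-second phrase
    print(first_output_delay_ms(phrase))                                # prints 3000
    print(first_output_delay_ms(phrase, chunk_ms=40, lookahead_ms=20))  # prints 60
```

Under these assumed numbers, the streaming converter responds after 60 ms but has heard only 60 ms of the phrase when it starts. The whole-phrase converter has the full three seconds of context but makes the listener wait that long. This is one plausible reading of why the article reports more conversion errors in real-time use.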
Dr. Nishio emphasized, "If patients cannot converse in real time, they will not use it. There are many people currently living without a voice, and we want to bring this into practical use as soon as possible."
Additionally, during the conversion process, the original mechanical voice from the electrolarynx can still be heard by others. To resolve this, the electrolarynx itself must be improved. There remain numerous technical hurdles before the system can be put into widespread use.
Accelerating the Project Through Crowdfunding
To address these issues, a crowdfunding campaign for the project was launched on March 3, 2025. In addition to raising funds to advance research for the voice conversion application, Dr. Nishio also aims to raise awareness, stating, "We want more people to know that this technology to preserve their voice exists."
Reflecting on the project's beginnings, he noted, "As doctors, we tend to focus on treating the 'disease' itself. But this has reminded me how crucial it is to focus on the 'patients living with the disease' as well." He emphasized the significance of the project, saying, "We want patients to feel more positive about undergoing vocal cord removal surgery and to reduce their anxiety as much as possible." The project continues to aim for a world where those who have lost their voices can communicate with others as naturally as anyone else.
Alongside the push for rapid implementation, the project also envisions expanding to other vocalization methods and deploying the technology globally.
The International Communications Office, Nagoya University wishes to thank the Public Relations Office, Nagoya University for the use of this article. It was originally written by Tatsuhiko Maruyama in Japanese for the "Public Relations Meidai" web magazine. It has been translated and edited for clarity and readability by the International Communications Office while preserving the original content's intent. This article was originally published on March 3, 2025, and some information may not be up to date. For the original in Japanese, please see here.