BEIJING, June 19, 2021 /PRNewswire/ -- iQIYI Inc. (NASDAQ:
IQ) ("iQIYI" or the "Company"), an innovative, market-leading
online entertainment company in China, is proud to announce that its
Multi-Speaker Multi-Style Voice Cloning Challenge ("M2VoC" or
"the Challenge") successfully concluded this week, with the results
announced at the 2021 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP2021). M2VoC, an
ICASSP2021 Signal Processing Grand Challenge, aimed to provide a
common sizable dataset and a fair test bed for benchmarking voice
cloning tasks. The flagship challenge attracted researchers from
both academia and industry. In light of recent advances in
transfer learning, style transfer, speaker embedding, and factor
disentanglement—all of which foreshadow potential solutions to
low-resource voice cloning—iQIYI was excited to join forces with
other leading organizations to host M2VoC.
The Challenge attracted 153 teams from academic institutions and
Internet companies. The academic institutions represented included
Peking University, Tsinghua University, National Taiwan University,
The University of Crete, The
Institute of Automation of the Chinese Academy of Sciences,
University of Tsukuba, Nagoya University, Fudan University, and The
Chinese University of Hong Kong, among
others. Leading Internet companies such as Huya, Microsoft, Didi
Chuxing, Tencent, and NetEase also fielded teams of their own.
M2VoC had two main tracks: one for teams working from limited
samples and one for teams working from very limited samples. In the
limited-samples track, each team was provided with 100 training
samples, each with a different speaking style. In the very
limited-samples track, each team was provided with just five
training samples of different speaking styles. The organizers also
provided participants with two base datasets to be used for
building basic training models. Ultimately, a panel of expert
judges evaluated the outcomes according to four criteria:
similarity to the original speech, voice quality, style and
expressiveness, and pronunciation accuracy.
As the world's first Multi-Speaker Multi-Style Voice Cloning
Challenge, M2VoC brought together leading teams from industry and
academia at the cutting edge of voice cloning technology. A total
of 18 related papers came out of the Challenge, 6 of which were
included in ICASSP2021.
The participating teams achieved remarkable results in various
areas including acoustic modeling, speaker representation,
vocoding, and speaker adaptation strategy. Their innovative
solutions can be applied in many scenarios, including internet
radio, UGC dubbing, audiobooks, and stylized speech synthesis.
These advances are well positioned to meet ever-growing voice
customization needs, especially in multi-style, low-quality speech
scenarios.
M2VoC showcased the strong performance of current voice cloning
techniques. The Challenge also demonstrated that, with advances in
deep learning, voice cloning based on limited samples can deliver
competitive outcomes, but voice cloning based on a single sample
remains an unsolved challenge. In real-world applications, the
impact of low-quality (noisy) audio, as well as the time and cost
constraints of training, adaptation, and inference, are also key
factors to consider.
Through hosting the Challenge, iQIYI hopes to provide more
opportunities for exploration of cutting-edge technologies such as
voice cloning and speech recognition, helping broaden the
application of AI technologies and open new development
possibilities for the audio-visual industry.
View original content:
http://www.prnewswire.com/news-releases/iqiyi-hosts-m2voc-challenge-with-6-papers-included-in-icassp2021-301315850.html
SOURCE iQIYI