Voice Mapping of Text-to-Speech Systems: A Metric-Based Approach for Voice Quality Assessment

Huanchen Cai, Sten Ternström

Year of publication: 2026
Access: Open access

Abstract

This study investigates voice mapping as an evaluation framework for text-to-speech (TTS) synthesis quality. We analyzed six influential TTS models, spanning historical and recent systems: Merlin, Tacotron 2, Transformer TTS, FastSpeech 2, Glow-TTS, and VITS. The metrics are crest factor, spectrum balance, and smoothed cepstral peak prominence (CPPs). The results demonstrate that voice range serves as a primary indicator of model capability, with VITS showing the largest range among the tested models. Glow-TTS exhibited superior performance in soft phonation, as indicated by higher spectrum balance, despite its limited voice range. CPPs values between 7 and 8 dB indicated natural voice quality, whereas speech with CPPs above 10 dB tended to sound robotic. These findings underscore the need for voice mapping to evaluate vocal effort and to capture how TTS systems handle voice dynamics and expressiveness.
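
Of the three metrics named in the abstract, crest factor has the simplest standard definition: the ratio of a signal's peak amplitude to its RMS level, usually expressed in dB. The sketch below illustrates that textbook definition only. It is not the authors' analysis pipeline, and the function name `crest_factor_db` and the synthetic test signals are assumptions made for illustration.

```python
import math


def crest_factor_db(samples):
    """Crest factor in dB: 20 * log10(peak amplitude / RMS level).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms == 0.0:
        raise ValueError("signal is silent; crest factor is undefined")
    return 20.0 * math.log10(peak / rms)


# Synthetic signals (assumed for illustration): 10 full periods, 100 samples each.
sine = [math.sin(2 * math.pi * k / 100) for k in range(1000)]
square = [1.0 if k % 100 < 50 else -1.0 for k in range(1000)]

sine_cf = crest_factor_db(sine)      # peak/RMS = sqrt(2), about 3.01 dB
square_cf = crest_factor_db(square)  # peak equals RMS, so 0 dB
```

A signal with sharper peaks relative to its average level yields a higher crest factor, which is why the measure is sensitive to waveform shape and not only to loudness.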

Keywords

eess.AS, cs.AI, eess.SP
