This is the demo page for the paper "Improving Emotional TTS with an Emotion Intensity Input from Unsupervised Extraction", submitted to SSW'21. It is currently for review purposes only.
System | angry | sad | happy | fearful | surprised | happy | neutral |
---|---|---|---|---|---|---|---|
baseline | | | | | | | |
attention | | | | | | | |
transformer | | | | | | | |
rank | | | | | | | |
copy synth | | | | | | | |
UI of the listening test. 25 samples were randomly selected. Each one had to be rated on a 5-point MOS scale and, at the same time, in terms of its perceived emotion.
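As a rough illustration of how ratings from a listening test like the one described above could be summarized, the following Python sketch draws 25 samples at random, computes a mean opinion score with a normal-approximation 95% confidence interval, and measures how often the perceived emotion matches the intended one. It is not the authors' evaluation code. The function names, the fixed seed, the 1 to 5 rating range, and the choice of confidence interval are all assumptions made for this example.

```python
import math
import random
import statistics


def select_samples(pool, k=25, seed=0):
    """Pick k distinct items at random from pool.

    The seed is a hypothetical choice, used only to make the draw repeatable.
    """
    rng = random.Random(seed)
    return rng.sample(pool, k)


def mos_with_ci(ratings, z=1.96):
    """Return (mean, half_width) for a list of ratings.

    Assumes ratings lie on a 1..5 scale and uses a normal approximation
    for the 95% confidence interval (both are assumptions of this sketch).
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    mean = statistics.mean(ratings)
    if len(ratings) < 2:
        return mean, 0.0
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


def emotion_accuracy(intended, perceived):
    """Fraction of samples whose perceived emotion equals the intended one."""
    if len(intended) != len(perceived) or not intended:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(i == p for i, p in zip(intended, perceived))
    return correct / len(intended)
```

For example, `mos_with_ci(scores)` could be called once per system on the ratings collected for it, and `emotion_accuracy` on the paired intended and perceived labels for the same samples.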
Scaling | angry | sad | happy | fearful | surprised | happy | neutral |
---|---|---|---|---|---|---|---|
0 | | | | | | | |
1 | | | | | | | |
4 | | | | | | | |
7 | | | | | | | |
10 | | | | | | | |