Sound demos for “Incremental Text-to-Speech Synthesis with Prefix-to-Prefix Framework”
Sample 1. Ground truth; vocoder output from the ground-truth mel spectrogram:
Full-sentence | Our lookahead-2 (k1=1, k2=1) | Our lookahead-1 (k1=1, k2=0) | Our lookahead-0 (k1=0, k2=0) | Yanagita et al. (2019), 2-word | Yanagita et al. (2019), 1-word | Yanagita et al. (2019), lookahead-0 | Lookahead-0-indep |
---|---|---|---|---|---|---|---|
Latency: 0.47 s | Latency: 0.21 s | Latency: 0.14 s | Latency: 0.14 s | Latency: 0.28 s | Latency: 0.06 s | Latency: 0.23 s | Latency: 0.17 s |
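In every table, the total lookahead of our systems equals k1 + k2 (lookahead-2 is k1=1, k2=1; lookahead-1 is k1=1, k2=0; lookahead-0 is k1=0, k2=0). The sketch below shows how such a two-stage policy could chain. It rests on an assumption that this page does not state: k1 is the number of extra input words the text-to-spectrogram stage waits for, and k2 is the number of extra words of spectrogram the vocoder waits for. The function names `words_needed_before_audio` and `full_sentence` are hypothetical.

```python
from typing import List


def words_needed_before_audio(num_words: int, k1: int, k2: int) -> List[int]:
    """Number of input words that must have arrived before audio for word i can start.

    Assumed (hypothetical) two-stage lookahead policy:
      * the mel spectrogram of word j is generated once words 0..j+k1 are available;
      * the waveform of word i is generated once spectrograms 0..i+k2 are available.
    Chaining the two, audio for word i waits for words 0..i+k1+k2, i.e. a total
    lookahead of k1 + k2 words, capped at the sentence length.
    """
    if num_words < 0 or k1 < 0 or k2 < 0:
        raise ValueError("num_words, k1 and k2 must be non-negative")
    return [min(i + k1 + k2 + 1, num_words) for i in range(num_words)]


def full_sentence(num_words: int) -> List[int]:
    """Full-sentence baseline: every word waits for the whole sentence."""
    return [num_words] * num_words


if __name__ == "__main__":
    n = 5
    policies = {"lookahead-2": (1, 1), "lookahead-1": (1, 0), "lookahead-0": (0, 0)}
    for name, (k1, k2) in policies.items():
        print(name, words_needed_before_audio(n, k1, k2))
    print("full-sentence", full_sentence(n))
```

For a 5-word sentence under this reading, lookahead-2 needs 3 words before the first word's audio can start, lookahead-0 needs only 1, and full-sentence needs all 5. This ordering matches the latencies reported for our systems, where full-sentence is the slowest in every sample.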
Sample 2. Ground truth; vocoder output from the ground-truth mel spectrogram:
Full-sentence | Our lookahead-2 (k1=1, k2=1) | Our lookahead-1 (k1=1, k2=0) | Our lookahead-0 (k1=0, k2=0) | Yanagita et al. (2019), 2-word | Yanagita et al. (2019), 1-word | Yanagita et al. (2019), lookahead-0 | Lookahead-0-indep |
---|---|---|---|---|---|---|---|
Latency: 0.45 s | Latency: 0.24 s | Latency: 0.17 s | Latency: 0.16 s | Latency: 0.16 s | Latency: 0.11 s | Latency: 0.16 s | Latency: 0.14 s |
Sample 3. Ground truth; vocoder output from the ground-truth mel spectrogram:
Full-sentence | Our lookahead-2 (k1=1, k2=1) | Our lookahead-1 (k1=1, k2=0) | Our lookahead-0 (k1=0, k2=0) | Yanagita et al. (2019), 2-word | Yanagita et al. (2019), 1-word | Yanagita et al. (2019), lookahead-0 | Lookahead-0-indep |
---|---|---|---|---|---|---|---|
Latency: 0.69 s | Latency: 0.19 s | Latency: 0.12 s | Latency: 0.11 s | Latency: 0.15 s | Latency: 0.08 s | Latency: 0.14 s | Latency: 0.15 s |
Sample 4. Ground truth; vocoder output from the ground-truth mel spectrogram:
Full-sentence | Our lookahead-2 (k1=1, k2=1) | Our lookahead-1 (k1=1, k2=0) | Our lookahead-0 (k1=0, k2=0) | Yanagita et al. (2019), 2-word | Yanagita et al. (2019), 1-word | Yanagita et al. (2019), lookahead-0 | Lookahead-0-indep |
---|---|---|---|---|---|---|---|
Latency: 0.56 s | Latency: 0.24 s | Latency: 0.12 s | Latency: 0.11 s | Latency: 0.14 s | Latency: 0.12 s | Latency: 0.14 s | Latency: 0.14 s |
Sample 5. Ground truth; vocoder output from the ground-truth mel spectrogram:
Full-sentence | Our lookahead-2 (k1=1, k2=1) | Our lookahead-1 (k1=1, k2=0) | Our lookahead-0 (k1=0, k2=0) | Yanagita et al. (2019), 2-word | Yanagita et al. (2019), 1-word | Yanagita et al. (2019), lookahead-0 | Lookahead-0-indep |
---|---|---|---|---|---|---|---|
Latency: 0.91 s | Latency: 0.29 s | Latency: 0.21 s | Latency: 0.20 s | Latency: 0.17 s | Latency: 0.09 s | Latency: 0.16 s | Latency: 0.14 s |
Sample 6. Ground truth; vocoder output from the ground-truth mel spectrogram:
Full-sentence | Our lookahead-2 (k1=1, k2=1) | Our lookahead-1 (k1=1, k2=0) | Our lookahead-0 (k1=0, k2=0) | Yanagita et al. (2019), 2-word | Yanagita et al. (2019), 1-word | Yanagita et al. (2019), lookahead-0 | Lookahead-0-indep |
---|---|---|---|---|---|---|---|
Latency: 1.26 s | Latency: 0.20 s | Latency: 0.15 s | Latency: 0.13 s | Latency: 0.18 s | Latency: 0.12 s | Latency: 0.17 s | Latency: 0.13 s |
Sample 7. Ground truth; vocoder output from the ground-truth mel spectrogram:
Full-sentence | Our lookahead-2 (k1=1, k2=1) | Our lookahead-1 (k1=1, k2=0) | Our lookahead-0 (k1=0, k2=0) | Yanagita et al. (2019), 2-word | Yanagita et al. (2019), 1-word | Yanagita et al. (2019), lookahead-0 | Lookahead-0-indep |
---|---|---|---|---|---|---|---|
Latency: 1.27 s | Latency: 0.28 s | Latency: 0.17 s | Latency: 0.16 s | Latency: 0.17 s | Latency: 0.12 s | Latency: 0.16 s | Latency: 0.17 s |
Sample 8. Ground truth; vocoder output from the ground-truth mel spectrogram:
Full-sentence | Our lookahead-2 (k1=1, k2=1) | Our lookahead-1 (k1=1, k2=0) | Our lookahead-0 (k1=0, k2=0) | Lookahead-0-indep |
---|---|---|---|---|
Latency: 0.65 s | Latency: 0.17 s | Latency: 0.10 s | Latency: 0.10 s | Latency: 0.15 s |
Sample 9. Ground truth; vocoder output from the ground-truth mel spectrogram:
Full-sentence | Our lookahead-2 (k1=1, k2=1) | Our lookahead-1 (k1=1, k2=0) | Our lookahead-0 (k1=0, k2=0) | Lookahead-0-indep |
---|---|---|---|---|
Latency: 0.66 s | Latency: 0.10 s | Latency: 0.05 s | Latency: 0.04 s | Latency: 0.09 s |
Sample 10. Ground truth; vocoder output from the ground-truth mel spectrogram:
Full-sentence | Our lookahead-2 (k1=1, k2=1) | Our lookahead-1 (k1=1, k2=0) | Our lookahead-0 (k1=0, k2=0) | Lookahead-0-indep |
---|---|---|---|---|
Latency: 1.06 s | Latency: 0.12 s | Latency: 0.05 s | Latency: 0.04 s | Latency: 0.10 s |
Sample 11. Ground truth; vocoder output from the ground-truth mel spectrogram:
Full-sentence | Our lookahead-2 (k1=1, k2=1) | Our lookahead-1 (k1=1, k2=0) | Our lookahead-0 (k1=0, k2=0) | Lookahead-0-indep |
---|---|---|---|---|
Latency: 0.56 s | Latency: 0.16 s | Latency: 0.05 s | Latency: 0.01 s | Latency: 0.08 s |
Sample 12. Ground truth; vocoder output from the ground-truth mel spectrogram:
Full-sentence | Our lookahead-2 (k1=1, k2=1) | Our lookahead-1 (k1=1, k2=0) | Our lookahead-0 (k1=0, k2=0) | Lookahead-0-indep |
---|---|---|---|---|
Latency: 0.66 s | Latency: 0.12 s | Latency: 0.06 s | Latency: 0.05 s | Latency: 0.11 s |
Sample 13. Ground truth; vocoder output from the ground-truth mel spectrogram:
Full-sentence | Our lookahead-2 (k1=1, k2=1) | Our lookahead-1 (k1=1, k2=0) | Our lookahead-0 (k1=0, k2=0) | Lookahead-0-indep |
---|---|---|---|---|
Latency: 0.89 s | Latency: 0.12 s | Latency: 0.06 s | Latency: 0.05 s | Latency: 0.11 s |
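To compare systems across all samples, the per-sample latencies above can be averaged. The sketch below copies the numbers from the tables and computes an unweighted mean per system. This summary is our own illustration, not a metric reported on this page. Sentences differ in length, so the mean is only a rough indication. The baselines from Yanagita et al. (2019) appear only in the first seven samples, so any comparison that includes them is restricted to those samples. The names `SYSTEMS`, `LATENCIES`, and `mean_latency` are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional

# Column order of every row below (short names for the table headers).
SYSTEMS = [
    "Full-sentence",
    "Lookahead-2",           # ours, k1=1, k2=1
    "Lookahead-1",           # ours, k1=1, k2=0
    "Lookahead-0",           # ours, k1=0, k2=0
    "Yanagita 2-word",
    "Yanagita 1-word",
    "Yanagita lookahead-0",
    "Lookahead-0-indep",
]

# Per-sample latencies in seconds, copied from the tables above (one row per sample).
# None marks a system that is not shown for that sample.
LATENCIES: List[List[Optional[float]]] = [
    [0.47, 0.21, 0.14, 0.14, 0.28, 0.06, 0.23, 0.17],
    [0.45, 0.24, 0.17, 0.16, 0.16, 0.11, 0.16, 0.14],
    [0.69, 0.19, 0.12, 0.11, 0.15, 0.08, 0.14, 0.15],
    [0.56, 0.24, 0.12, 0.11, 0.14, 0.12, 0.14, 0.14],
    [0.91, 0.29, 0.21, 0.20, 0.17, 0.09, 0.16, 0.14],
    [1.26, 0.20, 0.15, 0.13, 0.18, 0.12, 0.17, 0.13],
    [1.27, 0.28, 0.17, 0.16, 0.17, 0.12, 0.16, 0.17],
    [0.65, 0.17, 0.10, 0.10, None, None, None, 0.15],
    [0.66, 0.10, 0.05, 0.04, None, None, None, 0.09],
    [1.06, 0.12, 0.05, 0.04, None, None, None, 0.10],
    [0.56, 0.16, 0.05, 0.01, None, None, None, 0.08],
    [0.66, 0.12, 0.06, 0.05, None, None, None, 0.11],
    [0.89, 0.12, 0.06, 0.05, None, None, None, 0.11],
]


def mean_latency(systems: List[str]) -> Dict[str, float]:
    """Unweighted mean latency per system, over the samples where *all* given systems appear."""
    idx = [SYSTEMS.index(s) for s in systems]
    rows = [r for r in LATENCIES if all(r[i] is not None for i in idx)]
    if not rows:
        raise ValueError("no sample contains all requested systems")
    return {s: mean(r[i] for r in rows) for s, i in zip(systems, idx)}


if __name__ == "__main__":
    ours = ["Full-sentence", "Lookahead-2", "Lookahead-1", "Lookahead-0", "Lookahead-0-indep"]
    # Our systems only: all 13 samples.
    for name, value in mean_latency(ours).items():
        print(f"{name:22s} {value:.3f} s")
    print()
    # All systems: only the 7 samples that include the Yanagita et al. (2019) baselines.
    for name, value in mean_latency(SYSTEMS).items():
        print(f"{name:22s} {value:.3f} s")
```

Restricting each comparison to the samples that all of the chosen systems share keeps every mean computed over the same set of sentences. Without that restriction, the baselines would be averaged over fewer samples than our systems.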