Incremental Text-to-Speech demos

Sound demos for “Incremental Text-to-Speech Synthesis with Prefix-to-Prefix Framework”

English

1. This courtroom charisma is like the opposite of the repulsion I create everywhere else in life.

Groundtruth: Vocoder with groundtruth-mel:

Full-sentence: latency 0.47s
Our lookahead-2 (k1=1, k2=1): latency 0.21s
Our lookahead-1 (k1=1, k2=0): latency 0.14s
Our lookahead-0 (k1=0, k2=0): latency 0.14s
Yanagita et al. (2019), 2 word: latency 0.28s
Yanagita et al. (2019), 1 word: latency 0.06s
Yanagita et al. (2019), lookahead-0: latency 0.23s
Lookahead-0-indep: latency 0.17s

2. Some days you’re butch, some days fem.

Groundtruth: Vocoder with groundtruth-mel:

Full-sentence: latency 0.45s
Our lookahead-2 (k1=1, k2=1): latency 0.24s
Our lookahead-1 (k1=1, k2=0): latency 0.17s
Our lookahead-0 (k1=0, k2=0): latency 0.16s
Yanagita et al. (2019), 2 word: latency 0.16s
Yanagita et al. (2019), 1 word: latency 0.11s
Yanagita et al. (2019), lookahead-0: latency 0.16s
Lookahead-0-indep: latency 0.14s

3. The second one I am thinking was a man who used her bib.

Groundtruth: Vocoder with groundtruth-mel:

Full-sentence: latency 0.69s
Our lookahead-2 (k1=1, k2=1): latency 0.19s
Our lookahead-1 (k1=1, k2=0): latency 0.12s
Our lookahead-0 (k1=0, k2=0): latency 0.11s
Yanagita et al. (2019), 2 word: latency 0.15s
Yanagita et al. (2019), 1 word: latency 0.08s
Yanagita et al. (2019), lookahead-0: latency 0.14s
Lookahead-0-indep: latency 0.15s

4. Must be a fellow marathoner thing.

Groundtruth: Vocoder with groundtruth-mel:

Full-sentence: latency 0.56s
Our lookahead-2 (k1=1, k2=1): latency 0.24s
Our lookahead-1 (k1=1, k2=0): latency 0.12s
Our lookahead-0 (k1=0, k2=0): latency 0.11s
Yanagita et al. (2019), 2 word: latency 0.14s
Yanagita et al. (2019), 1 word: latency 0.12s
Yanagita et al. (2019), lookahead-0: latency 0.14s
Lookahead-0-indep: latency 0.14s

5. I often put laxatives in my dishwasher to help relax my bowls.

Groundtruth: Vocoder with groundtruth-mel:

Full-sentence: latency 0.91s
Our lookahead-2 (k1=1, k2=1): latency 0.29s
Our lookahead-1 (k1=1, k2=0): latency 0.21s
Our lookahead-0 (k1=0, k2=0): latency 0.20s
Yanagita et al. (2019), 2 word: latency 0.17s
Yanagita et al. (2019), 1 word: latency 0.09s
Yanagita et al. (2019), lookahead-0: latency 0.16s
Lookahead-0-indep: latency 0.14s

6. Still lot of years left to compile more stats.

Groundtruth: Vocoder with groundtruth-mel:

Full-sentence: latency 1.26s
Our lookahead-2 (k1=1, k2=1): latency 0.20s
Our lookahead-1 (k1=1, k2=0): latency 0.15s
Our lookahead-0 (k1=0, k2=0): latency 0.13s
Yanagita et al. (2019), 2 word: latency 0.18s
Yanagita et al. (2019), 1 word: latency 0.12s
Yanagita et al. (2019), lookahead-0: latency 0.17s
Lookahead-0-indep: latency 0.13s

7. Worry is the interest paid in advance on a debt you may never owe.

Groundtruth: Vocoder with groundtruth-mel:

Full-sentence: latency 1.27s
Our lookahead-2 (k1=1, k2=1): latency 0.28s
Our lookahead-1 (k1=1, k2=0): latency 0.17s
Our lookahead-0 (k1=0, k2=0): latency 0.16s
Yanagita et al. (2019), 2 word: latency 0.17s
Yanagita et al. (2019), 1 word: latency 0.12s
Yanagita et al. (2019), lookahead-0: latency 0.16s
Lookahead-0-indep: latency 0.17s

Chinese

1. 钱伟长想到上海来办学校是经过深思熟虑的。

Groundtruth: Vocoder with groundtruth-mel:

Full-sentence: latency 0.65s
Our lookahead-2 (k1=1, k2=1): latency 0.17s
Our lookahead-1 (k1=1, k2=0): latency 0.10s
Our lookahead-0 (k1=0, k2=0): latency 0.10s
Lookahead-0-indep: latency 0.15s

2. 遇到颠簸时,应听从乘务员的安全指令,回座位坐好。

Groundtruth: Vocoder with groundtruth-mel:

Full-sentence: latency 0.66s
Our lookahead-2 (k1=1, k2=1): latency 0.10s
Our lookahead-1 (k1=1, k2=0): latency 0.05s
Our lookahead-0 (k1=0, k2=0): latency 0.04s
Lookahead-0-indep: latency 0.09s

3. 一种表示商品所有权的财物证券,也称商品证券,如提货单、交货单。

Groundtruth: Vocoder with groundtruth-mel:

Full-sentence: latency 1.06s
Our lookahead-2 (k1=1, k2=1): latency 0.12s
Our lookahead-1 (k1=1, k2=0): latency 0.05s
Our lookahead-0 (k1=0, k2=0): latency 0.04s
Lookahead-0-indep: latency 0.10s

4. 从运行轨迹上来说,它也不可能是星星。

Groundtruth: Vocoder with groundtruth-mel:

Full-sentence: latency 0.56s
Our lookahead-2 (k1=1, k2=1): latency 0.16s
Our lookahead-1 (k1=1, k2=0): latency 0.05s
Our lookahead-0 (k1=0, k2=0): latency 0.01s
Lookahead-0-indep: latency 0.08s

5. 路上关卡很多,为了方便撤离,只好轻装前进。

Groundtruth: Vocoder with groundtruth-mel:

Full-sentence: latency 0.66s
Our lookahead-2 (k1=1, k2=1): latency 0.12s
Our lookahead-1 (k1=1, k2=0): latency 0.06s
Our lookahead-0 (k1=0, k2=0): latency 0.05s
Lookahead-0-indep: latency 0.11s

6. 这场抗议活动究竟是如何发展演变的,又究竟是谁伤害了谁?

Groundtruth: Vocoder with groundtruth-mel:

Full-sentence: latency 0.89s
Our lookahead-2 (k1=1, k2=1): latency 0.12s
Our lookahead-1 (k1=1, k2=0): latency 0.06s
Our lookahead-0 (k1=0, k2=0): latency 0.05s
Lookahead-0-indep: latency 0.11s