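The Voice Conversion Challenge 2016 (VCC 2016) dataset provides parallel speech recordings that are useful for speech processing. This post summarizes its characteristics.

Basic information

The dataset contains parallel utterances¹ from 10 speakers, 162 + 54 sentences² per speaker.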

Speaker IDs are built from the following letters³:

S: Source
T: Target
M: male
F: female

This gives 10 speakers: SF1 to SF3, SM1 and SM2, TF1 to TF3, and TM1 and TM2.
Files with the same name (e.g. 100001.wav) contain the same utterance⁴.
The audio is 16 kHz⁵, 16-bit⁶, in RIFF/WAVE format⁷.
The 54 evaluation utterances come from each of the 5 source and 5 target speakers (ref).
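The naming scheme, the shared file names, and the audio format described above can be checked with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and the `<root>/<speaker>/<name>.wav` directory layout is an assumption about how the archives unpack, not something stated in the dataset description.

```python
import re
import wave
from pathlib import Path

# Speaker IDs such as "SF1": S/T = source/target, F/M = female/male, then an index.
SPEAKER_RE = re.compile(r"^([ST])([FM])(\d)$")
ROLES = {"S": "source", "T": "target"}
GENDERS = {"F": "female", "M": "male"}


def parse_speaker(speaker_id):
    """Split an ID like 'SF1' into (role, gender, index)."""
    m = SPEAKER_RE.match(speaker_id)
    if m is None:
        raise ValueError(f"not a VCC 2016 speaker ID: {speaker_id!r}")
    role, gender, index = m.groups()
    return ROLES[role], GENDERS[gender], int(index)


def parallel_pairs(root, source, target):
    """Pair files that share a name in <root>/<source> and <root>/<target>.

    The directory layout is an assumption (hypothetical).
    """
    root = Path(root)
    src = {p.name: p for p in (root / source).glob("*.wav")}
    tgt = {p.name: p for p in (root / target).glob("*.wav")}
    # Only names present on both sides form a parallel pair.
    return [(src[n], tgt[n]) for n in sorted(src.keys() & tgt.keys())]


def check_format(path, rate=16000, sample_width=2):
    """Return True if the file has the given sampling rate and sample width.

    sample_width is in bytes, so 2 means 16-bit.
    wave.open raises wave.Error if the file is not RIFF/WAVE.
    """
    with wave.open(str(path), "rb") as w:
        return w.getframerate() == rate and w.getsampwidth() == sample_width
```

For example, `parse_speaker("TM2")` returns `('target', 'male', 2)`, and `parallel_pairs(root, "SF1", "TM1")` would list the utterances available for an SF1 to TM1 conversion pair.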

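Download

Download page: here
The item "VCC training data: training data released to participants during the challenge (23.30Mb)" contains 162 utterances from each of the 10 speakers (the data used for training during the challenge).
The evaluation data seems to be only partially included, and I do not really understand why.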
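One way to see which files are actually present is to count the utterances per speaker and list the file names that some speakers have but others lack. The sketch below assumes the archive has been unpacked into one directory per speaker under a common root; that layout and the function names are hypothetical.

```python
from pathlib import Path


def utterance_counts(root):
    """Map each speaker directory under root to its number of .wav files."""
    return {d.name: len(list(d.glob("*.wav")))
            for d in sorted(Path(root).iterdir()) if d.is_dir()}


def missing_utterances(root):
    """For each speaker, list file names that another speaker has but this one lacks."""
    names = {d.name: {p.name for p in d.glob("*.wav")}
             for d in sorted(Path(root).iterdir()) if d.is_dir()}
    # Union of every file name seen under any speaker.
    everything = set().union(*names.values())
    return {spk: sorted(everything - have) for spk, have in names.items()}
```

For the training data, every speaker should report 162 files and an empty missing list if the description above holds. Running the same check on an evaluation directory would show which files are absent for which speaker.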

url_prefix = 'https://datashare.is.ed.ac.uk/bitstream/handle/10283/2211/'
data_files = ['vcc2016_training.zip', 'evaluation_all.zip']

Downloading via these URLs does fetch everything, though... so what is going on?

The quotations in notes 1 to 6 are taken from here:
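A minimal sketch of such a download, using only the standard library, might look like the following. It assumes that each archive URL is simply `url_prefix` joined with the file name; the helper names `archive_urls` and `download_and_extract` and the destination directory are made up for illustration.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path

# Values copied from the snippet above.
url_prefix = 'https://datashare.is.ed.ac.uk/bitstream/handle/10283/2211/'
data_files = ['vcc2016_training.zip', 'evaluation_all.zip']


def archive_urls(prefix=url_prefix, names=data_files):
    """Join the prefix with each archive name (assumed URL scheme)."""
    return [prefix + name for name in names]


def download_and_extract(url, dest_dir):
    """Download one zip archive into dest_dir and unpack it there."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / url.rsplit("/", 1)[-1]
    if not archive.exists():  # skip the download if the archive is already present
        with urllib.request.urlopen(url) as resp, open(archive, "wb") as out:
            shutil.copyfileobj(resp, out)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
    return archive
```

Calling `download_and_extract(url, "vcc2016")` for each entry of `archive_urls()` would leave both archives and their contents under a local `vcc2016` directory.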
ref

> ¹ Each speaker utters the same sentence
> ² a common dataset consisting of 162 utterances for training
> ³ 'S' denotes 'source', 'T' denotes 'target', while 'M' and 'F' for 'male' and 'female', respectively.
> ⁴ The same file name means the same linguistic content
> ⁵ The sampling rate is 16 kHz
> ⁶ stored in 16-bit format.
> ⁷ The waveforms in the directory are in RIFF/WAVE format.

たれぱんのびぼーろく

My personal notes, probably mostly about biology and programming
