Proposed model: mel-spec input (pitch-less), multi-band LPCNet [1]
Demo
Chinese-language demo
wavecoder.github.io
ConditioningNetwork
The mel-spectrogram is used directly as the input [2][3][4] (no pitch [5]; 80 dimensions [6]).
Mel2LPcoeff
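The LP coefficients are computed from the mel-spectrogram [7]. For each band, the LP coefficients are derived only from the portion of the mel-spectrogram that corresponds to that band [8].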
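The following is a minimal sketch of how per-band LP coefficients could be derived from one mel frame. It is not the paper's implementation. It assumes a common recipe: approximate the linear-frequency power spectrum with the pseudo-inverse of the mel filterbank, slice it into equal-width bands, turn each slice into an autocorrelation sequence with an inverse FFT, and solve for the coefficients with Levinson-Durbin. The function names (`mel_filterbank`, `levinson_durbin`, `mel_to_lpc_per_band`), the triangular filterbank, the white-noise correction, and all numeric settings (sample rate, FFT size, band count, LP order) are hypothetical choices for illustration. The input is assumed to be a power-domain mel frame, so a log-mel frame would need to be exponentiated first.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filterbank of shape (n_mels, n_fft // 2 + 1)."""
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def levinson_durbin(r, order):
    """Return a[1..order] of A(z) = 1 + sum_k a_k z^-k from autocorrelation r.

    With this sign convention the prediction is p_t = -sum_k a_k * s_{t-k}.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k                  # updated prediction error
    return a[1:]

def mel_to_lpc_per_band(mel_power, fb, n_bands, order):
    """mel_power: (n_mels,) power-domain mel frame. Returns (n_bands, order)."""
    # Assumption: approximate the linear power spectrum via the pseudo-inverse,
    # clamping negative values that the inversion can produce.
    lin_power = np.maximum(np.linalg.pinv(fb) @ mel_power, 1e-10)
    bins_per_band = (lin_power.size - 1) // n_bands
    coeffs = np.zeros((n_bands, order))
    for b in range(n_bands):
        # Use only the frequency bins that belong to band b.
        band = lin_power[b * bins_per_band:(b + 1) * bins_per_band + 1]
        r = np.fft.irfft(band)              # power spectrum -> autocorrelation
        r[0] *= 1.0 + 1e-4                  # white-noise correction (assumption)
        coeffs[b] = levinson_durbin(r, order)
    return coeffs

# Toy usage with hypothetical settings and a smooth synthetic spectrum.
sr, n_fft, n_mels = 24000, 1024, 80
fb = mel_filterbank(n_mels, n_fft, sr)
freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
true_power = 1.0 / (1.0 + (freqs / 1000.0) ** 2)
mel_frame = fb @ true_power
coeffs = mel_to_lpc_per_band(mel_frame, fb, n_bands=4, order=8)
print(coeffs.shape)  # (4, 8)
```

Because every step starts from the mel frame alone, no pitch extractor is needed, which matches the pitch-less design described above. The sketch ignores details such as the response of the sub-band analysis filters, so the values it produces are illustrative only.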
Original Paper
@misc{2005.05551,
  Author = {Qiao Tian and Zewang Zhang and Heng Lu and Ling-Hui Chen and Shan Liu},
  Title = {FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction},
  Year = {2020},
  Eprint = {arXiv:2005.05551},
}
[1] “we merged multi-band into LPCNet framework which only conditioned on mel spectrograms.”
[2] “It consists of a condition network that operates on input frames of mel spectrograms” from the paper
[3] “only mel spectrograms, which are widely used in neural TTS systems, are adopted as input conditional features.” from the paper
[4] “Since only mel-spectrograms are used in condition network” from the paper
[5] “Since we use mel-spectrograms to extract the LP filters, the proposed model doesn’t depend on pitch extraction.”
[6] “The 80 order melspectrograms were extracted as the conditions for all neural vocoders”
[7] “the LP coefficients were estimated from the melspectrograms”
[8] “M order linear prediction coefficients of each sub frequency band, αbk, can be extracted from the corresponding frequency bins of mel-spectrogram frame.”