Paper summary: Tian (2020) FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction

Proposed model: a multi-band LPCNet conditioned only on mel-spectrograms, with no pitch features¹

Demo

ConditioningNetwork

The mel-spectrogram is used directly as the input², ³, ⁴ (no pitch features⁵, 80 dimensions⁶).

Mel2LPcoeff

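The LP coefficients are computed from the mel-spectrogram⁷. For each sub-band, the LP coefficients are derived only from the frequency bins of the mel-spectrogram that correspond to that band⁸.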
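The quotes cited here do not spell out how the LP coefficients are extracted, so the following is only a minimal sketch of one plausible route, borrowed from the usual LPCNet-style recipe: approximate a linear power spectrum from the mel frame, slice out the bins of one sub-band, convert that slice to an autocorrelation by inverse FFT, and solve for the coefficients with Levinson-Durbin. Everything named here is an assumption for illustration: the helpers `mel_filterbank`, `levinson_durbin`, and `mel_to_lp`, the pseudo-inverse step, the 16 kHz sample rate, the 4 bands, and the order 8. The sketch also assumes a linear-power (not log) mel frame, and it ignores the fact that a real sub-band analysis filter bank may mirror or reorder the spectrum of each band.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Hypothetical helper: triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    freqs = np.linspace(0.0, sr / 2.0, n_bins)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def levinson_durbin(r, order):
    """Return predictor coefficients alpha[0..order-1] such that
    s[t] is approximated by sum_k alpha[k-1] * s[t-k], given autocorrelation r."""
    a = np.zeros(order + 1)  # error filter 1 + a[1] z^-1 + ... + a[order] z^-order
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return -a[1:]

def mel_to_lp(mel_power, fb, order, bin_lo, bin_hi):
    """Estimate LP coefficients for one sub-band from a linear-power mel frame."""
    # Assumption: recover an approximate linear power spectrum via pseudo-inverse.
    linear = np.maximum(np.linalg.pinv(fb) @ mel_power, 1e-10)
    band = linear[bin_lo:bin_hi]  # only the bins belonging to this band
    # Treat the slice as the full spectrum of the decimated band signal;
    # its inverse FFT is the autocorrelation (Wiener-Khinchin).
    r = np.fft.irfft(band)
    return levinson_durbin(r, order)

# Illustrative values only, not taken from the paper.
fb = mel_filterbank(n_mels=80, n_fft=1024, sr=16000)
rng = np.random.default_rng(0)
mel_frame = fb @ rng.uniform(0.1, 1.0, fb.shape[1])  # synthetic 80-dim frame
n_bands, order = 4, 8
band_edges = np.linspace(0, fb.shape[1], n_bands + 1).astype(int)
lp = np.stack([
    mel_to_lp(mel_frame, fb, order, band_edges[b], band_edges[b + 1])
    for b in range(n_bands)
])
print(lp.shape)  # one row of LP coefficients per band
```

Because the coefficients come from the same mel frame that conditions the network, no separate pitch or spectral-envelope feature is needed at synthesis time, which matches the pitch-less design noted above.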

Original Paper

@misc{2005.05551,
Author = {Qiao Tian and Zewang Zhang and Heng Lu and Ling-Hui Chen and Shan Liu},
Title = {FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction},
Year = {2020},
Eprint = {arXiv:2005.05551},
}

1. “we merged multi-band into LPCNet framework which only conditioned on mel spectrograms.”
2. “It consists of a condition network that operates on input frames of mel spectrograms” from the paper
3. “only mel spectrograms, which are widely used in neural TTS systems, are adopted as input conditional features.” from the paper
4. “Since only mel-spectrograms are used in condition network” from the paper
5. “Since we use mel-spectrograms to extract the LP filters, the proposed model doesn’t depend on pitch extraction.”
6. “The 80 order melspectrograms were extracted as the conditions for all neural vocoders”
7. “the LP coefficients were estimated from the melspectrograms”
8. “M order linear prediction coefficients of each sub frequency band, α^b_k, can be extracted from the corresponding frequency bins of mel-spectrogram frame.”

