たれぱんのびぼーろく

My personal memos, probably mostly biology and programming

Paper walkthrough: Tian (2020) FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction

Paper

Proposed model: a multi-band LPCNet conditioned only on mel-spectrogram input, with no pitch features [1].

Demo

Chinese-language demo
wavecoder.github.io

ConditioningNetwork

The mel-spectrogram is used directly as the input [2, 3, 4] (no pitch features [5]; 80 dimensions [6]).

Mel2LPcoeff

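The LP coefficients are computed from the mel-spectrogram [7]. For each sub-band, the LP coefficients are derived only from the mel-spectrogram bins that correspond to that band [8].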
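
The quotes collected in this memo do not spell out how the conversion is done, so the following is only a minimal sketch under assumptions, not the authors' implementation. It assumes a common route: approximate the linear-frequency power spectrum from the mel frame (here with a pseudo-inverse of the mel filterbank), keep the bins of one sub-band, turn them into an autocorrelation with an inverse FFT, and solve for the LP coefficients with Levinson-Durbin. All function names and the settings (16 kHz, 1024-point FFT, 4 bands, order 8) are hypothetical and chosen for illustration only.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filterbank, shape (n_mels, n_fft // 2 + 1)."""
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, freqs.size))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def levinson_durbin(r, order):
    """Return predictor coefficients a[1..order] with x[n] ~ sum_k a[k] * x[n-k]."""
    a = np.zeros(order + 1)  # error-filter coefficients, a[0] = 1
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return -a[1:]  # flip sign: error filter -> predictor


def band_lpc_from_mel(mel_power, fb, band, n_bands, order):
    """LP coefficients of one sub-band from one mel power-spectrum frame.

    Assumptions of this sketch: the pseudo-inverse is an adequate spectrum
    estimate, bands are uniform, and the spectral flip that decimation may
    introduce in odd-indexed bands is ignored.
    """
    lin_power = np.maximum(np.linalg.pinv(fb) @ mel_power, 1e-10)
    width = (lin_power.size - 1) // n_bands
    sub = lin_power[band * width:(band + 1) * width + 1]
    r = np.fft.irfft(sub)  # Wiener-Khinchin: power spectrum -> autocorrelation
    r[0] *= 1.0001         # small regularisation for numerical stability
    return levinson_durbin(r, order)


# Illustrative usage with a synthetic frame (not real speech).
fb = mel_filterbank(n_mels=80, n_fft=1024, sr=16000)
frame = fb @ (np.random.default_rng(0).random(513) + 0.1)
coeffs = band_lpc_from_mel(frame, fb, band=1, n_bands=4, order=8)
```

Because each band only looks at its own slice of the spectrum, the per-band predictors are independent of one another, which is consistent with footnote 8. A real system would also need to match the analysis settings (sample rate, FFT size, mel scale, log compression) used when the mel-spectrograms were extracted.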

Original Paper

Paper

@misc{2005.05551,
Author = {Qiao Tian and Zewang Zhang and Heng Lu and Ling-Hui Chen and Shan Liu},
Title = {FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction},
Year = {2020},
Eprint = {arXiv:2005.05551},
}

  1. “we merged multi-band into LPCNet framework which only conditioned on mel spectrograms.”

  2. “It consists of a condition network that operates on input frames of mel spectrograms”

  3. “only mel spectrograms, which are widely used in neural TTS systems, are adopted as input conditional features.”

  4. “Since only mel-spectrograms are used in condition network”

  5. “Since we use mel-spectrograms to extract the LP filters, the proposed model doesn’t depend on pitch extraction.”

  6. “The 80 order melspectrograms were extracted as the conditions for all neural vocoders”

  7. “the LP coefficients were estimated from the melspectrograms”

  8. “M order linear prediction coefficients of each sub frequency band, $\alpha^b_k$, can be extracted from the corresponding frequency bins of mel-spectrogram frame.”