Tarepan's Memo

My personal notes; probably mostly biology and programming.

Parallel WaveGAN

TTS

Transformer-based parameter estimator + Parallel WaveGAN vocoder

ref: Transformer TTS & FastSpeech
Base: FastSpeech

  • i/o: phoneme sequences + accent -> mel-spectrograms
  • model: a six-layer encoder and a six-layer decoder (each based on multi-head attention with 8 heads)
    • opt: RAdam optimizer
    • lr: warmup learning-rate schedule (1.0 at start, …)
    • dynamic batch size strategy (average batch size 64)
  • training
    • epochs: 1000
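Only the starting value (1.0) of the warmup schedule is recorded here. A warmup schedule often used in Transformer TTS setups is the Noam schedule from the original Transformer paper, in which a constant like 1.0 acts as an overall scale. The sketch below assumes that form; `d_model=384` and `warmup_steps=4000` are hypothetical placeholders, not values from this note.

```python
import math


def noam_lr(step: int, d_model: int = 384, warmup_steps: int = 4000, scale: float = 1.0) -> float:
    """Noam-style warmup schedule (Vaswani et al., 2017).

    The rate grows linearly for `warmup_steps` steps, peaks, then decays as
    step ** -0.5. `scale` stands in for the 1.0 in the note; `d_model` and
    `warmup_steps` are hypothetical placeholders.
    """
    step = max(step, 1)  # guard: 0 ** -0.5 would raise ZeroDivisionError
    return scale / math.sqrt(d_model) * min(step ** -0.5, step * warmup_steps ** -1.5)


for s in (1, 1000, 4000, 16000):
    print(s, round(noam_lr(s), 6))
```

With PyTorch, one option is to set the optimizer's base learning rate to 1.0 and pass `noam_lr` as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, stepping the scheduler once per iteration. Because `LambdaLR` multiplies the base rate by the returned factor, the effective rate is then `noam_lr(step)`.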
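Only the average batch size (64) is noted, not how batches are formed. One common way to get variable batch sizes is to sort utterances by length and pack them under a fixed budget of padded frames, so short utterances form large batches and long ones form small batches. The sketch below assumes that approach; `max_frames` is a hypothetical knob one would tune until batches average about 64 items, and this is not necessarily how the paper builds its batches.

```python
from typing import List, Sequence


def dynamic_batches(lengths: Sequence[int], max_frames: int) -> List[List[int]]:
    """Group utterance indices so each batch's padded size stays within max_frames.

    Padded size = (longest item in the batch) * (number of items). An item
    longer than max_frames on its own still gets a batch of size 1.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches: List[List[int]] = []
    current: List[int] = []
    for idx in order:
        # Items arrive in ascending length, so the new item is the longest.
        padded = lengths[idx] * (len(current) + 1)
        if current and padded > max_frames:
            batches.append(current)
            current = []
        current.append(idx)
    if current:
        batches.append(current)
    return batches


print(dynamic_batches([100, 200, 300, 400, 50], max_frames=600))
```

Sorting first keeps padding waste low, at the cost of batches that are homogeneous in length; shuffling the resulting batch order each epoch is a common complement.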

Accent is given as an external input, as is done for pitch-accent languages (e.g., Japanese) [29].
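How the accent is encoded is not stated here. As an illustration only, the sketch below assumes one hypothetical high/low (`H`/`L`) pitch label per phoneme and forms each encoder input vector by concatenating a phoneme embedding with an accent embedding. The random tables are stand-ins for learned embeddings; adding the two embeddings instead of concatenating them would be an equally plausible design.

```python
import random
from typing import Dict, List, Sequence


def make_table(symbols: Sequence[str], dim: int, seed: int) -> Dict[str, List[float]]:
    """Hypothetical stand-in for a learned embedding: one random vector per symbol."""
    rng = random.Random(seed)
    return {sym: [rng.gauss(0.0, 1.0) for _ in range(dim)] for sym in symbols}


def encoder_inputs(
    phonemes: Sequence[str],
    accents: Sequence[str],
    ph_table: Dict[str, List[float]],
    ac_table: Dict[str, List[float]],
) -> List[List[float]]:
    """Concatenate a phoneme embedding and an accent embedding at every position."""
    if len(phonemes) != len(accents):
        raise ValueError("need exactly one accent label per phoneme")
    return [ph_table[p] + ac_table[a] for p, a in zip(phonemes, accents)]


ph_table = make_table(["a", "m", "e"], dim=4, seed=0)
ac_table = make_table(["H", "L"], dim=2, seed=1)
x = encoder_inputs(["a", "m", "e"], ["H", "L", "L"], ph_table, ac_table)
print(len(x), len(x[0]))  # 3 6
```

The resulting sequence has the same length as the phoneme sequence, so it can feed a FastSpeech-style encoder unchanged; only the input dimension grows by the accent embedding size.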