HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes
Authors: Yuhta Takida, Yukara Ikemiya, Takashi Shibuya, Kazuki Shimada, Woosung Choi, Chieh-Hsin Lai, Naoki Murata, Toshimitsu Uesaka, Kengo Uchida, Wei-Hsiang Liao, Yuki Mitsufuji
Abstract: Vector quantization (VQ) is a technique to deterministically learn features with discrete codebook representations. It is commonly performed with a variational autoencoding model, VQ-VAE, which can be further extended to hierarchical structures for making high-fidelity reconstructions. However, such hierarchical extensions of VQ-VAE often suffer from the codebook/layer collapse issue, where the codebook is not efficiently used to express the data, which degrades reconstruction accuracy. To mitigate this problem, we propose a novel unified framework to stochastically learn hierarchical discrete representation on the basis of the variational Bayes framework, called hierarchically quantized variational autoencoder (HQ-VAE). HQ-VAE naturally generalizes the hierarchical variants of VQ-VAE, such as VQ-VAE-2 and residual-quantized VAE (RQ-VAE), and provides them with a Bayesian training scheme. Our comprehensive experiments on image datasets show that HQ-VAE enhances codebook usage and improves reconstruction performance. We also validated HQ-VAE in terms of its applicability to a different modality with an audio dataset.
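The vector quantization operation described in the abstract amounts to a nearest-neighbor codebook lookup: each continuous latent vector is replaced by its closest codebook entry. The sketch below is a minimal NumPy illustration of this idea only; the function name, shapes, and toy values are ours, not from the paper:

```python
import numpy as np

def vector_quantize(z, codebook):
    """Map each latent vector in z (N, D) to its nearest entry
    in codebook (K, D) under squared Euclidean distance."""
    # Pairwise squared distances between latents and codes: shape (N, K)
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = d.argmin(axis=1)          # discrete code assignments, shape (N,)
    return codebook[indices], indices   # quantized latents and their indices

# Toy example with two well-separated codes
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
z = np.array([[0.1, -0.1], [0.9, 1.2]])
zq, idx = vector_quantize(z, codebook)
```

A hierarchical or residual scheme such as RQ-VAE repeats a lookup like this on the quantization residual `z - zq` at each layer, which is where an underused codebook at one layer (layer collapse) can hurt overall reconstruction.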