Parallelizing Autoregressive Generation with Variational State Space Models
Authors: Gaspard Lambrechts, Yann Claes, Pierre Geurts, Damien Ernst
Abstract: Attention-based models such as Transformers and recurrent models such as state space models (SSMs) have emerged as successful approaches to autoregressive sequence modeling. Although both allow parallel training, neither allows parallel generation due to their autoregressiveness. We propose the variational SSM (VSSM), a variational autoencoder (VAE) in which both the encoder and the decoder are SSMs. Since sampling the latent variables and decoding them with the SSM can be parallelized, both training and generation can be done in parallel. Moreover, the decoder recurrence allows generation to be resumed without reprocessing the whole sequence. Finally, we propose the autoregressive VSSM, which can be conditioned on a partial realization of the sequence, as is common in language generation tasks. Interestingly, the autoregressive VSSM still enables parallel generation. We highlight on toy problems (MNIST, CIFAR) the empirical gains in speed-up, and we show that it competes with traditional models (Transformer, Mamba SSM) in terms of generation quality.
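To give intuition for the claim that sampling the latents and decoding them can be parallelized, the sketch below shows why a *linear* SSM decoder admits parallel generation: all latent variables are sampled at once (no sequential dependence), and the linear recurrence unrolls into a sum of independent terms that can be evaluated in parallel for every time step. All shapes, matrices (`A`, `B`, `C`), and dimensions here are hypothetical illustrations, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

T, d_z, d_h, d_x = 8, 4, 6, 3  # sequence length and dims (hypothetical)
A = rng.normal(size=(d_h, d_h)) * 0.1  # decoder state-transition matrix
B = rng.normal(size=(d_h, d_z))        # latent-to-state map
C = rng.normal(size=(d_x, d_h))        # state-to-output map

# Step 1: sample ALL latent variables at once -- unlike autoregressive
# generation, nothing here depends on previously generated outputs.
z = rng.normal(size=(T, d_z))

# Step 2 (reference): sequential decode, h_t = A h_{t-1} + B z_t, x_t = C h_t.
h = np.zeros(d_h)
x_seq = []
for t in range(T):
    h = A @ h + B @ z[t]
    x_seq.append(C @ h)
x_seq = np.stack(x_seq)

# Step 3 (parallel): unroll the recurrence as h_t = sum_{s<=t} A^(t-s) B z_s.
# Each term depends only on one pre-sampled z_s, so every output position
# can be computed independently (e.g. by a parallel scan on real hardware).
powers = [np.linalg.matrix_power(A, k) for k in range(T)]
x_par = np.stack([
    C @ sum(powers[t - s] @ (B @ z[s]) for s in range(t + 1))
    for t in range(T)
])

assert np.allclose(x_seq, x_par)  # both decodes agree
```

The same unrolled form also makes it clear why generation can be resumed: keeping the last state `h` is enough to continue the recurrence without reprocessing the earlier sequence.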