All samples are generated by our model, SoundReactor-ECT (NFE=4), using the causal stereo full-band VAE. The model is trained on the VGGSound [1] dataset.