
Stochastic Restoration of Heavily Compressed Musical Audio using GANs

Just found this very interesting paper:
We introduce a Generative Adversarial Network (GAN) architecture for the restoration of MP3-encoded musical audio signals. We train different stochastic and deterministic generators on MP3s with different compression rates.
Using these models, we investigate whether
  1. restorations of the models considerably improve the MP3 versions,
  2. we can systematically pick samples among the outputs of the stochastic generators which are closer to the original than those of the deterministic generators, and
  3. the stochastic generators generally output higher-quality restorations than the deterministic generators.
To that end, we perform an extensive evaluation of the different experiment setups utilizing objective metrics and listening tests. We find that the models are successful in points 1 and 2, but the random outputs of the stochastic generators are approximately on a par with the deterministic models in overall quality, i.e., they do not improve on them (point 3).
The proposed GAN architecture is based on dilated convolutions with skip connections, combined with a novel concept which we call Frequency Aggregation Filters. These are convolutional filters spanning the whole frequency range, which contribute to the stability of the training and constitute a consistent take on the problem of non-local correlations in the frequency spectrum. We also find that using so-called self-gating considerably reduces the memory requirement of the architecture by halving the number of input maps to each layer without degradation of the results. In order to prevent mode collapse, we propose a regularization that enforces a correlation between differences in the noise input and differences in the model output. As opposed to most other works (but in line with a few other approaches using GANs and U-Net-based architectures), we directly input (and output) the (non-linearly scaled) complex-valued spectrum to the generator, eliminating the need to deal with phase information separately.
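For anyone wondering how self-gating can halve the number of maps passed to the next layer, here is a minimal sketch. It is not taken from the paper's code: the function name `self_gate` and the choice of tanh for the content half and sigmoid for the gate half are my assumptions about one common form of gating. The idea is that a layer produces 2*C feature maps, and the second half gates the first half elementwise, so only C maps go on.

```python
import math


def self_gate(feature_maps):
    """Gate the first half of the maps with the second half.

    feature_maps: list of 2*C channels, each a flat list of floats.
    Returns C channels computed as tanh(a) * sigmoid(b) elementwise.
    NOTE: the tanh/sigmoid pairing is an assumption for illustration,
    not necessarily the exact formulation used in the paper.
    """
    if len(feature_maps) % 2 != 0:
        raise ValueError("need an even number of feature maps")
    half = len(feature_maps) // 2
    content, gates = feature_maps[:half], feature_maps[half:]
    gated = []
    for a_map, b_map in zip(content, gates):
        gated.append([
            math.tanh(a) * (1.0 / (1.0 + math.exp(-b)))
            for a, b in zip(a_map, b_map)
        ])
    return gated


# Four toy feature maps with two values each; two maps come out.
maps = [[0.5, -1.0], [2.0, 0.0], [0.0, 0.0], [10.0, -10.0]]
gated = self_gate(maps)
print(len(maps), "->", len(gated))
```

In a real network the same split would be applied along the channel axis of a convolution output, so the following layer only has to store and process half as many input maps.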
Unofficial implementation:
Hybrid Multimedia Production Suite will be a platform-independent open source suite for advanced audio/video content production.
Official git: