Topic: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

[OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Just discovered:
Quote
About
fat_llama is a Python package for upscaling audio files to FLAC or WAV formats using advanced audio processing techniques. It utilizes CUDA-accelerated calculations to enhance audio quality by upsampling and adding missing frequencies through FFT (Fast Fourier Transform), resulting in richer and more detailed audio.

Features
  • Upscale MP3 files to high-quality FLAC format.
  • Optional iterative soft thresholding (IST) for enhanced audio processing.
  • Gain adjustment, equalization, and optional Wiener filtering.
  • Supports GPU-accelerated processing with CuPy.

How it works


Algorithm Explanation
The upscaling process involves several steps:
  • Reading Audio File: The audio file is read, and the audio samples are extracted along with the sample rate and bitrate.
  • Calculating Upscale Factor: The upscale factor is calculated to achieve the target bitrate.
  • Upscaling Channels: The audio channels are upscaled using an interpolation algorithm. Each sample is repeated multiple times to increase the resolution.
  • Iterative Soft Thresholding (IST): IST is applied to enhance the audio by adding missing frequencies. This process uses FFT to transform the signal into the frequency domain, apply a threshold to keep significant frequencies, and then inverse transform back to the time domain.
  • Scaling Amplitude: The amplitude of the upscaled audio is scaled to match the original.
  • Normalizing Audio: The audio is normalized to the range -1 to 1.
  • Writing FLAC File: The processed audio is written to a FLAC file.
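
The steps listed above can be sketched roughly as follows. This is only a minimal NumPy illustration of the described idea (sample repetition, FFT soft thresholding, amplitude matching), not fat_llama's actual code: the function names, the iteration count, and the relative threshold are all assumptions made for the example.

```python
import numpy as np


def repeat_upsample(samples, factor):
    """Raise the sample count by repeating each sample `factor` times."""
    return np.repeat(np.asarray(samples, dtype=np.float64), factor)


def soft_threshold(spectrum, threshold):
    """Shrink every bin's magnitude by `threshold`; bins below it become zero."""
    magnitude = np.abs(spectrum)
    shrink = np.maximum(magnitude - threshold, 0.0) / np.maximum(magnitude, 1e-12)
    return spectrum * shrink


def iterative_soft_threshold(samples, iterations=5, relative_threshold=0.05):
    """FFT -> soft threshold -> inverse FFT, repeated `iterations` times."""
    out = np.asarray(samples, dtype=np.float64)
    for _ in range(iterations):
        spectrum = np.fft.rfft(out)
        # Assumed rule: threshold is a fixed fraction of the strongest bin.
        threshold = relative_threshold * np.abs(spectrum).max()
        out = np.fft.irfft(soft_threshold(spectrum, threshold), n=len(out))
    return out


def upscale_sketch(samples, factor):
    """Repeat samples, apply IST, then rescale so the peak matches the input."""
    samples = np.asarray(samples, dtype=np.float64)
    processed = iterative_soft_threshold(repeat_upsample(samples, factor))
    peak_in = np.max(np.abs(samples))
    peak_out = np.max(np.abs(processed))
    if peak_out > 0:
        processed = processed * (peak_in / peak_out)
    return np.clip(processed, -1.0, 1.0)  # keep the result within -1..1


if __name__ == "__main__":
    t = np.arange(11_025) / 11_025
    tone = 0.5 * np.sin(2 * np.pi * 440 * t)
    print(len(upscale_sketch(tone, 4)))  # 44100
```

Note that in this sketch the thresholding step can only keep or shrink spectral components that are already present; any new high-frequency content comes from the sample repetition itself.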

Spectrogram Results


Resources
Git: https://github.com/bkraad47/fat_llama#readme
PyPi page: https://pypi.org/project/fat-llama/
CPU version: https://pypi.org/project/fat-llama-fftw/

Note: to me this sounds like the Spectral Recovery feature of Izotope RX or Stereotool's Delossifier/Absolute Highs, but open source.
Hybrid Multimedia Production Suite will be a platform-independent open source suite for advanced audio/video content production.
Official git: https://forart.it/HyMPS/

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #1
If the project were actually backed by correct theory, it might have done something good for the audio.

But as the spectrogram shows, this is a weird "alias of just the higher frequencies", without even trying to expand it.
One more thing worth mentioning: if the spectrogram is correct, the input MP3 uses an 11 kHz sampling rate, so the process is creating aliasing at 6.5 kHz, 13 kHz and 19.5 kHz.
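
To illustrate the kind of artifact meant here, the following is a minimal sketch (not fat_llama's code) of what plain sample repetition does to a spectrum. The 11,025 Hz input rate, the 4x factor, and the 1 kHz test tone are assumptions picked for the example, and `image_frequencies` is a hypothetical helper.

```python
import numpy as np


def image_frequencies(fs, factor, tone_hz):
    """Return the strongest frequencies (Hz) after sample-repetition upsampling."""
    t = np.arange(fs) / fs                       # one second, so bins are 1 Hz wide
    tone = np.sin(2 * np.pi * tone_hz * t)
    upsampled = np.repeat(tone, factor)          # each sample repeated `factor` times
    magnitude = np.abs(np.fft.rfft(upsampled))
    freqs = np.fft.rfftfreq(len(upsampled), d=1.0 / (fs * factor))
    # For this setup the `factor` strongest bins are the tone plus its images.
    strongest = np.argsort(magnitude)[-factor:]
    return sorted(int(round(f)) for f in freqs[strongest])


print(image_frequencies(11_025, 4, 1_000))
# [1000, 10025, 12025, 21050]
```

The single 1 kHz tone comes back with mirrored copies around multiples of the original sampling rate (11,025 ± 1,000 Hz and 22,050 − 1,000 Hz), which is content that was never in the input.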

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #2
Oh yes, that's hopelessly bad.
a fan of AutoEq + Meier Crossfeed

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #3
I've opened an issue on the project's GitHub with some suggestions:
Quote
  • When processing lossy audio, the best approach is to decode it carefully and preserve as much of it as possible: that's why floating-point internal computation and output give better fidelity;
  • Operations like normalization (which alters the level of the original signal) should be optional rather than a fixed part of the process;
  • Since your software relies on FFmpeg, I strongly suggest always using the -drc_scale 0 parameter when decoding lossy sources (even though it is an AC3-specific feature);
  • Always look for new scientific papers/code - such as @zawi01's Audio Dequantization Using (Co)Sparse (Non)Convex Methods - that may help to achieve better results;
  • While you've probably already done this, I recommend checking how Izotope's Spectral Recovery and Stereotool's Delossifier functions work, for possible inspiration.
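
As a rough sketch of the decoding suggestion above, this is how the ffmpeg call could be assembled from Python. The helper names and file names are hypothetical, and writing 32-bit float PCM (`pcm_f32le`) is my assumption of what "floating point output" would look like in practice; it is not taken from fat_llama.

```python
import subprocess


def build_decode_command(src, dst):
    """Build an ffmpeg argv that decodes `src` to a 32-bit float WAV at `dst`."""
    return [
        "ffmpeg",
        "-drc_scale", "0",    # decoder option, so it must precede -i; disables AC-3 DRC
        "-i", src,
        "-c:a", "pcm_f32le",  # 32-bit float PCM output (assumed choice)
        dst,
    ]


def decode_to_float_wav(src, dst):
    """Run the decode; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_decode_command(src, dst), check=True)


if __name__ == "__main__":
    # Only prints the command; call decode_to_float_wav() to actually run it.
    print(" ".join(build_decode_command("input.ac3", "decoded.wav")))
```

Since -drc_scale belongs to the AC-3 decoder, I would expect it to have no effect on other inputs such as MP3.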

...anyway, nowadays the best approach is to "reconstruct" the signal using machine learning, IMHO.

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #4
Quote
...anyway, nowadays the best approach is to "reconstruct" the signal using machine learning, IMHO.

Yeah, that is a much better option. Just train an AI on downsampled and compressed music paired with the original files, so it learns how the harmonics work. Then you can scale up to convert CD quality to "High-Res". It is better than a dull algorithm, because an algorithm can barely replicate harmony and can't apply many transformations; AI can do "magic" with the audio.

I just wonder how much processing power that requires...  :-\

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #5
Only the same as creating hi-res images from lo-res ones.  The reconstructed image may be crisp, but the detail is invented.  Won't the same be true of audio?  Do you want detail in the audio which was never there in the first place??
It's your privilege to disagree, but that doesn't make you right and me wrong.

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #6
Quote
Only the same as creating hi-res images from lo-res ones.  The reconstructed image may be crisp, but the detail is invented.  Won't the same be true of audio?  Do you want detail in the audio which was never there in the first place??

Pretty much this right here.  If it was lossy-compressed to begin with, then it can't be made lossless, because those details are gone.  Fat Llama is just another snake-oil scam to me.  It doesn't matter whether it uses a traditional algorithm or an AI-based one; it's still the same tired old thing.

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #7
Quote
Only the same as creating hi-res images from lo-res ones.  The reconstructed image may be crisp, but the detail is invented.  Won't the same be true of audio?  Do you want detail in the audio which was never there in the first place??

Quote
Pretty much this right here.  If it was lossy-compressed to begin with, then it can't be made lossless, because those details are gone.  Fat Llama is just another snake-oil scam to me.  It doesn't matter whether it uses a traditional algorithm or an AI-based one; it's still the same tired old thing.

Given that regenerating the original signal "simplified" by a lossy encoding is obviously impossible, GANs work quite differently from everything else: check out the very interesting scientific paper Stochastic Restoration of Heavily Compressed Musical Audio using Generative Adversarial Networks to understand why.

Just a shot:

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #8
Are you saying the results sound pleasing?  I suppose there might be some value in that.

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #9
Quote
Given that regenerating the original signal "simplified" by a lossy encoding is obviously impossible, GANs work quite differently from everything else: check out the very interesting scientific paper Stochastic Restoration of Heavily Compressed Musical Audio using Generative Adversarial Networks to understand why.

Just a shot:

I assume these spectrograms are from different software than the one in the initial post.

That said, while this software might make the sound more pleasant than the 32 kbps version, I see that it is also partially destroying the harmonics. In every case, the tonal frequencies are less clear than in both the original and the 32 kbps version, and are partially buried in noise.

Of course, the regenerated high end can also sound strange and fail to follow the sound (for example, one cymbal hit sounding different from the next).

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #10
Quote
Are you saying the results sound pleasing?  I suppose there might be some value in that.

Not exactly: I argue that, in lossy decoding, GANs are more effective than traditional algorithms (exactly as in the image upscaling field).

However, just as there, trying to multiply the resolution disproportionately brings excessive artifacts anyway (exactly like upscaling from 1080p to 4K/8K).

In other words, I would personally avoid generating "hi-res" uncompressed files from any lossy format: 44.1/48 kHz, 16-bit (from a bitrate no lower than 96 kbps) should be a reasonable upper limit.

Quote
I assume these spectrograms are from different software than the one in the initial post.

That said, while this software might make the sound more pleasant than the 32 kbps version, I see that it is also partially destroying the harmonics. In every case, the tonal frequencies are less clear than in both the original and the 32 kbps version, and are partially buried in noise.

Of course, the regenerated high end can also sound strange and fail to follow the sound (for example, one cymbal hit sounding different from the next).

Yes, I already asked Wallace Abreu (the unofficial code implementer) to let us test it ASAP: https://github.com/abreuwallace/Stochastic-Restoration-GAN/issues/2.

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #11
For some reason I have a feeling this will hardly sound better than mp3PRO with the SBR "enhancement".
Though I would like to hear it myself first before judging it harshly.

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #12
...anyway, CONSTRUCTIVE criticism generates better results:
Quote
[1.1.0] - 2024-08-01

Changed
  • Moved adaptive filtering to after normalization and auto-scaling steps.
  • Reduced step size for LMS adaptive filter for improved stability.
  • Ensured all processing uses CuPy for GPU acceleration.
  • Added detailed comments and logging for better traceability.



Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #13
At first sight, those results look remarkable... but:

Why does the centre trace (MP3) appear to be running faster than the other two?  EG: the central "rich" section lasts 15s whereas in the original it's 17s.

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #14
Hi there, good catch.

I've been using Audacity to convert to MP3 at 170-210 kbps, 48000 Hz. Notably, the sample FLAC files contain some high-pitched frequencies and pauses which, I noticed, create some artifacts and a speed-up when converted to MP3. This, however, goes against the harmonic nature of the frequencies.

When using fat_llama we are essentially breaking the process down into 2 steps:
1. Expanding the existing bit-rate plane by scaling it up by the whole-integer factor closest to the upscale number.
2. Then using IST with FFT to reconstruct the missing details, i.e. the crests and troughs of the harmonic frequencies. My assumption is that this correctly captures the features of the original FLAC, hence the shift of frequencies in time to resemble the FLAC when going from MP3 to FLAC.

I do need to investigate further. I am also getting better mids reconstruction on files, but losing some high-pitched frequencies, which I need to look into.

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #15
I don't see that has anything to do with it.

If the centre spectrogram is supposed to be the output from an MP3 conversion of the original FLAC, it isn't running at the same speed as the original FLAC – that has nothing to do with any subsequent processing, it is an artefact of the conversion to or from MP3.

If the lower spectrogram is supposed to be the output from "fancy" recovery of detail from the MP3, why is it running at the rate of the original FLAC rather than the rate of the MP3?

You'll forgive me for saying it all looks rather fishy.

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #16
 :D  Well, sir, as I said, that is my assumption about what Audacity is doing, since it seems to speed up on conversion from FLAC. The results are also assumptions, as I will need to dig deeper.

The Audacity version I'm using is 3.6.1.

Note: I don't have much to hide or gain here, as this is a free open source idea. But feel free to run the experiment again. I'm on Windows 11 with a 13th-gen Intel Core i9 and an RTX 4050, and Python 3.12.

The files are in the link below.

https://www.dropbox.com/scl/fi/2b1xesjjk9f5lqd8abkqr/experiment.zip?rlkey=ngdrqbzxnvt0dq7if2k5x8scn&st=vaw9us0s&dl=0

I would install the latest version of fat_llama=1.0.2.3 for this. You can run the MP3-to-FLAC conversion using converter.py, and analysis.py to regenerate the results. Cheers.

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #17
If the MP3 runs faster than the original FLAC (say, 44.1 ksps was assumed to be 48 ksps), then the FLAC conversion from the MP3 should also run faster [besides, 44.1 to 48 is not sufficient to explain time compression from 17 s to 15 s].  And do you seriously believe Audacity would get that wrong?
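
The arithmetic behind that bracketed remark can be checked in a couple of lines. The two helper functions are hypothetical and exist only for this worked example; the 17 s and 15 s figures are the ones read off the spectrograms above.

```python
def duration_if_rate_misread(duration_s, true_rate_hz, assumed_rate_hz):
    """Playback length when audio at `true_rate_hz` is played at `assumed_rate_hz`."""
    return duration_s * true_rate_hz / assumed_rate_hz


def implied_rate_hz(original_s, observed_s, true_rate_hz):
    """Playback rate that would be needed to shrink `original_s` to `observed_s`."""
    return true_rate_hz * original_s / observed_s


print(round(duration_if_rate_misread(17.0, 44_100, 48_000), 2))  # 15.62
print(round(implied_rate_hz(17.0, 15.0, 44_100)))                # 49980
```

So a 44.1 to 48 mix-up would shorten 17 s to about 15.6 s, and getting down to 15 s would need a playback rate of roughly 50 kHz.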

There is something wrong with your methodology.

Quote
I would install the latest version of fat_llama=1.0.2.3 for this. You can run the MP3-to-FLAC conversion using converter.py, and analysis.py to regenerate the results. Cheers.

I have no intention of wasting my time on that; I have neither the hardware nor any intention of surrendering to Win11.  I'm just pointing out the inconsistencies in what you published.

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #18
 :D  This probably needs work. I never said it's perfect. You could perhaps try converting the FLAC in Audacity to check.

Well, you asked how the spectrogram was created, and I shared that, as I have nothing to hide.  :D


Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #19
As already suggested to @bkrd47 in this issue, it would be really interesting to "put together" all the lossy audio recovery (or reinvention?) techniques into a single open source tool.

I believe that many of the forum's skilled users can easily suggest which tools/path to adopt to obtain the best possible results.

The first step is, of course, the decoding: the more quality the decoder can squeeze out, the fewer artifacts will be "needed" in the subsequent recovery stages.

fat_llama relies on ffmpeg (can you suggest something better?), so what are the best-quality audio decoding parameters for it?

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #20
Digging through GitHub, some other interesting projects emerged:

  • Matthew McQuistion's upscalemp3 that "Converts an mp3 (lossy) file back into its uncompressed wav counterpart based on a generative AI model built with tensorflow."
  • Detlef Kroll's Audio Delossifier which claims to "Delossify compressed audio (mp3 and others) with Python and Tensorflow"

...including the already-cited Stochastic Restoration one, there are at least 4 different open projects on this: an objective comparison is needed!

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #21
...more contenders, more fun? (= a blind listening test/comparison would be nice)

https://github.com/ivandustin/learnaudio#readme
https://github.com/jhetherly/EnglishSpeechUpsampler#readme


Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #22
Another well-known open source contender (as IA Hispano "forked" it too) could be AudioSR, of course:


Note: I've decided to reorganize (and expand) the HyMPS collection: AUDIO section \ AI-based category \ Enhancers page \ Upscalers
Enjoy !

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #23
I have no issue with "enhancing" audio, whether it's a low-quality MP3, some other defect/limitation, or just some change that you want to make to the sound...  I have occasionally used a harmonic enhancer (AKA "harmonic exciter" or "exciter").

But you cannot recover what was lost.   If the MP3 you're trying to fix up doesn't sound exactly like the uncompressed original (in a proper ABX test), the "restored" MP3 will never sound like the uncompressed original.

There also seems to be a lot of focus on the "spectrum".   If you hear compression artifacts in a good-quality MP3, it's usually not the loss of highs that you are hearing, even though that is usually the easiest thing to see or measure.   More likely, you are hearing a temporal effect called "pre-echo".   That can't be fixed.  Adding harmonics to re-introduce high frequencies will make the spectrum look better, but it won't necessarily make it sound more like the uncompressed original.   In fact, it's probably easier to ABX the enhanced version vs. the original.  (You usually have to listen very carefully to hear pre-echo.)

With low bitrate MP3s, you might indeed hear the loss of high frequencies and you may be able to make some improvement.    (Whenever there is a difference, whether it's an improvement or degradation is a matter of personal taste and opinion.)

And remember TOS #8.   The focus here is what you can hear (usually in a blind ABX test), not what you can measure or see in a graph.   It's easier to make a pretty spectrum than it is to get good sound.  ;)   (I'm not saying you're not hearing an improvement or difference with this tool.)

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #24
Quote
And remember TOS #8.   The focus here is what you can hear (usually in a blind ABX test), not what you can measure or see in a graph.   It's easier to make a pretty spectrum than it is to get good sound.  ;)   (I'm not saying you're not hearing an improvement or difference with this tool.)

Well, I don't think that TOS 8 should be involved here.

In fields like this, we need a more scientific approach than blind ABX: the so-called "null test" (original signal + phase-inverted reconstructed signal should produce silence), to understand the objective effectiveness of the "AI" reconstruction...
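
A minimal sketch of such a null test, assuming both signals are already decoded to float arrays at the same sample rate, with the same length, and time-aligned (any codec delay would have to be compensated first). The `null_test_db` helper is hypothetical, not part of any tool mentioned in this thread.

```python
import numpy as np


def null_test_db(original, reconstructed):
    """Residual level in dB relative to the original; -inf means a perfect null."""
    original = np.asarray(original, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    if original.shape != reconstructed.shape:
        raise ValueError("signals must have the same shape and be time-aligned")
    rms_original = np.sqrt(np.mean(original ** 2))
    if rms_original == 0:
        raise ValueError("original signal is silent")
    # Subtracting is the same as summing with the phase-inverted reconstruction.
    residual = original - reconstructed
    rms_residual = np.sqrt(np.mean(residual ** 2))
    if rms_residual == 0:
        return float("-inf")
    return float(20 * np.log10(rms_residual / rms_original))


if __name__ == "__main__":
    t = np.arange(48_000) / 48_000
    reference = np.sin(2 * np.pi * 1_000 * t)
    print(round(null_test_db(reference, 0.9 * reference), 1))  # -20.0
```

The lower (more negative) the figure, the closer the reconstruction is to the original sample by sample.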