Topic: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #25
So I took this screenshot as reference to decide how much high frequency I should cut in my test file for this experiment:


I uploaded another piece of music with the same cutoff, and would like to evaluate the results of the various algorithms mentioned in the previous posts. I will upload the original audio file (without the cutoff) after someone posts results in a lossless audio (not image) format.

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #26
So I took this screenshot as reference to decide how much high frequency I should cut in my test file for this experiment:
Since those "applications" claim to restore audio from lossy sources, I don't think that just cutting frequencies is the correct approach...
...a better approach would be lossy encoding (128/96/64 kbps or lower, IMHO), then applying those restorers, and finally performing the "null test" (original signal + phase-inverted restored one, which also makes any unwanted artifacts clearly audible): whoever gets the smallest difference from silence wins.
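A minimal sketch of the null test described above, assuming both signals are already decoded to lists of float samples in the range -1.0 to 1.0, at the same sample rate, time-aligned and level-matched. The function names and the synthetic test tones are made up for illustration; they are not taken from any of the tools mentioned in this thread.

```python
import math

def null_residual(original, restored):
    """Sample-wise difference, i.e. original + phase-inverted restored."""
    if len(original) != len(restored):
        raise ValueError("signals must have the same length and sample rate")
    return [o - r for o, r in zip(original, restored)]

def rms_dbfs(samples):
    """RMS level in dB relative to full scale (1.0); -inf for digital silence."""
    if not samples:
        raise ValueError("empty signal")
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")

# Hypothetical test data: a 1 kHz sine, and a copy with a small 5 kHz error added.
rate = 44100
original = [0.5 * math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate)]
restored = [s + 0.001 * math.sin(2 * math.pi * 5000 * n / rate)
            for n, s in enumerate(original)]

# Identical signals null to silence; any difference leaves a residual.
print(rms_dbfs(null_residual(original, original)))
print(rms_dbfs(null_residual(original, restored)))
```

A lower (more negative) residual level means the restored signal is numerically closer to the original; it says nothing by itself about how audible the difference is.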
Hybrid Multimedia Production Suite will be a platform-independent open source suite for advanced audio/video content production.
Official git: https://forart.it/HyMPS/

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #27
So this one:

I encoded the source to 32 kbps CBR MP3. Note that the encoder automatically reduced the sample rate to 16 kHz. VBR cannot be used because even its lowest setting results in 69 kbps for this file, which does not match the bitrate of the screenshot above.

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #28
In fields like this, we need a more scientific approach than blind ABX, adopting the so-called "null test" (original signal + phase-inverted reconstructed signal should produce silence) to understand the objective effectiveness of the "AI" reconstruction...

I disagree: when the goal is to "reconstruct" or "enhance" the losses of a perceptual compression, the evaluation criterion should include some element of perception as well. It could be argued that a traditional ABX (or ABXY) test is not the right test protocol, but I suspect that you cannot get away without including some kind of subjective preference in the evaluation strategy.

So one protocol one might conjure up includes
- S: the lossless source material
- C: the lossy compression result
- R: the AI-reconstructed version
and then have test subjects rate on a reasonably coarse scale how much they like a blindly randomly selected sample drawn from these three (or just C & R) with some repetitions for statistics (and listener consistency checks).

I wonder whether S should be available for reference. But then, it's not an ABX test, all three versions are supposedly sounding different.
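The protocol above could be sketched roughly as follows. This is only an illustration: the function names, the number of repetitions and the 1-5 rating scale are assumptions, not something specified in the post.

```python
import random
from statistics import mean

def make_schedule(versions=("S", "C", "R"), repetitions=4, seed=None):
    """Blind, randomized presentation order with each version repeated."""
    rng = random.Random(seed)
    trials = [v for v in versions for _ in range(repetitions)]
    rng.shuffle(trials)
    return trials

def summarize(schedule, ratings):
    """Per version: (mean rating, spread between highest and lowest rating).

    A large spread for the same version hints at an inconsistent listener.
    """
    if len(schedule) != len(ratings):
        raise ValueError("one rating per trial is required")
    by_version = {}
    for version, rating in zip(schedule, ratings):
        by_version.setdefault(version, []).append(rating)
    return {v: (mean(r), max(r) - min(r)) for v, r in by_version.items()}

# Hypothetical session: the listener only ever sees the trial number.
schedule = make_schedule(repetitions=4, seed=1)
ratings = [3] * len(schedule)  # placeholder ratings on an assumed 1-5 scale
print(summarize(schedule, ratings))
```

Dropping "S" from `versions` gives the C & R variant mentioned above.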

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #29
I disagree: when the goal is to "reconstruct" or "enhance" the losses of a perceptual compression, the evaluation criterion should include some element of perception as well.
Hopefully yes. But ABXing establishes the first step: are A and B even distinguishable by ear? If they aren't, you don't need to evaluate for "better" or "worse".
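To put a number on "distinguishable by ear", an ABX result is usually checked against pure guessing. A minimal sketch, assuming independent trials and a 50% chance of a correct guess per trial (the function name is made up for illustration):

```python
from math import comb

def abx_p_value(correct, trials):
    """One-sided probability of at least `correct` hits out of `trials` by guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Example: 12 correct answers out of 16 trials gives a p-value of about 0.038.
print(abx_p_value(12, 16))
```

A small p-value suggests A and B really are distinguishable; only then does it make sense to ask which one sounds "better".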

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #30
It could be argued that a traditional ABX (or ABXY) test is not the right test protocol, but I suspect that you cannot get away without including some kind of subjective preference in the evaluation strategy.
Well, I believe that depends on the goals of the reconstruction...
...for me this comparison cannot be considered a mere decoding one (where, you are probably right, listeners' preferences could count more). Since the restored spectra are substantially "invented" (read: predicted) by neural networks, the "null" approach gives a better understanding of which one objectively generates fewer "artifacts", also because the processing, unlike decoding, could generate unexpected "errors" in frequency bands that were not considered, or could be nonlinear too (= produce different results on different types of signals, depending on training).

@bennetng: I've linked all the projects' gits, so you can test them yourself. Please post results.

note: of course a Jupyter notebook that performs them all would be cool.

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #31
https://github.com/bkraad47/fat_llama#readme
Requirements

    CUDA capable GPU


I used to have a very old GTX 950, but it has died. My iGPU does not meet the requirement, and I have no plans to get a new discrete GPU in the near future.

Also, I am not familiar with the operation of GitHub. If someone can build a Windows executable that does not require specific GPU features, I am willing to post some results, just like I did previously.
https://hydrogenaud.io/index.php/topic,125765.msg1042632.html#msg1042632

Obviously, if I were able to post some results, I would already have posted them.

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #32
Just found a very interesting paper, even if it focuses on restoring missing samples of a speech signal, about so-called "audio inpainting", where they used MSE (mean squared error) between the source (image) spectrum and the target one to measure effectiveness. They claim that "achieving the lowest possible MSE will result in the most natural, realistic inpainting for the audio".

Check it out
PDF: https://github.com/iamhectorotero/generative-audio-inpainting/blob/master/generative_audio_inpainting_report.pdf
Git: https://github.com/iamhectorotero/generative-audio-inpainting#a-generative-approach-to-audio-inpainting

Other interesting experiments on audio reconstruction (with notebooks) here:
https://github.com/ColinShaw/python-neural-network-audio-reconstruction#neural-network-audio-reconstruction
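The MSE measure mentioned above could be sketched like this. It is not the paper's implementation: it assumes plain lists of float samples, non-overlapping frames without a window, and a naive DFT to stay dependency-free (in practice an FFT library would be used). All names and the test tones are made up for illustration.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (slow, but has no dependencies)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def spectral_mse(reference, candidate, frame_size=64):
    """Mean squared error between the magnitude spectra of matching frames."""
    if len(reference) != len(candidate):
        raise ValueError("signals must have the same length")
    total, count = 0.0, 0
    for start in range(0, len(reference) - frame_size + 1, frame_size):
        ref_mag = magnitude_spectrum(reference[start:start + frame_size])
        cand_mag = magnitude_spectrum(candidate[start:start + frame_size])
        total += sum((a - b) ** 2 for a, b in zip(ref_mag, cand_mag))
        count += len(ref_mag)
    if count == 0:
        raise ValueError("signals are shorter than one frame")
    return total / count

# Hypothetical data: a two-tone reference and a copy missing the high tone.
rate = 8000
low = [0.5 * math.sin(2 * math.pi * 500 * n / rate) for n in range(256)]
high = [0.2 * math.sin(2 * math.pi * 3000 * n / rate) for n in range(256)]
reference = [a + b for a, b in zip(low, high)]

print(spectral_mse(reference, reference))  # identical spectra
print(spectral_mse(reference, low))        # the missing high tone raises the error
```

Because only magnitudes are compared, this measure ignores phase, unlike a time-domain null test.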

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #33
Also, I am not familiar with the operation of GitHub. If someone can build a Windows executable that does not require specific GPU features, I am willing to post some results, just like I did previously.
If you, like me, don't have CUDA-capable hardware, a Jupyter notebook (running on Colab, for example) is the way to go...

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #34
I can see why no one has posted results yet. Asking others for processing results will deplete their cloud computing quota, and it looks like I would need to learn some Python to automate the process as well, so I will give up.

Here are some "upsampling" examples I made many years ago, with actual audio files instead of only images:
https://hydrogenaud.io/index.php/topic,108864.msg948272.html#msg948272
https://hydrogenaud.io/index.php/topic,108864.msg948350.html#msg948350

The files were created by combining several existing tools/effects bundled with Reaper without using any additional plugins (e.g. iZotope, Thimeo Stereo Tool etc). However, I am not going to describe the procedure in detail, to avoid people using similar methods to generate large numbers of fake files for ill purposes, because the processing power requirement of this method is pretty low.

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #35
I can see why no one has posted results yet. Asking others for processing results will deplete their cloud computing quota, and it looks like I would need to learn some Python to automate the process as well, so I will give up.
You can easily open an issue on their Git asking for it!

The files were created by combining several existing tools/effects bundled with Reaper without using any additional plugins (e.g. iZotope, Thimeo Stereo Tool etc). However, I am not going to describe the procedure in detail, to avoid people using similar methods to generate large numbers of fake files for ill purposes, because the processing power requirement of this method is pretty low.
Well, a quick null test of your upscale (which is in any case different from a neural network reconstruction) isn't that silent (= not faithful to the original)...

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #36
You have either overlooked the sample offset for the resampler you used, or the resampler's quality is problematic. I attached a video to show how I upsampled the original file to 88.2k to match the sample rate of the other file, aligned the timing, inverted the waveform in one track and rendered the resulting null mix. The result is very different from yours.

Moreover, even a nonlinear phase resampler without involving any additional processing, or a linear phase resampler with subsample offset will cause the result to null poorly, and this poor result is not psychoacoustically relevant:
https://www.audiosciencereview.com/forum/index.php?threads/72-software-sample-rate-converters-put-to-the-test.241/
My reply to the OP:
https://www.audiosciencereview.com/forum/index.php?threads/72-software-sample-rate-converters-put-to-the-test.241/post-83249
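A small worked example of why a subsample offset alone ruins a null. For a sine of frequency f delayed by d samples at sample rate fs, subtracting the delayed copy leaves a residual whose amplitude is 2·|sin(π·f·d/fs)| times the original. The sketch below just evaluates that identity; the function name and the chosen frequencies are illustrative.

```python
import math

def null_depth_db(freq_hz, offset_samples, rate_hz):
    """Residual level relative to the signal, for a sine nulled against
    a copy of itself delayed by `offset_samples` (may be fractional)."""
    ratio = 2.0 * abs(math.sin(math.pi * freq_hz * offset_samples / rate_hz))
    return 20.0 * math.log10(ratio) if ratio > 0 else float("-inf")

# Half a sample of misalignment at 88.2 kHz, for a few frequencies.
for freq in (100, 1000, 10000, 20000):
    print(freq, "Hz:", round(null_depth_db(freq, 0.5, 88200), 1), "dB")
```

The residual grows with frequency, so two otherwise identical files can null poorly in the treble purely because of a timing offset that is inaudible.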

Of course, Deltawave also has some file alignment features:
https://deltaw.org/

However, I just want to show that even by using the most basic alignment approach in my attached video file, the result is still much better than yours.
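For reference, the most basic kind of alignment (whole samples only) can be sketched with a brute-force cross-correlation. This is not how DeltaWave or the attached video does it; it assumes both signals are already at the same sample rate, and all names and the noise test signal are made up for illustration.

```python
import random

def best_lag(reference, comparison, max_lag):
    """Lag (in samples) at which comparison[n + lag] best matches reference[n]."""
    def score(lag):
        return sum(
            reference[n] * comparison[n + lag]
            for n in range(len(reference))
            if 0 <= n + lag < len(comparison)
        )
    return max(range(-max_lag, max_lag + 1), key=score)

def align(reference, comparison, lag):
    """Trim both signals so that they start together and have equal length."""
    if lag >= 0:
        comparison = comparison[lag:]
    else:
        reference = reference[-lag:]
    length = min(len(reference), len(comparison))
    return reference[:length], comparison[:length]

# Hypothetical data: noise, and a copy delayed by 7 samples of leading silence.
rng = random.Random(0)
reference = [rng.uniform(-1, 1) for _ in range(500)]
comparison = [0.0] * 7 + reference

lag = best_lag(reference, comparison, max_lag=20)
ref_aligned, cmp_aligned = align(reference, comparison, lag)
print(lag, ref_aligned == cmp_aligned)
```

Any remaining subsample offset would still need fractional-delay correction, which this sketch does not attempt.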

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #37
Just another observation of your null file. It is not level matched, it is 6dB higher than the fake88200.flac I posted.

Here is the level matched result:

Even though these kinds of null comparisons don't have too much psychoacoustic relevance, if one wants to do some null tests, at least do it honestly.
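Level matching before the subtraction can be sketched as a least-squares gain. This assumes the two signals are already time-aligned lists of float samples; the function names and the 6 dB example are illustrative and not taken from the files in this thread.

```python
import math

def best_gain(reference, comparison):
    """Gain g that minimizes the energy of reference - g * comparison."""
    denominator = sum(c * c for c in comparison)
    if denominator == 0:
        raise ValueError("comparison signal is silent")
    return sum(r * c for r, c in zip(reference, comparison)) / denominator

def residual_rms_db(reference, comparison, gain=1.0):
    """RMS level (dBFS) of the null residual after applying `gain`."""
    diff = [r - gain * c for r, c in zip(reference, comparison)]
    mean_square = sum(d * d for d in diff) / len(diff)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")

# Hypothetical data: the comparison is the reference at half amplitude (about -6 dB).
rate = 44100
reference = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
comparison = [0.5 * s for s in reference]

gain = best_gain(reference, comparison)
print(residual_rms_db(reference, comparison))        # unmatched levels: large residual
print(residual_rms_db(reference, comparison, gain))  # matched levels: residual vanishes
```

Without the gain step, a pure level difference shows up as a loud residual even though the two signals have identical content.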

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #38
Just another observation of your null file. It is not level matched, it is 6dB higher than the fake88200.flac I posted.
Even though these kinds of null comparisons don't have too much psychoacoustic relevance, if one wants to do some null tests, at least do it honestly.
As I said, that was a quick null test (= loaded the 2 files in Audacity, inverted the phase of the "fake" one and mixed down, nothing else: if the audio files are not aligned with each other, that is another fidelity fault to me).

Anyway that's why a "testing protocol" is needed.


Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #40
loaded 2 files in audacity, inverted the phase of the "fake" one and mixdown
Hm, what is Audacity doing then?
Even if I do the null as described using Audacity, the resulting waveform is still obviously different from his version. Something must be seriously screwed up, if not intentional. Check out the other attached video file.

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #41
Hm, what is Audacity doing then?
It's probably resampling the signals (without saying so), of course.

So here are the DeltaWave results (DeltaWave, by contrast, does report the resampling):
Quote
DeltaWave v2.0.13, 2024-08-20T08:41:57.7713434+02:00
Reference: original.flac[L] 1323000 samples 44100Hz 16bits, stereo, MD5=00
Comparison: fake88200.flac[L] 2646000 samples 88200Hz 24bits, stereo, MD5=00
Settings:
Gain:True, Remove DC:True
Non-linear Gain EQ:False Non-linear Phase EQ: False
EQ FFT Size:65536, EQ Frequency Cut: 0Hz - 0Hz, EQ Threshold: -500dB
Correct Non-linearity: False
Correct Drift:True, Precision:30, Subsample Align:True
Non-Linear drift Correction:False
Upsample:True, Window:Kaiser
Spectrum Window:Kaiser, Spectrum Size:32768
Spectrogram Window:Hann, Spectrogram Size:4096, Spectrogram Steps:2048
Filter Type:FIR, window:Kaiser, taps:262144, minimum phase=False
Dither:False bits=0
Trim Silence:True
Enable Simple Waveform Measurement: False
Resampled Reference to 88200Hz
Discarding Reference: Start=0s, End=0s
Discarding Comparison: Start=0s, End=0s
Initial peak values Reference: 0,082dB Comparison: 0dB
Initial RMS values Reference: -18,453dB Comparison: -18,394dB
Null Depth=34,424dB
Trimming 71424 samples at start and 0 samples at the end that are below -90,31dB level
X-Correlation offset: -32534 samples
Trimming 0 samples at start and 0 samples at the end that are below -90,31dB level
Drift computation quality, #1: Excellent (0,04μs)
Trimmed 7086 samples ( 80,340136ms) front, 23676 samples ( 268,435374ms end)
Final peak values Reference: 0,082dB Comparison: -0,115dB
Final RMS values Reference: -18,336dB Comparison: -18,366dB
Gain= 0,0923dB (1,0107x) DC=0,00029 Phase offset=-809,811296ms (-71425,356 samples)
Difference (rms) = -40,02dB [-41,64dBA]
Correlated Null Depth=45,45dB [52,48dBA]
Clock drift: -0,01 ppm
Files are NOT a bit-perfect match (match=0,16%) at 16 bits
Files match @ 49,9965% when reduced to 6,96 bits
---- Phase difference (full bandwidth): 25,074700939838°
0-10kHz: 16,85°
0-20kHz: 17,94°
0-24kHz: 20,38°
Timing error (rms jitter): 659,8ns
PK Metric (step=400ms, overlap=50%):
RMS=-43,0dBFS
Median=-44,8
Max=-34,5
99%: -35,57
75%: -42,43
50%: -44,82
25%: -47,33
1%: -53,32
gn=0,989435073400854, dc=0,000287857645362735, dr=-5,14181713836272E-09, of=-71425,3563045766
DONE!
Signature: 653fa7beb2363b5f7598785ac68bc85a
RMS of the difference of spectra: -94,9942219385772dB
DF Metric (step=400ms, overlap=0%):
Median=-24,9dB
Max=-21,7dB Min=-28,1dB
1% > -28,04dB
10% > -26,13dB
25% > -25,54dB
50% > -24,87dB
75% > -24,06dB
90% > -22,95dB
99% > -0,58dB
Linearity 2,1bits @ 0.5dB error
---- Phase difference (full bandwidth): 25,3354794968148°
0-10kHz: 12,28°
0-20kHz: 15,17°
0-24kHz: 18,99°
Linearity 2,1bits @ 0.5dB error




Anyway, we're going OT talking about this upsample...

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #42
Even if I do the null as described using Audacity, the resulting waveform is still obviously different from his version. Something must be seriously screwed up, if not intentional. Check out the other attached video file.
Checked. The only two differences are that, as I said, I phase-inverted the fake (not the original, but it should be the same) and set the Audacity project properties to 88200.

Keep your malicious suspicions to yourself, please. But again: we are not discussing your (closed-method) upsamples, but how "AI" (read: machine learning) can help reconstruct the missing frequencies of a lossy audio signal.

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #43
I'm torn on the notion that these algorithms can't recover lost information. I thought so too, but this thread has had me thinking about it. After all, they are making an educated guess, and that guess has some probability of being right, in which case it DID recover lost information. I think that, since we don't have the source material most of the time, this question is more philosophical than anything.
And so, with digital, computer was put into place, and all the IT that came with it.

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #44
Checked. The only two differences are that, as I said, I phase-inverted the fake
I tried inverting the fake version first and then mixing too, and the resulting waveform still has a lower amplitude than yours. Of course, I can't prove whether or not you have malicious intent, but I have at least shown an obvious issue with your file.

Interesting that someone mentioned the null test first without even paying attention to how to do it in a sensible way.

 

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #45
Interesting that someone mentioned the null test first without even paying attention to how to do it in a sensible way.
So we agree that the null test is the way to go, in this field.

Thanks for pointing out DeltaWave (forwarded straight to @bkrd47's git), which I didn't know about.

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #46
I'm torn on the notion that these algorithms can't recover lost information. I thought so too, but this thread has had me thinking about it. After all, they are making an educated guess, and that guess has some probability of being right, in which case it DID recover lost information. I think that, since we don't have the source material most of the time, this question is more philosophical than anything.
I think it would be really interesting to do some real-use (null) tests with a custom-trained Audio Delossifier to understand the potential of ML in this field: AFAIK better training should produce better results...

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #47
OK, here's a DW-report of files shared by @bkrd47 in this reply:

Quote
DeltaWave v2.0.13, 2024-08-20T12:23:57.2062764+02:00
Reference: test_original_flac.flac[L] 7278467 samples 192000Hz 24bits, stereo, MD5=00
Comparison: test_converted_flac.flac[L] 8491546 samples 224000Hz 24bits, stereo, MD5=00

Settings:
Gain:True, Remove DC:True
Non-linear Gain EQ:False Non-linear Phase EQ: False
EQ FFT Size:65536, EQ Frequency Cut: 0Hz - 0Hz, EQ Threshold: -500dB
Correct Non-linearity: False
Correct Drift:True, Precision:30, Subsample Align:True
Non-Linear drift Correction:False
Upsample:True, Window:Kaiser
Spectrum Window:Kaiser, Spectrum Size:32768
Spectrogram Window:Hann, Spectrogram Size:4096, Spectrogram Steps:2048
Filter Type:FIR, window:Kaiser, taps:262144, minimum phase=False
Dither:False bits=0
Trim Silence:True
Enable Simple Waveform Measurement: False
Resampled Reference to 224000Hz
Discarding Reference: Start=0s, End=0s
Discarding Comparison: Start=0s, End=0s
Initial peak values Reference: -3,358dB Comparison: 0dB
Initial RMS values Reference: -20,25dB Comparison: -16,675dB
Null Depth=101,926dB
Trimming 0 samples at start and 0 samples at the end that are below -90,31dB level
X-Correlation offset: -3 samples
Trimming 0 samples at start and 0 samples at the end that are below -90,31dB level
Drift computation quality, #1: Excellent (0,3μs)
Trimmed 171244 samples ( 764,482143ms) front, 224860 samples ( 1003,839286ms end)
Final peak values Reference: -3,358dB Comparison: -3,801dB
Final RMS values Reference: -20,505dB Comparison: -20,765dB
Gain= 3,822dB (1,5528x) DC=0,00001 Phase offset=-0,013447ms (-3,012 samples)
Difference (rms) = -32,77dB [-39,68dBA]
Correlated Null Depth=42,37dB [41,89dBA]
Clock drift: -0,01 ppm
Files are NOT a bit-perfect match (match=0,13%) at 16 bits
Files are NOT a bit-perfect match (match=0%) at 24 bits
Files match @ 49,998% when reduced to 6,5 bits
---- Phase difference (full bandwidth): 143,874834932871°
0-10kHz: 23,92°
0-20kHz: 56,12°
0-24kHz: 66,07°
Timing error (rms jitter): 514ns
PK Metric (step=400ms, overlap=50%):
RMS=-37,5dBFS
Median=-46,7
Max=-29,4
99%: -30,89
75%: -34,7
50%: -46,7
25%: -50,74
1%: -55,33
gn=0,644017646514286, dc=1,40872942458951E-05, dr=-8,96573621136896E-09, of=-3,01221162486093
DONE!
Signature: 85634bda22fde7ebef2c2e8a0c038e7a
RMS of the difference of spectra: -78,1160326779584dB
DF Metric (step=400ms, overlap=0%):
Median=-18,9dB
Max=-10,6dB Min=-30,8dB
1% > -30,78dB
10% > -27,18dB
25% > -24,67dB
50% > -18,88dB
75% > -13,32dB
90% > -12,28dB
99% > -2,19dB
Linearity 2,1bits @ 0.5dB error





Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #48
Interesting that someone mentioned the null test first without even paying attention to how to do it in a sensible way.
So we agree that the null test is the way to go, in this field.
No. My opinion is in the quoted text above. My statement may contain a grammatical mistake, but I did not say "null test is the way to go", just to be clear.

At some point I may consider getting a new GPU and learn some Python then run the code locally on my PC, and do some listening tests myself.

Re: [OPEN SOURCE] Fat Llama: a (wannabe) lossy to lossless "upscaler"

Reply #49
At some point I may consider getting a new GPU and learn some Python then run the code locally on my PC, and do some listening tests myself.
A CPU version of this already exists, but expect worse performance than the GPU-accelerated version (it is perhaps the only option if you don't have enough money to buy an NVIDIA GPU).