HydrogenAudio

Lossless Audio Compression => FLAC => Topic started by: 2Bdecided on 2007-06-12 20:31:55

Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-12 20:31:55
This is an (unoriginal) idea / work in progress. I make no claims for it, but it might be interesting or useful for someone. It is not competitive with wavpack lossy. It is not "finished" either! As far as I know, it is 100% compatible with existing recent lossless FLAC implementations.


The idea is simple: lossless codecs use a lot of bits coding the difference between their prediction and the actual signal. The more complex (hence, unpredictable) the signal, the more bits this takes up. However, the more complex the signal, the more "noise like" it often is. It seems silly spending all these bits carefully coding noise / randomness.

So, why not find the noise floor, and dump everything below it?

This isn't about psychoacoustics. What you can or can't hear doesn't come into it. Instead, you perform a spectrum analysis of the signal, note what the lowest spectrum level is, and throw away everything below it. (If this seems a little harsh, you can throw in an offset to this calculation, e.g. -6dB to make it more careful, or +6dB to make it more aggressive!).


How is this applied to FLAC? FLAC has a nice feature called "wasted_bits". If it finds that all bits below a certain bit are consistently zero, it simply stores: "the bottom 3 bits are all zeros" and then takes no more effort in encoding them. It checks this once per frame. In FLAC, frames can be variable length, but current encoders use a fixed 4096 sample length.

This means if you have a 24-bit file, but it only contains 16-bit audio data (i.e. the bottom 8 bits are zero throughout) then FLAC encodes it just as efficiently as a 16-bit file. The only overhead is a few bits every 4096 samples saying "wasted_bits=8".

It also means that if, say, you have a normal 16bit CD and you find the noise floor during a certain 4096 samples never falls below the 12th bit, you can set bits 13-16 to zero, then feed the result to FLAC, and it will automatically use a lower bitrate for that frame than if you fed it all 16 bits.
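The bit-dropping step described above can be sketched in a few lines of Python. This is only an illustration, not the attached MATLAB script: the function name `round_to_bits`, the round-to-nearest rule and the clipping at the top of the range are all assumptions.

```python
def round_to_bits(samples, bits_to_remove, sample_bits=16):
    """Round integer PCM samples so the lowest `bits_to_remove` bits are zero.

    Hypothetical helper: rounds to the nearest multiple of 2**bits_to_remove
    and clips so the result still fits in `sample_bits` signed bits.
    """
    if bits_to_remove <= 0:
        return list(samples)
    step = 1 << bits_to_remove
    half = step >> 1
    lo = -(1 << (sample_bits - 1))
    # Largest multiple of `step` that still fits in the positive range
    hi = ((1 << (sample_bits - 1)) - 1) // step * step
    out = []
    for s in samples:
        # Floor division keeps the rounding consistent for negative samples
        q = ((s + half) // step) * step
        out.append(max(lo, min(hi, q)))
    return out
```

For the example in the text, `round_to_bits(block, 4)` would zero bits 13-16 of a 16-bit block, and a standard FLAC encoder can then report those four bits as wasted for that frame.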

Hence "lossy FLAC" is a wav pre-processor for regular lossless FLAC. The interim stage is a "lossy" wav file with 0s in some least significant bits. The final output is a 100% compliant FLAC, which faithfully reproduces this "lossy" wav file. The lossy stage is therefore the pre-processor, and the processed "lossy" wav file, when encoded to FLAC, results in a lower bitrate than the original wav file when encoded to FLAC.


Potentially the quality is very near to what you started with, and more than good enough for many applications. In most places where mp3 doesn't work, I believe that lossy FLAC will.


On music which FLAC already compresses very well, lossy FLAC gives little advantage. Often it does exactly nothing (full 16 bits preserved), or nearly nothing (the last bit or two dropped occasionally). On music which causes the FLAC bitrate to go comparatively high, lossy FLAC usually brings a significant gain. I've seen bitrates fall by 20%-50%. Still, it's not low bitrate encoding, and it's pure VBR.


Problem samples? I don't know - I'm hoping some HA regulars can lend their ears and detective skills here. Standard lossy codec problem samples are probably not that relevant. Wavpack lossy problem samples are more relevant, but lossy FLAC does seem to spot some of these and either quantises less aggressively or not at all (i.e. encoding is pure lossless).


So what can people download? Well, sadly, I'm not a C programmer. I'm attaching a MATLAB script that works as a lossy FLAC pre-processor. You run a .wav file through this, and then encode it to FLAC as normal.

If you haven't got MATLAB, but have an idea for a useful sample to test, upload it to HA (maximum 30 seconds; shorter=better because MATLAB is slow and the code isn't optimised at all!) and I'll upload a lossy FLAC version when I get a chance.


I'll post more about the algorithm later.

Cheers,
David.

P.S. the attachment should be "lossyFLAC.m" but HA won't allow me to upload .m, so I've changed it to .txt.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-12 20:41:56
For those who don't want to read MATLAB code but want to know what's happening...

The algorithm is quite simple. Pick two FFT sizes - one long one, useful for catching tonal signals, one short one, useful for catching transients. Find out where the quantisation noise due to truncating at each bit will fall in FFTs of these sizes. Store this data in a look-up table.

Now go through the audio file. For each 4096-sample block, look at the long and short FFTs across that block separately, and find the lowest value in each, look up the implied number of wasted bits for each, and then use the lowest value of wasted bits to round the audio in that block to that many bits.
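A rough Python sketch of the look-up table and the "lowest of the two" decision follows. It is not the MATLAB code: the Hann window, the uniform-noise formula (variance step^2/12) and all function names are assumptions made for illustration, and the power spectra themselves are assumed to be computed elsewhere.

```python
import math

def hann(n):
    """Periodic Hann window of length n (the window choice is an assumption)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def noise_table(fft_size, max_bits=16):
    """Expected per-bin power of the rounding noise for 0..max_bits removed bits.

    Assumes rounding to a step of 2**b adds roughly uniform white noise with
    variance step**2 / 12, measured through a Hann-windowed FFT.
    """
    window_energy = sum(w * w for w in hann(fft_size))
    return [window_energy * (4 ** b) / 12.0 for b in range(max_bits + 1)]

def bits_for_floor(min_bin_power, table):
    """Largest number of bits whose expected noise stays at or below the floor."""
    bits = 0
    for b, level in enumerate(table):
        if level <= min_bin_power:
            bits = b
    return bits

def bits_for_block(long_min_power, short_min_power, long_table, short_table):
    """Use the more cautious (lower) answer from the long and short analyses."""
    return min(bits_for_floor(long_min_power, long_table),
               bits_for_floor(short_min_power, short_table))
```

Here `long_min_power` and `short_min_power` stand for the lowest power values found in the long and short FFTs of one block. The FFT sizes themselves (for instance 1024 and 64) are free parameters in this sketch.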

Job done.

However, there are some "bodges" in there.

Firstly, a frequency range is specified. FFT bins outside this frequency range won't be checked. Otherwise, a sharp 20kHz low pass filter in the original would force "wasted_bits" to zero simply to maintain a -96dB noise floor above 20kHz.

Secondly, the FFTs are "spread" before finding the lowest value. This isn't some clever psychoacoustic ear/masking spreading function - it's just a simple average. The reason is quite simple: in almost any windowed FFT, you'll get some bins into which almost no energy falls. This really isn't significant, but if we didn't ignore these bins, they'd force us to keep all the bits all the time. As it is, I've averaged over 4 bins using a rectangular spreading function before finding the lowest. If this gives you cause for concern, this should allay your fears: there are still enough low bins that 8-bit dither, pasted into a 16-bit file, is still encoded with 10-bit accuracy! In other words, when encoding pure noise, there's still a 2-bit "safety margin". Whether this works for all signals is one reason I'd like people to listen.
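The first two "bodges" (frequency range and rectangular spreading) might look something like this in Python. The 4-bin average comes from the description above; the function name and the default frequency limits are hypothetical, since the actual range used in the script is not stated here.

```python
import math

def lowest_spread_bin(power_spectrum, sample_rate, fft_size,
                      f_lo=20.0, f_hi=16000.0, spread=4):
    """Lowest value of the spectrum after a rectangular `spread`-bin average.

    Only bins between f_lo and f_hi (in Hz) are considered. The default
    limits are placeholders, not values taken from the original script.
    """
    bin_hz = sample_rate / fft_size
    first = max(0, int(math.ceil(f_lo / bin_hz)))
    last = min(len(power_spectrum) - 1, int(math.floor(f_hi / bin_hz)))
    band = power_spectrum[first:last + 1]
    if len(band) < spread:
        raise ValueError("frequency range is narrower than the spreading width")
    # Sliding average, so one nearly empty bin cannot drag the floor to zero
    return min(sum(band[i:i + spread]) / spread
               for i in range(len(band) - spread + 1))
```

With a single empty bin surrounded by strong bins, the averaged minimum stays well above zero, which is the whole point of the spreading step.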

Thirdly, it's trivial to shift the thresholds, so I've put that feature in, though set it to 0 by default.


There are issues which remain to be solved.

It seems to work OK with clipped files, which is a surprise, because a positive clipped integer sample (e.g. 16bit audio) is all ones, hence wasted_bits=0. I need to look into this. Converting to 24-bits and dropping the audio by 6dB would be a solution (already implemented) if this was a problem.

There is no checking of the mid or side channels yet. Ideally, the algorithm should check mid and side in the same way as left and right, and pick the global noise floor. One caveat is that any channel which is digital silence (or "near" digital silence - there's a can of worms) needs to be ignored.

You can run many many generations with lossy FLAC before problems arise. I've gone to 50 generations with trivial processing and dithering at each generation. The quantisation noise was 1-2 bits higher in the 50th generation than in the first lossy FLAC generation. If this is a problem (I couldn't hear it) I assume you could set a -12dB noise threshold offset to solve it, though the efficiency would decrease dramatically.

Finally, this will lead to FLAC files that look like they're lossless (because FLAC is normally lossless) but are in fact lossy. Never fear! A simple utility (someone else can write one) to check the value of "wasted_bits" will soon tell you what you have. Real FLAC files almost never have non-zero "wasted_bits". Lossy FLAC files will have loads.
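A checker along these lines could be sketched in Python by working on the decoded samples rather than reading the FLAC frames directly. This is an assumption-laden stand-in: it only matches what the encoder saw if the block size and block alignment are the same, and the names are made up for illustration.

```python
def wasted_bits(block):
    """Count the least significant bits that are zero in every sample.

    Returns None for digital silence, which should be ignored rather than
    counted as "all bits wasted".
    """
    combined = 0
    for s in block:
        combined |= s
    if combined == 0:
        return None
    count = 0
    while combined & 1 == 0:
        combined >>= 1
        count += 1
    return count

def wasted_bits_per_block(samples, block_size=4096):
    """Apply wasted_bits() to consecutive blocks of one decoded channel."""
    return [wasted_bits(samples[i:i + block_size])
            for i in range(0, len(samples), block_size)]
```

A file where most blocks report a non-zero count has very likely been through this kind of pre-processing. A file that reports zero everywhere is not proven lossless, of course.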


To answer the obvious question about bitrates, here is a screen grab from foobar2k showing the bitrates of some wavpack lossy problem samples in lossy FLAC.
[attachment=3286:attachment]


Here is an unrelated file containing a random mixture of music samples from a recent listening test.
This is the waveform (top view) and lossy FLAC quantisation noise (bottom view)
[attachment=3288:attachment]

This is a graph of the number of bits removed (i.e. the quantisation / rounding level) in each FLAC frame/block:
[attachment=3287:attachment]

Obviously the quantisation noise and number of bits removed are correlated (perfectly).

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-12 20:52:38
Here are some examples - only a couple, because I'm on dial up.

The originals are elsewhere on HA - do a search if you want to grab them to compare.


Penultimate comment: I have no plans for a lossy+correction=lossless version. It would be possible to do it crudely with two FLAC files (one lossy, one residual) and adding them; or smartly by integrating this within FLAC itself. Not my job. Not sure it's worth it.
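For what it's worth, the crude two-file version can be sketched in Python as below. The function names are hypothetical, and clipping is ignored: a sample near positive full scale can round up past the 16-bit range, which a real tool would have to handle.

```python
def split_lossy_residual(samples, bits_to_remove):
    """Split samples into a rounded "lossy" part and the residual that restores them."""
    step = 1 << bits_to_remove
    half = step >> 1
    # Round to the nearest multiple of step; no clipping in this sketch
    lossy = [((s + half) // step) * step for s in samples]
    residual = [s - q for s, q in zip(samples, lossy)]
    return lossy, residual

def recombine(lossy, residual):
    """Adding the two streams back together gives the original samples."""
    return [q + r for q, r in zip(lossy, residual)]
```

The residual only needs `bits_to_remove` bits per sample, but it is close to pure noise, so it is not obvious that coding it separately buys much.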

Finally (for now), as discussed in a recent thread, if you can't hear above 16kHz, then you can often reduce FLAC bitrates by resampling to 32kHz. Combining that with lossy FLAC pre-processing brings the bitrate down still further. I'm almost tempted to use it.


Let the hunt for problem samples begin!

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: TBeck on 2007-06-12 21:09:46
Interesting approach.

I did something similar in the very early days (1997) of TAK. Well, I didn't use your FFT approach but something simpler, yet nevertheless efficient.

I remember that the frame size was very important. 4096 samples is definitely too much! The signal amplitude will often change considerably in those 93 ms. You will have to keep too many bits to avoid distortions in the frame parts with low amplitude.

I don't know if I still had golden ears in 1997, but for me my bit reduction approach was transparent at about 440 kbps. Well, it should be considerably less with TAK's later compression improvements...

  Thomas
Title: Near-lossless / lossy FLAC
Post by: JeanLuc on 2007-06-12 21:31:03
So ... basically you are applying a variable or 'gliding' noise gate if I understood correctly?
Title: Near-lossless / lossy FLAC
Post by: jcoalson on 2007-06-12 21:36:37
that was my hunch too, that for noisy samples you might get better results with shorter blocks.

clever idea.  in practice I think the file should also be tagged with the preprocessing parameters so it could be identified without analyzing all the frames.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-13 09:20:13
that was my hunch too, that for noisy samples you might get better results with shorter blocks.

clever idea.  in practice I think the file should also be tagged with the preprocessing parameters so it could be identified without analyzing all the frames.


You would have better control of the noise floor with shorter (or variable) blocks, but my guess is there would be some kind of trade-off as shorter blocks would often make FLAC itself less efficient. How efficient is FLAC with, say, 1024-sample blocks?

I agree it would be sensible to "tag" the files as lossy in some way, but it should be a way which isn't easily removed by careless use of a tag editor. This implies something at the frame level.


These are both things which can only be done from within the FLAC encoder. I am not skilled enough to start playing around in there myself!

Cheers,
David.


So ... basically you are applying a variable or 'gliding' noise gate if I understood correctly?


Kind of. Technically it doesn't remove noise, since by definition any change to the signal is unwanted, and hence "noise". So it actually adds more noise, at/below the existing noise floor.

The only exception is if you force the threshold up (i.e. make it much more aggressive), and then it's just possible that it could quantise a noise floor (in isolation) out of existence - i.e. if the signal consists of white noise at -90dB, it could quantise it to all zeros. I must stress that this isn't default behaviour - you'd have to raise the thresholds by 12dB or more to make this happen. By default, it will preserve all noise, but often add a little more noise several dB below the existing noise - a change which I suspect is both inaudible and almost always irrelevant.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-13 10:07:14
I've attached some lossy and lossless files for ABXing if anyone is interested / willing.

These were grabbed from various threads about wavpack lossy problem samples, since these are the most likely to cause problems with lossy FLAC.

If anyone has any other potential problem samples, please let me know / upload them.

Cheers,
David.


I forgot to mention...

If anyone thinks this is useful enough to code into a real programming language, please do!

If anyone wants to adapt this idea for other lossless codecs, feel free.

If anyone wants to argue that I should have included dither, please don't. Used properly, the signals are self dithering at the chosen quantisation level (so it's largely unnecessary) and dither adds an extra bit of noise which reduces the efficiency of the whole process (i.e. you have to keep ~one more bit of data than you would otherwise just to counteract the dither).

Most importantly, I'm asking if people can listen, ABX, and find problem samples. I've never been sensitive to background noise, so I don't know if this approach works fine as it is, needs tweaking, or is useless.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: shadowking on 2007-06-13 10:54:46
I am sensitive to this noise with wavpack and dualstream at 250 k. A casual abx: all is good thus far. Avg bitrate is 550k. I don't know how to compare though as wavpack and dualstream are fine at 350 k even on most hard stuff.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-13 12:41:18
Thanks. If you know of anything which wavpack and dualstream can't handle at 350k, that might be an interesting test.


The lossy FLAC bitrate will never be competitive in this incarnation for two reasons:

1) it's just a preprocessor, so it has to work within the limits of the host format (in this case, FLAC).
2) it doesn't use any noise shaping.

It would be interesting to see the lossy FLAC method of setting the noise floor integrated into something like wavpack lossy, with or without wavpack lossy's noise shaping.


btw, the most interesting problem sample I found was "short block test 2". Lossy FLAC does absolutely nothing to it, judging the noise floor to be at or below the 16th bit. Hence it gets encoded losslessly at 137kbps.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: goodnews on 2007-06-13 12:46:35
I am opposed to calling any lossy implementation of FLAC still FLAC. FLAC has positioned itself as a "Free Lossless Audio Codec" (its name) and changing the name or what it means now would be detrimental and confusing to users, I believe. FLAC has also stood for LOSSLESS -- that's why so many people use and like it (no loss of audio quality/data).

Please call your variant LFLAC or something other than FLAC please to avoid confusion with older decoders/hardware implementations which likely won't support any lossy variant. Thanks!
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-06-13 13:00:20
Support for the LFLAC name (or possibly Lossy Free Audio Codec - LFAC?).....

I love the idea of LFLAC - I recently set my PC transcoding individual track FLAC > whole album OGG and it sat for 8 hours or so. The reason I picked OGG is due to the predisposed good opinion of it on these forums and the fact that GSPlayer plays it on my iPAQ.

However, I would be interested in LFLAC as a portable variant of my FLAC collection.......
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-13 13:17:58
Please call your variant LFLAC or something other than FLAC please to avoid confusion with older decoders/hardware implementations which likely won't support any lossy variant. Thanks!


As currently implemented, it is a pre-process to a standard FLAC encoder.

As such, it is 100% compatible with all FLAC compliant decoders, requires no change to the format, and the final file will be a standard .flac file from a standard FLAC encoder.

Given your concerns, this should scare you far more than the name (which can be anything - well, anything sensible).


However, I've already addressed this concern earlier in the thread: if users can't be trusted to tag (or not to untag) lossy FLAC files properly, the only way to recognise them is from something at the FLAC frame level (the "wasted_bits" data already tells you what is happening), or by spotting rows of 0s in the LSBs of the decoded audio data.


If an incompatible "LFLAC" format can do the job better (i.e. more efficiently; same performance in fewer bits) than standard FLAC with the lossy FLAC pre-processor, then it'll probably be created, and you'll have nothing to worry about.

However, the beauty of lossy FLAC (as a pre-processor) is that it's compatible with all the FLAC implementations out there. Unless "LFLAC" brings big advantages, making an intentionally incompatible "LFLAC" format just to hold lossy FLAC data won't stop the problem you envisage: On day 1, nothing will play it back, but it could easily be transcoded losslessly (i.e. maintaining the same losses!) into standard FLAC, maintaining the bitrate advantage and playing back correctly on everything that supports FLAC. So if I or someone else were to force a different lossy FLAC / LFLAC format onto the world, people would transcode it to standard FLAC to get it to play on various devices. Then there would be exactly the same "lossy FLAC" files that I've provided above.


Look at it this way: at least with lossy FLAC there's an easy way to check that it's lossy. However, if someone transcodes a traditional high bitrate lossy file without a lowpass to FLAC and gives it to you, the only way of knowing is by listening.


It's ironic that you're facing this problem because FLAC is open source. If it was closed source, I'd never have been able to do this.

Don't panic though. This is still at the "proof of concept" stage. It might not work. If it does work, I'm sure someone will implement it properly, and they might not base that implementation on FLAC at all.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-06-13 13:26:44
This sounds like something that could be achieved through collaboration with FLAC's developer - add a command to FLAC.EXE to select a "quality" which might equate to how aggressive the algorithm is and output to a FLAC file.

The issue of "is it a lossless FLAC file or a lossy FLAC file" would remain for those who do not create their own FLAC files - however a simple checksum of the resultant WAV file (and an Accurate Rip style database) would instantly indicate whether the file was lossy / lossless.
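The checksum idea could be sketched in Python like this. The database is purely hypothetical (just a set of hex digests here), and the byte layout (signed little-endian, one channel) is an assumption for illustration.

```python
import hashlib

def pcm_md5(samples, bytes_per_sample=2):
    """MD5 of integer PCM samples packed as signed little-endian bytes."""
    md5 = hashlib.md5()
    for s in samples:
        md5.update(s.to_bytes(bytes_per_sample, "little", signed=True))
    return md5.hexdigest()

def is_known_lossless(samples, known_checksums):
    """True if the decoded audio matches an entry in the (hypothetical) database."""
    return pcm_md5(samples) in known_checksums
```

Any pre-processing that changes even one sample changes the digest, so a lossy FLAC version of a track would simply fail to match. A miss does not prove the file is lossy, though: a different pressing or rip offset would also miss.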

Good luck with implementation........
Title: Near-lossless / lossy FLAC
Post by: goodnews on 2007-06-13 13:51:58
David,

I understand more about what you are attempting, but I still don't like FLAC being "forked" like this. Not that you can't do it legally (i.e. open source). But Josh has said before that FLAC hasn't been "forked" in all the years that it has been out, and I believe that "forking" it now would damage FLAC's reputation unless the name and file extension were changed.

When I see a FLAC file, I know it's lossless. FLAC is synonymous with lossless. Changing it to a "forked" lossy version where now a FLAC file could be lossless or could be lossy would confuse many people and IMO detract from the format's name and reputation among users that FLAC has built up all these years.

I suggest you use a different extension .LFL or .LFLAC instead of .FLAC to avoid any chance of confusion. Look at how Apple uses .M4A for lossy and now Apple lossless. You just can't always easily tell in 3rd party apps if you are playing a lossy or lossless file (other than perhaps by the file size). Many apps will choke on an Apple Lossless .M4A file as they think it is an MPEG-4 (AAC audio) file.

I'd hate to see the FLAC name and file extension "bastardized" to mean "it could be lossless or it could be lossy, your guess?" My vote is for FLAC to remain FLAC (LOSSLESS) and please choose some other file extension for a lossy implementation of FLAC, if you so desire to make one.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-13 14:31:22
goodnews,

I have no desire to damage FLAC. I would be quite happy to call "lossy FLAC" LFAC and have a .lfac extension.

The immediate problem I have with this is that I have to rename the .lfac files to .flac in order to get foobar2k (or anything else!) to play them. Everyone else in the world will face the same problem.


To be honest, if David (Bryant, wavpack developer) is interested, I think the method I'm using would sit better within his encoder.

Also Josh is free to put this in FLAC in a compatible but identifiable way.

We'll see. Let's figure out if it works first.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: halb27 on 2007-06-13 14:34:15
FLAC as such remains lossless of course.
You can never prevent people from doing pre-processing for whatever reason so you never know when getting a FLAC file whether it was preprocessed before encoding or not.
I have some (few) oldies tracks in my ape archive that are important to me and with which I did some preprocessing (denoising/declicking/bringing artificial brilliance to them cause they sounded pretty dull).
Sure these ape files are not identical with the original source (but I enjoy them a lot more).

Whenever you get a file from somebody else you're always in an insecure position. The most probable issue isn't preprocessing but potential mediocre quality of the original source used (the FLAC file may be the Non-DRM version of a 128 kbps DRM-WMA track for instance). But you can decide by listening whether you like it or not.

David's idea is great to me just because it's a pre-processor machinery leaving the FLAC world as it is.
Moreover he has found a mechanism which may be valuable for other encoder developers. Maybe David Bryant can use the idea for bringing a quality control to wavPack lossy if he likes to.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-06-13 14:45:18
I don't see this as "damaging" FLAC at all - as halb27 said, you never know if the WAV input to FLAC has been processed in any way before encoding. As David said, the files are fully FLAC compliant and therefore are FLAC files - the fact that the input WAV was pre-processed is neither here nor there.

Now, if a Foobar based transcoder could be implemented, I could drop OGG and use the LFAC pre-processor to shrink my FLAC-for-iPAQ files..... (I just found the GSPlayer gspflac.dll file)
Title: Near-lossless / lossy FLAC
Post by: SebastianG on 2007-06-13 15:23:44
The idea is simple: lossless codecs use a lot of bits coding the difference between their prediction and the actual signal. The more complex (hence, unpredictable) the signal, the more bits this takes up. However, the more complex the signal, the more "noise like" it often is. It seems silly spending all these bits carefully coding noise / randomness.

So, why not find the noise floor, and dump everything below it?


This is sort of what speech codecs do, actually.

your signal --[LPC analysis filter H(z)]--> pretty white noisy residual --[lossy coding]--> pretty white noisy residual + white noisy errors --[LPC synthesis filter 1/H(z)]--> your approximation with q-noise "hidden behind" your signal.

If you want a similar preprocessing for FLAC or WavPack you'd do something like this:
- estimate LPC filter coeffs (H(z)) and temporarily filter the block to get the residual
- check the residual's power and select "wasted_bits" accordingly
- quantize original (unfiltered) samples so that the "wasted_bits" least significant bits are zero
- use 1/H(z) as noise shaping filter.
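The four steps in the list above could be sketched in Python roughly as follows. Everything here is an assumption for illustration: the autocorrelation / Levinson-Durbin estimate, the rule that picks "wasted_bits" from the residual power (noise variance step^2/12 no larger than the residual power), the error-feedback form of the 1/H(z) shaping, and all the names. Output clipping is left out.

```python
import math

def autocorrelation(x, max_lag):
    """Unnormalised autocorrelation r[0..max_lag] of one block."""
    return [sum(x[i] * x[i - lag] for i in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return (predictor coefficients a[1..order], residual energy)."""
    a = [0.0] * (order + 1)
    err = float(r[0])
    for i in range(1, order + 1):
        if err <= 0.0:
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err

def select_wasted_bits(residual_power, max_bits=15):
    """Largest b with rounding-noise variance 4**b / 12 <= residual power (assumed rule)."""
    bits = 0
    while bits < max_bits and (4 ** (bits + 1)) / 12.0 <= residual_power:
        bits += 1
    return bits

def quantise_with_shaping(x, a, bits):
    """Round to multiples of 2**bits, feeding past noise back through the predictor
    so the total noise is the rounding error filtered by 1/H(z)."""
    step = 1 << bits
    noise = [0.0] * len(a)  # past values of (output - input), newest first
    out = []
    for s in x:
        shaped = s + sum(c * n for c, n in zip(a, noise))
        q = int(math.floor(shaped / step + 0.5)) * step
        out.append(q)
        noise = [q - s] + noise[:-1]
    return out

def preprocess_block(x, order=8):
    """Estimate H(z), pick wasted_bits from the residual power, then quantise."""
    r = autocorrelation(x, order)
    if r[0] == 0:
        return list(x), 0  # digital silence: leave untouched
    a, err = levinson_durbin(r, order)
    bits = select_wasted_bits(err / len(x))
    return quantise_with_shaping(x, a, bits), bits
```

The fixed weighting W(z) mentioned in the next paragraph is not included. It would be an extra filter in the feedback path.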

If you further check what psychoacoustic models usually do you'll notice that they allocate more bits to lower frequencies than to higher frequencies (higher SNR for lower freqs) most of the time. You then can tweak the noise shaping filter to W(z)/H(z) where W(z) is some fixed weighting so that you have a higher SNR for lower freqs.

This is actually what I did when I experimented with "high data rate steganography for audio carriers" and it worked pretty well. The only difference was that instead of zeroing LSBs i "simulated" data to be carried by randomly filling those LSBs which is like subtractive dithering.

BTW: FLAC's default blocksize is 4608, isn't it? The encoder's blocksize should match the preprocessor's blocksize so no "wasted bits" are coded. Also, for noisy transients it's good to be able to quickly change "wasted_bits" which suggests merging preprocessor + encoder into one program that uses variable length blocks. As a matter of fact I recently (couple of weeks ago) read the FLAC file format spec again to check whether I should give it a try ...


Cheers!
SG
Title: Near-lossless / lossy FLAC
Post by: menno on 2007-06-13 15:32:05
Nice. This is the same way MPEG-4 SLS becomes lossy, there have been some good results reported with that.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-13 16:27:19
SebG,

Thanks for your response, but I'm confused. Are you saying what I've done is equivalent to what you describe? Or better/worse? Or didn't you look at what I'd done? As far as I can tell (and I know almost nothing about LPC analysis!) what I'm doing is more accurate, and gives a better "guarantee" of transparency.

As for skewing the noise or noise calculation towards lower frequencies - I intentionally don't want to put any psychoacoustics in there, other than some very simple assumptions which are required to make it work at all.

The FLAC block size is supposedly 4096, which is what I've used. It would make sense to use/try something smaller, but that's out of my hands.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: jcoalson on 2007-06-13 16:55:39
This sounds like something that could be achieved through collaboration with FLAC's developer - add a command to FLAC.EXE to select a "quality" which might equate to how aggressive the algorithm is and output to a FLAC file.
right now I'm thinking it should remain outside any "flac"-named encoder since flac has always meant lossless.  if it turned out to be really useful then we could probably figure out a way to make it into a proper tool that also wouldn't cause confusion.

BTW: FLAC's default blocksize is 4608, isn't it? The encoder's blocksize should match the preprocessor's blocksize so no "wasted bits" are coded.
yes, you're right, they should definitely match.  the default blocksize switched to 4096 samples in 1.1.4

Also, for noisy transients it's good to be able to quickly change "wasted_bits" which suggests merging preprocessor + encoder into one program that uses variable length blocks. As a matter of fact I recently (couple of weeks ago) read the FLAC file format spec again to check whether I should give it a try ...
I've been working on supporting variable blocksize properly; currently the spec is ambiguous in some cases... stay tuned.

Josh
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-06-13 17:11:42
Not suggesting that you compromise the excellent reputation that FLAC has - it seems to be becoming a more mainstream codec with support in Volvo cars (of all things, but great start!).

I just like the idea of only using one codec for all my encoding / transcoding - and one that allows lossy coding in a container that will work exactly the same as the lossless parent version.
Title: Near-lossless / lossy FLAC
Post by: pepoluan on 2007-06-13 17:21:38
Now, if a Foobar based transcoder could be implemented, I could drop OGG and use the LFAC pre-processor to shrink my FLAC-for-iPAQ files..... (I just found the GSPlayer gspflac.dll file)
How big is your iPaq's memory? Even with LFAC I don't think you'll fit more than 1 album's worth.

Slightly offtopic: Where'd you get the gspflac.dll??? I wanna!11!!! 
Title: Near-lossless / lossy FLAC
Post by: SebastianG on 2007-06-13 17:42:26
Hi, David and Josh!

Are you saying what I've done is equivalent to what you describe? Or better/worse? Or didn't you look at what I'd done?

No, it's not equivalent to what you've done. It's just my interpretation of the text I quoted (noise floor not necessarily flat) and sort of a suggestion because I believe it to be a clever thing. But if you don't like the idea of shaping the noise at all your approach (= only introducing white noise below the threshold of hearing) is already as good as it can get, I suppose.

However, I'd like to note that by properly colouring the noise you can theoretically set more LSBs to zero (=> lower bitrate) while keeping the same subjective quality level. Of course, this "properly" is kind of a black magic component.  But even the simple W(z)/H(z) trick did well for me. (I derived W(z) by feeding OggEnc with mono pink noise). But the shaping strength could be softened for those too scared of psychoacoustics.

As a matter of fact I recently (couple of weeks ago) read the FLAC file format spec again to check whether I should give it a try ...
I've been working on supporting variable blocksize properly; currently the spec is ambiguous in some cases... stay tuned.

Cool! Could you clarify the "Notes" paragraph in the frame header section, please? What blocksizes are allowed if it's a variable length block stream? I'd use "1000-1111 : 256 * (2^(n-8)) samples" but it looks like they are not allowed.
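For reference, the block sizes covered by the quoted formula can be listed with a small Python sketch. Only the 1000-1111 range quoted above is handled; the other block size codes in the frame header are deliberately left out, and the function name is made up.

```python
def blocksize_from_code(n):
    """Block size in samples for frame header codes 1000-1111 (n = 8..15)."""
    if not 8 <= n <= 15:
        raise ValueError("only codes 8..15 follow the 256 * 2^(n-8) rule")
    return 256 * (2 ** (n - 8))

# Print the code (as 4 bits) next to the block size it selects
for code in range(8, 16):
    print(format(code, "04b"), blocksize_from_code(code))
```

So the formula gives power-of-two block sizes from 256 up to 32768 samples, which includes both the 1024 and 4096 sample blocks discussed in this thread.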

Cheers!
SG
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-13 18:59:41
SebG,

Ah, I see. Well, it might be interesting to try. It sounds like you're tempted to do it!


Josh PM'd me to point out that the FLAC frame/block size can be set from the command line using the -b command. I've just tried it, and it works as expected: with the MATLAB code changed to use 1024 sample blocks, it seems I get better compression, but I need to try more samples. On the one I tried (41_30sec) this shaved another 20% off, though that sample encodes 3% more efficiently in 1024 sample blocks than 4096 blocks anyway.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: jcoalson on 2007-06-13 19:07:43
Slightly offtopic: Where'd you get the gspflac.dll??? I wanna!11!!! 
https://sourceforge.net/project/showfiles.php?group_id=165460

Cool! Could you clarify the "Notes" paragraph in the frame header section, please? What blocksizes are allowed if it's a variable length block stream? I'd use "1000-1111 : 256 * (2^(n-8)) samples" but it looks like they are not allowed.
all that convoluted logic will be going away with the next version of FLAC thankfully.  I'll publish the details with the next release of FLAC (hopefully no later than july)
Title: Near-lossless / lossy FLAC
Post by: goodnews on 2007-06-13 19:36:44
all that convoluted logic will be going away with the next version of FLAC thankfully.  I'll publish the details with the next release of FLAC (hopefully no later than july)

Slightly OT: Josh, is there an Intel Mac OS X version of FLAC 1.1.4 for download yet? All I still see is the PPC Mac version. Thanks!

What features will be included in next version of FLAC that you mentioned?
Title: Near-lossless / lossy FLAC
Post by: halb27 on 2007-06-13 21:03:39
I tried all the samples you provided, and couldn't abx them except for the second half of furious:
With my first trial I got to 4/4 with furious, but I missed several times with the following guesses.
With my second trial I got to 6/6, then 7/8, finally 8/10. Not a totally convincing result but maybe enough to show that furious lossy isn't totally transparent.
This is pretty similar to what I have learnt from wavPack lossy behavior for furious.
Anyway the difference is so subtle I can't really describe it. Just a minimal lack of precision maybe. Not serious at all.

Can you provide Atem-lied, herding_calls, trumpet, harp40_1 please? These and all the other samples in the presumably more efficient short block version?

ADDED:
I forgot badvilbel. Can you provide badvilbel please?
Title: Near-lossless / lossy FLAC
Post by: Bourne on 2007-06-13 23:40:21
I kinda talked about this once before... I called it Virtual Lossless...
But unfortunately someone cut me out saying: Lossy is virtual lossless...

Thanks for your detailed explanation.
And I see a difference between LOSSY and VIRTUAL LOSSLESS.
Title: Near-lossless / lossy FLAC
Post by: Mitch 1 2 on 2007-06-14 10:35:43
Using a two-part file extension (e.g. .lossy.flac) should solve the compatibility problem, at the expense of longer filenames. Proper tagging is needed, of course, as filenames alone cannot be trusted.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-14 11:53:55
I tried all the samples you provided, and couldn't abx them except for the second half of furious:
With my first trial I got at 4/4 with furious, but I missed several times with the following guesses.
With my second trial I got at 6/6, then 7/8, finally 8/10. Not a totally convincing result but maybe enough to show that furious lossy isn't totally transparent.
This is pretty similar to what I have learnt from wavPack lossy behavior for furious.
Anyway the difference is so subtle I can't really describe it. Just a minimal lack of precision may be. Not serious at all.


Thank you for ABXing halb27.

I thought I could ABX the background noise at the end of Furious, but then failed. I'm not sure if I'm imagining it, or if it's nearly audible.


If you're in the mood to play, please try the attached files. I've played with the thresholding. I've also (intentionally) broken the lossy part by dithering the LSB itself so you can't cheat and look at the FLAC bitrate!

At least one of these files has more noise (so it's not as hard a job as it looks). At least one has less noise. So you should be able to ABX at least one, and maybe cannot ABX at least one. See what you think.

If you do have time to ABX, please decide upon the number of ABX tests before you start, and stick to that. As you probably know, selecting results or re-starting messes up the statistics.

Of course anyone is free to try.


Quote
Can you provide Atem-lied, herding_calls, trumpet, harp40_1 please? These and all the other samples in the presumably more efficient short block version?

ADDED:
I forgot badvilbel. Can you provide badvilbel please?


I'll do those as time permits. If I get the chance before you've responded, I'll post them at the default settings. However, if your next response suggests I need to reduce the noise addition slightly, then I'll post them with a less aggressive setting.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: robert on 2007-06-14 13:11:48
foobar has some problem with the sample 1_Furious:
Code: [Select]
Decoding failure at 0:01.486 (Unsupported format or corrupted file):
"F:\1_Furious.flac"


edit: Sorry, the downloaded file was broken on my side. I downloaded it again and Foobar plays it just fine.
Title: Near-lossless / lossy FLAC
Post by: halb27 on 2007-06-14 13:14:51
Will try them tonight.

BTW I don't concentrate on the background noise but on the accuracy of the 'main signal' in the second half of the track.

As for abxing if the question is 'is track X transparent?' I always allow for a second trial in case I have the impression that there is a difference and get a result like 4/4 before I start to go wrong (possibly due to fatigue). The number of guesses is fixed before a test (usually 10 guesses, sometimes 8), but in case I don't see what to concentrate on I often give up within the first guesses (usually with several wrong guesses at that time).
Of course my insisting on going through the test also depends on previous experience with the sample. From furious I know it's a serious problem for wavPack lossy and I have an idea what to look for. I'm also more emotionally engaged cause this is one of the more serious problems to wavPack lossy - I can imagine to get a similar kind of music in real life encoding, and it's not just a tiny amount of altered or increased hiss/noise but inaccuracy - though in your case very tiny as well.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-14 13:49:58
As for abxing if the question is 'is track X transparent?' I always allow for a second trial in case I have the impression that there is a difference and get a result like 4/4 before I start to go wrong (possibly due to fatigue). The number of guesses is fixed before a test (usually 10 guesses, sometimes 8), but in case I don't see what to concentrate on I often give up within the first guesses (usually with several wrong guesses at that time).
I'm sure an ABX / statistics guru will be along in a minute to explain exactly why that alters the statistics. I just recall that it does, and in a way that makes it much harder to hit a given level of confidence.

I think you could say something like "I will do 8, and they will not count, then I will do 16, and they will count" if you stuck to it.

Of course, you can always listen carefully, and A/B (not X!) until you believe you hear something. Then, and only then, do the pre-decided number of ABX trials. Then there's no question.
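
As a rough illustration of the statistics, here is a minimal Python sketch of the usual one-sided binomial calculation behind an ABX score. It assumes every trial is an independent 50/50 guess when no difference is audible, and the function name `abx_p_value` is made up for this example. The figure it gives is only valid when the number of trials is fixed beforehand; stopping early or re-starting makes the real chance of a false positive higher than the printed value.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Probability of scoring at least `correct` out of `trials` by pure guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Sum the binomial tail: k = correct, correct + 1, ..., trials
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# Scores of the kind reported in this thread
for correct, trials in [(6, 10), (8, 10), (9, 10), (10, 10)]:
    print(f"{correct}/{trials}: p = {abx_p_value(correct, trials):.4f}")
```

For example, 8/10 corresponds to p = 56/1024, or roughly 0.055, which is just short of the common 0.05 threshold.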

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Mark0 on 2007-06-14 13:51:37
I may be misunderstanding something, but: why link this to FLAC at all?
I mean: this is a "sound simplifier", so to speak, so its output could very well be fed to pretty much any lossless (or even non-lossless, even if that makes a lot less sense) coder, right?

Bye!
Title: Near-lossless / lossy FLAC
Post by: Ariakis on 2007-06-14 14:28:42
It was originally designed to specifically exploit FLAC's wasted_bits handling for compression gain.
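
To make the mechanism concrete, here is a minimal Python sketch of how the wasted bits of one block can be counted: OR all samples together and count the trailing zero bits of the result. This is only an illustration of the idea described at the start of the thread, not the code of any actual FLAC encoder, and `wasted_bits` is a hypothetical helper name.

```python
def wasted_bits(block):
    """Number of low-order bits that are zero in every sample of the block."""
    combined = 0
    for sample in block:
        combined |= sample  # a bit is set here if it is set in any sample
    if combined == 0:
        return 0  # all-zero block: report 0 by convention in this sketch
    lowest_set_bit = combined & -combined  # isolates the lowest set bit
    return lowest_set_bit.bit_length() - 1

# Every sample is a multiple of 8, so the bottom 3 bits carry no information
print(wasted_bits([8, -16, 24, 4096]))
# A single odd sample anywhere in the block removes the saving
print(wasted_bits([8, -16, 24, 4097]))
```

This is why the pre-processor has to work on the same block boundaries as the encoder: one sample with its lowest bit set is enough to push the count for the whole block back to zero.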
Title: Near-lossless / lossy FLAC
Post by: naturfreak on 2007-06-14 14:59:01
My suggestion to further prevent confusion whether a FLAC file is from a lossless or lossy source:
Introduce a flag inside the FLAC (meta)data that indicate whether a file has a lossy or lossless source.

A user should be able to set that flag at encoding time only. It should be non-editable and unerasable inside the FLAC file (a hex editor might be an exception).
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-14 15:12:12
It was originally designed to specifically exploit FLAC's wasted_bits handling for compression gain.


Exactly.

However, it turns out you can use it with wavpack lossless too, by using the --blocksize=1024 switch to force wavpack to use 1024 sample blocks. (or 4096 sample blocks for the examples I provided on the previous page). Thanks to David Bryant for providing this, and other useful tips via email (I will reply properly David).

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: halb27 on 2007-06-14 15:39:06
... I think you could say something like "I will do 8, and they will not count, then I will do 16, and they will count" if you stuck to it. ...

Hm... it's not like this: I say in advance I'll do 10 guesses with each trial in order to call two tracks abxable.
What shall I do in a situation when I have the impression (which doesn't count in the end but I can't ignore it) that there are audible differences, and this is backed up by the first guesses where I score 4/4? If after that I fail what does that mean? Failure can be due to the tracks not being abxable, but also due to fatigue or overconfidence after the first results. Certainly this means I'm not very good at abxing, but with tracks that are hard to abx it happens to be like this - I can't change it. So what shall I do in such a situation? My solution is: I do a second trial and try harder. Can't see a better procedure. Taking the result of the first trial as the abxing result isn't the better alternative to me in case there's a suspicion that the tracks are abxable.
Sure if I allow for a second trial I could also allow for a third one, and so on. I see the point. But that's theory cause things are clear after the second trial.
Title: Near-lossless / lossy FLAC
Post by: Mark0 on 2007-06-14 15:40:26

It was originally designed to specifically exploit FLAC's wasted_bits handling for compression gain.


Exactly.

However, it turns out you can use it with wavpack lossless too, by using the --blocksize=1024 switch to force wavpack to use 1024 sample blocks. (or 4096 sample blocks for the examples I provided on the previous page). Thanks to David Bryant for providing this, and other useful tips via email (I will reply properly David).

Cheers,
David.

Exactly. What I was meaning (sorry, my English isn't very good) is that the "simplification" made by your tool can enable similar gain in compression ratio with other similar tools. For example (using one of the posted sample):

Code: [Select]
                      WAV     FLAC      TAK       RAR
01_41_30sec_lossy   5.168KB  1.957KB  2.119KB   2.755KB
02_41_30sec         5.168KB  3.473KB  3.284KB   3.633KB


Maybe two hours from now a new lossless compressor will surface that enjoys an even higher gain than FLAC. In the end, I don't see the link between a generic tool that "simplifies" a sound file, and a particular encoder (for which the action of the tool may turn out to be especially beneficial).

In the same way, I don't see the point of flagging in some way a losslessly encoded file that had as input a WAV file altered in some way.
Even without using this tool, original files can be altered in a number of ways (badly equalized, or they can contain clicks/pops, can have some noise reduction effects applied, etc.)...

Bye!
Title: Near-lossless / lossy FLAC
Post by: halb27 on 2007-06-14 15:41:42
... However, it turns out you can use it with wavpack lossless too, by using the --blocksize=1024 switch to force wavpack to use 1024 sample blocks. ...

Wonderful.
Your idea is getting even more useful. Congratulations.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-14 15:45:21
Can you provide Atem-lied
I can't find it. Can you upload it please?

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-06-14 16:35:28
So, how soon before an executable version of SoundSimplifier™ is released to an expectant HA community? Inquiring (impatient) minds want to know!
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-14 16:44:47
I've uploaded herding_calls, trumpet, harp40_1, and badvilbel.


Here are the lossy versions, and also links to the originals:

badvilbel:
[attachment=3324:attachment]http://www.ff123.net/samples/badvilbel.flac (http://www.ff123.net/samples/badvilbel.flac)

harp40_1:
[attachment=3325:attachment]ftp://ftp.tnt.uni-hannover.de/pub/MPEG/au...am/harp40_1.wav (ftp://ftp.tnt.uni-hannover.de/pub/MPEG/audio/sqam/harp40_1.wav)

herding calls:
[attachment=3326:attachment]http://www.hydrogenaudio.org/forums/index....showtopic=37002 (http://www.hydrogenaudio.org/forums/index.php?showtopic=37002)

trumpet:
[attachment=3327:attachment]http://www.hydrogenaudio.org/forums/index....showtopic=39334 (http://www.hydrogenaudio.org/forums/index.php?showtopic=39334)


(I miss the HA problem samples library!)

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: TBeck on 2007-06-14 16:54:18
...
Exactly. What I was meaning (sorry, my English isn't very good) is that the "simplification" made by your tool can enable similar gain in compression ratio with other similar tools. For example (using one of the posted sample):

Code: [Select]
                      WAV     FLAC      TAK       RAR
01_41_30sec_lossy   5.168KB  1.957KB  2.119KB   2.755KB
02_41_30sec         5.168KB  3.473KB  3.284KB   3.633KB


Maybe two hours from now a new lossless compressor will surface that enjoys an even higher gain than FLAC. In the end, I don't see the link between a generic tool that "simplifies" a sound file, and a particular encoder (for which the action of the tool may turn out to be especially beneficial).
...

For optimum performance the internal frame size of the encoder has to be taken into account when using the preprocessor. I suppose you haven't done this for TAK?

Possibly I will have to add an option to manually set the frame size for TAK files to get the most out of the preprocessor. While TAK partitions each of its fixed-size frames into up to 5 variable-size sub frames to adapt to signal changes, the "wasted bits" option works on the whole frame, which is too big (more than 4000 samples) to work well with the preprocessor.

  Thomas
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-14 16:59:08
So, how soon before an executable version of SoundSimplifier™ is released to an expectant HA community? Inquiring (impatient) minds want to know!


I like the name!

I'm not keeping it back. I've attached the latest MATLAB script which I'm using to generate these samples. It executes very well if you have MATLAB!
(Though you need lots of memory for normal sized audio files, since there's no buffering, and you'll need to change waveread to wavread and wavewrite to wavwrite).

As for an efficient C/C++ implementation which could be compiled - that's beyond me.

I think we need some more listening tests before anyone puts that much effort in, but the job is open to anyone who wants it!

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: smok3 on 2007-06-14 18:12:16
I couldn't reliably abx the 1st set of samples, but there were some weird negative results on some, like 1/8 or 2/8.
Title: Near-lossless / lossy FLAC
Post by: TBeck on 2007-06-14 18:12:45

...
Exactly. What I was meaning (sorry, my English isn't very good) is that the "simplification" made by your tool can enable similar gain in compression ratio with other similar tools. For example (using one of the posted sample):

Code: [Select]
                      WAV     FLAC      TAK       RAR
01_41_30sec_lossy   5.168KB  1.957KB  2.119KB   2.755KB
02_41_30sec         5.168KB  3.473KB  3.284KB   3.633KB


Maybe two hours from now a new lossless compressor will surface that enjoys an even higher gain than FLAC. In the end, I don't see the link between a generic tool that "simplifies" a sound file, and a particular encoder (for which the action of the tool may turn out to be especially beneficial).
...

For optimum performance the internal frame size of the encoder has to be taken into account when using the preprocessor. I suppose you haven't done this for TAK?
...


I was curious...
I downloaded "01_41_30sec_lossy" from here (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498313). Then I compressed it with TAK's default frame size and then with a frame size of 4096 samples. Results:

Code: [Select]
FLAC                2,004,157  Bytes
TAK Normal Default  2,023,188  Bytes
TAK Normal 4096     1,809,281  Bytes
TAK Turbo  4096     1,846,469  Bytes
Title: Near-lossless / lossy FLAC
Post by: Mark0 on 2007-06-14 18:36:21
For optimum performance the internal frame size of the encoder has to be taken into account when using the preprocessor. I suppose you haven't done this for TAK?

Right. I just compressed the two files in a rush to check if - as I supposed - the action of the SoundSimplifier™ would also be interesting for other encoders.
And it was, so much, as your results show even more. Thanks for taking the time for the "optimized" encoding.

Bye!
Title: Near-lossless / lossy FLAC
Post by: TBeck on 2007-06-14 19:16:37

For optimum performance the internal frame size of the encoder has to be taken into account when using the preprocessor. I suppose you haven't done this for TAK?

Right. I just compressed the two files in a rush to check if - as I supposed - the action of the SoundSimplifier™ would also be interesting for other encoders.
And it was, so much, as your results show even more. Thanks for taking the time for the "optimized" encoding.

Thank you! It was fun to try if this very interesting preprocessor could also be useful for TAK. In the meantime i have checked the other lossy samples from this thread:

Code: [Select]
                               FLAC      TAK
                                         Turbo        Normal      Max
------------------------------------------------------------------------------
01_41_30sec_lossy              2,004,157   1,846,469    1,809,281   1,797,900
05_florida_seq_lossy             784,150     670,701      642,823     637,592
09_SeriousTrouble_lossy          223,366     126,843      122,256     121,305
13_Track03beginning_lossy        897,305     772,040      685,265     649,287
15_Track03entreaty_lossy         950,851     810,619      761,024     748,012
17_Track04cakewithtea_lossy    1,521,299   1,346,236    1,287,662   1,254,913
badvilbel_lossy                1,447,888   1,499,021    1,431,695   1,410,516
harp40_1_lossy                   881,164     903,859      838,660     753,119
herding_calls_lossy              656,097     636,489      570,787     548,968
trumpet_lossy                    605,151     620,738      596,649     552,079
------------------------------------------------------------------------------


All presets used a frame size of 4096.

Hm, possibly i really should add an external option for frame size selection to TAK. This all looks very promising. The preprocessor is a very nice idea!
Title: Near-lossless / lossy FLAC
Post by: halb27 on 2007-06-14 20:03:49
Can you provide Atem-lied
I can't find it. Can you upload it please?

Cheers,
David.

Here it is: Atem-lied (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55569&view=findpost&p=498649)
Title: Near-lossless / lossy FLAC
Post by: halb27 on 2007-06-14 20:57:11
Just tried your variants of furious.
6_furious is terrible of course (10/10), and 5_furious is pretty bad as well (9/10 - guess I was a bit too fast with the last guesses).
But as for the other variants: I can abx none of them, and that's also true for your very first sample 07_furious_lossy, which I tried again, but no chance (6/10). (I still had the impression with several guesses that the encoding is somewhat 'slower', but forget it.) Sorry for making you do this extra work.

Will try the new samples now.
Title: Near-lossless / lossy FLAC
Post by: halb27 on 2007-06-14 21:37:15
Just tried badvilbel, trumpet, herding_calls, harp40_1.

Everything is fine - the only slightly questionable spot is seconds 0.9-3.1 on trumpet (8/10 on first trial - not abxable on second trial meant for confirmation).
Maybe somebody else likes to try trumpet?

Anyway great quality, David.

...Thank you! It was fun to try if this very interesting preprocessor could also be useful for TAK. ...

Great results with TAK. Things are getting more and more interesting.
Title: Near-lossless / lossy FLAC
Post by: TBeck on 2007-06-14 22:08:04

...Thank you! It was fun to try if this very interesting preprocessor could also be useful for TAK. ...

Great results with TAK. Things are getting more and more interesting.

Oh yes!

But up to 20 percent better results for TAK compared to FLAC seemed too much. I performed another test where I compressed the files myself with FLAC's strongest mode, -8. Now TAK's advantage is down to 10 percent. Still nice.

I replaced the file sizes with kbps values, which does make more sense in lossy comparisons. It would also be nice to have the compression results of the original (lossless) files to see, how much can be saved by applying the preprocessor, but currently i have no time to collect them.

edit: Table removed. The kbps values were 2 times too high! No time to correct it. Please look at the tables below.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-14 22:22:16
Just to confirm:

The latest set of samples (those requested by halb27) were all processed with a frame/block size of 1024, and the "default" threshold setting.

As mentioned, Wavpack gives useful results too.

It would be possible to change the block size dynamically to give the best compression for a given sample, but that would require tighter integration with the lossless codec - or calling it repeatedly for each block, checking the resulting file size, and concatenating the best results together.
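
A minimal Python sketch of the "call it repeatedly and keep the best" approach might look like this. Everything here is an assumption made for illustration: `standin_encoded_size` uses zlib on raw 16-bit PCM purely as a placeholder cost function (it is not an audio codec), and the helper names and the candidate sizes 4096/2048/1024 are hypothetical. A real version would swap in a function that runs the pre-processor and the actual lossless encoder at the block size being tested.

```python
import struct
import zlib

def standin_encoded_size(samples):
    """Placeholder cost (assumption): zlib size of 16-bit PCM, NOT a real audio codec."""
    raw = struct.pack(f"<{len(samples)}h", *samples)
    return len(zlib.compress(raw, 9))

def best_block_size(samples, start, span=4096, candidates=(4096, 2048, 1024),
                    encoded_size=standin_encoded_size):
    """Return (block_size, total_bytes) with the smallest cost for samples[start:start+span]."""
    region = samples[start:start + span]
    best = None
    for size in candidates:
        # Encode the region as consecutive blocks of `size` samples and add up the cost
        total = sum(encoded_size(region[i:i + size])
                    for i in range(0, len(region), size))
        if best is None or total < best[1]:
            best = (size, total)
    return best

def plan_blocks(samples, span=4096, **kwargs):
    """Choose a block size independently for each `span`-sample stretch of the signal."""
    return [best_block_size(samples, start, span, **kwargs)
            for start in range(0, len(samples), span)]

# Synthetic 16-bit test signal, two stretches of 4096 samples
signal = [((i * 37) % 200 - 100) << 4 for i in range(8192)]
for index, (size, total) in enumerate(plan_blocks(signal)):
    print(f"stretch {index}: block size {size}, {total} bytes")
```

The cost of this brute-force search is one encoder run per candidate size per stretch, which is the price of not integrating tightly with the codec.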

I'll try to get some more samples on line tomorrow.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: TBeck on 2007-06-14 22:28:10
Just to confirm:

The latest set of samples (those requested by halb27) were all processed with a frame/block size of 1024, and the "default" threshold setting.

Does this apply to badvilbel, harp40_1, herding_calls and trumpet? Then I will have to update the results.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-14 22:44:01
Yes.
Title: Near-lossless / lossy FLAC
Post by: TBeck on 2007-06-14 22:55:22
I updated the results. Now the last 4 samples have been encoded with a block size of 1024. I had to hack TAK for such small frame sizes. It is not tuned for them and there is room for improvements. 3 of the samples achieved better results with a frame size of 4096. There is considerable interaction between preprocessor and encoder settings.

Code: [Select]
                        FLAC     TAK
                        -8       Turbo     Normal   Max
----------------------------------------------------------
01_41_30sec              510      492       482      479
05_florida_seq           553      537       515      510
09_SeriousTrouble        409      368       355      352
13_Track03beginning      536      515       457      433
15_Track03entreaty       530      509       478      469
17_Track04cakewithtea    466      449       429      418
badvilbel                431      436       420      419
harp40_1                 428      415       397      384
herding_calls            461      450       423      417
trumpet                  479      470       456      442
----------------------------------------------------------
Average:                 480      464       441      432


edit: And another correction of the table...
Title: Near-lossless / lossy FLAC
Post by: TBeck on 2007-06-15 01:40:49
Mea culpa, mea maxima culpa!

I made a mistake. When calculating the kbps, I forgot that we are dealing with 2 channels, which means half the bitrate! Sorry, but good news regarding the preprocessor efficiency, I suppose.
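
For anyone who wants to redo the arithmetic, here is a minimal Python sketch of the file size to kbps conversion with the channel count made explicit, since getting it wrong changes the result by a factor of two. The helper names are hypothetical, and the 30 second duration in the example is an assumption based on the file name; the byte counts are the ones posted earlier in the thread.

```python
def frames_from_pcm_bytes(pcm_bytes: int, channels: int = 2, bytes_per_sample: int = 2) -> int:
    """Number of sample frames in raw PCM data (one frame = one sample per channel)."""
    return pcm_bytes // (channels * bytes_per_sample)

def bitrate_kbps(file_bytes: int, frames: int, sample_rate: int = 44100) -> float:
    """Average bitrate of an encoded file in kbps (1 kbps = 1000 bit/s)."""
    duration_s = frames / sample_rate
    return file_bytes * 8 / duration_s / 1000

# 30 s of 16-bit stereo at 44.1 kHz is 30 * 176400 = 5292000 bytes of PCM
frames = frames_from_pcm_bytes(5_292_000)
print(frames)  # 1323000 frames, i.e. 30 s at 44100 Hz

# FLAC size reported for 01_41_30sec_lossy at default settings
print(f"{bitrate_kbps(2_004_157, frames):.0f} kbps")
```

With those numbers the result is about 534 kbps; the table values differ a little because they come from other encoder settings.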

Here is a new table which also contains the kbps values for the losslessly compressed files. "Saving" is the effect of the preprocessor. I could not include the file "09_SeriousTrouble", because I don't have a lossless copy of it.

Code: [Select]
                        FLAC -8                    TAK Extra Max
                        Lossl.   Lossy    Saving   Lossl.   Lossy    Saving
----------------------------------------------------------------------------
01_41_30sec              924      510      414      895      479      416
05_florida_seq           797      553      244      771      510      261
13_Track03beginning      950      536      414      828      433      395
15_Track03entreaty       911      530      381      841      469      372
17_Track04cakewithtea    783      466      317      722      418      304
badvilbel                703      431      272      673      419      254
harp40_1                 636      428      208      527      384      143
herding_calls            531      461       70      444      417       27
trumpet                  776      479      297      693      442      251
----------------------------------------------------------------------------
Average:                 779      488      291      710      441      269


edit: Correction of the table. Thanks to Porcupine.
Title: Near-lossless / lossy FLAC
Post by: Porcupine on 2007-06-15 02:21:15
Wow, amazing thread. I'm really late to the party, and I need to install foobar2k before I can even start listening to FLAC files and seeing if I can ABX things, but 2Bdecided's lossy "VBR" pre-processor for FLAC sounds great to me.

One thing I noticed though is that so far, most of the tests were done with problematic (tonal) samples for lossless encoders. Wouldn't it also be good to do some tests with easy (noise-like) samples too? That way we can check how aggressive and dynamic the VBR pre-processor really is. Ideally, I think it should be able to identify high noise-levels on easy samples and set a much greater amount of LSBs to 0. Which could also put it at risk of being non-transparent again, but I think that's the goal of good VBR.

TBeck, are you sure those kbps figures are correct? They seem off to me, compared to the figures that 2Bdecided gave in his lossy_flac.gif file he posted earlier. Plus, 200 to 400 kbps for some of those lossless files doesn't seem right to me. Maybe I misunderstood what your table is showing.
Title: Near-lossless / lossy FLAC
Post by: TBeck on 2007-06-15 03:46:03
TBeck, are you sure those kbps figures are correct? They seem off to me, compared to the figures that 2Bdecided gave in his lossy_flac.gif file he posted earlier. Plus, 200 to 400 kbps for some of those lossless files doesn't seem right to me. Maybe I misunderstood what your table is showing.

Oh no, i did it wrong again! ("Oops! I did it again", possibly in the same mental state as the person i cited here...) Don't know what is going on with me today... Some mistake in my calculation sheet: I used an absolute instead of a relative cell reference. Too bad.

Thanks for telling me!

I will correct it soon.

  Thomas
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-06-15 09:10:33
It seems that the SoundSimplifier™ method is proving to be useful with any lossless codec. Yes, it's a contradiction in terms, but there is certainly a "market" for, shall we say, a "high-quality" lossy pre-processing algorithm.

Out of interest, what bitrate does Lame or OGG require to become un-ABX-able for the presumably tricky samples already mentioned?
Title: Near-lossless / lossy FLAC
Post by: shadowking on 2007-06-15 09:30:47
Out of interest, what bitrate does Lame or OGG require to become un-ABX-able for the presumably tricky samples already mentioned?


Some like florida seq, badvilbel are old mp3 hard samples. I think even 320k may not be enough. Florida is bad with -v0 or 256 abr - pre echo is severe, mpc -standard and dualstream are affected on the 'pfft' bit. One of Porcupines samples (with violin in the end) trips all the mp3 -V presets and I abxed 256 abr too. Other samples are exclusive to hybrid encoders (furious). Don't know about OGG. With mp3 bad cases can be reduced or corrected at 200~250k .. Above that its not worth it IMO and 320k is probably not right either. In that case a total psymodel overhaul would be needed.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-06-15 09:43:43
I'm just trying to rationalise the justification / logic of a lossy pre-processor to a lossless codec. That said, when 2BDecided's method becomes available I will certainly use it during FB2K transcoding from .flac to .lossy.flac for iPAQ use. From the table above it seems that the lossy / lossless (LYLS?) method produces very good quality at about 400 to 500 kbps.
Title: Near-lossless / lossy FLAC
Post by: halb27 on 2007-06-15 09:56:34
Most of the samples mentioned arise from knowledge as being a problem for wavPack lossy. Because of similarities they have a higher probability to be a problem for the preprocessor as well (as long as we don't know more).

Part of the problems are known to be not easy for various codecs, for instance harp40_1, trumpet, herding_calls.
harp40_1 is transparent to me using Vorbis -q5 or Lame 3.98b3 -V1. -q4 resp. -V2 are acceptable.
trumpet isn't a problem any more with Lame 3.98b3 -V2 (haven't tried a lower setting), but was a serious problem with Lame before (partially solved with 3.97 final). herding_calls is also a problem for many codecs - in a recent mp3 listening test of mine it wasn't transparent @ ~ 192 kbps with any of the mp3 encoders I tested (but acceptable for instance using Lame 3.98b3 -V2).

Generally speaking using mp3 IMO we shouldn't struggle too hard for perfection. With a bitrate around 192 kbps we get an excellent quality most of the time with a good encoder, and we have to accept that there are samples which aren't very good. Luckily this happens rather rarely.
Using Vorbis we get a better quality/bitrate ratio as well as an improved security against bad encodings at least when using -q5 or higher.

But the charm of lossless codecs and their lossy variants is that there are no such things as separating the signal into different bands, simplifying the signal there, coding it, usually transforming it into the frequency domain, and putting it all back together when decoding. This way music can be compressed extremely, but the extreme transformation of the signal and the various kinds of heuristic decision making make many people feel a bit uncomfortable, as that's the potential source of many artifacts.

We are getting more and more into the situation where we don't have to be bound to low bitrate, so any high quality lossy variant of a lossless codec with a rather simple and clean signal path is getting more and more attractive. Doing it with a preprocessor that can be used with various encoders is especially attractive.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-06-15 10:05:22
Well said! We're now getting into the realms of "what are the relative CPU requirements of Lossless and Lossy decoding" with a view to extending battery life on our mobile device.

But..... mobile devices are getting more and more storage (and more and more powerful and batteries are getting better), so in the not too distant future we'll be able to just use any of the lossless codecs (in full lossless mode ) without the need for any transcoding (or preprocessing).
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-15 11:22:22
One thing I noticed though is that so far, most of the tests were done with problematic (tonal) samples for lossless encoders. Wouldn't it also be good to do some tests with easy (noise-like) samples too? That way we can check how aggressive and dynamic the VBR pre-processor really is. Ideally, I think it should be able to identify high noise-levels on easy samples and set a much greater amount of LSBs to 0. Which could also put it at risk of being non-transparent again, but I think that's the goal of good VBR.
I'm open to suggestions - you name it / upload it, I'll try it (time permitting!).

You have to be careful with clipped samples. They don't harm quality, but they can bloat the bitrate unless you do something about it. The problem is simple: a positive clipped sample in integer binary is all ones (i.e. no zeros) so wasted_bits (or equivalent) is forced to zero.

There are various ways around it.

1. You can take a 16-bit signal, losslessly transform it into a 24-bit signal (add 8 zeros!), losslessly reduce it by 6dB (shift towards the LSB by one bit), and then run it through the lossy preprocessor. The clipped sample is now 001111111111111110000000, which can easily be rounded to 010000000000000000000000 if appropriate. There is no quality hit to this method (beyond the action of the lossy pre-processor itself), but the audio is 6dB quieter.

2. As (1), but only attenuate the signal a little. This won't be 100% lossless (even setting the pre-processor aside), but it is about as close as you can get (you have 24 bits to play with) and the volume change can be much smaller.

3. You can attenuate the 16-bit signal a little before or within the lossy pre-processor, and keep it at 16 bits (i.e. as (2), but at 16 bits throughout). You should probably dither with this method.

There are various other bodges that try to keep the full volume (e.g. rounding down / intentionally changing the clipped samples, but leaving everything else alone), but these are more lossy and I don't like them.
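
The bit patterns involved in option 1 can be checked with a few lines of Python. This is only a sketch of the arithmetic: the helper names are made up, and `round_to_bits` uses plain round-to-nearest, which is an assumption and not necessarily the rounding used in the MATLAB script.

```python
def pad_16_to_24(sample16: int) -> int:
    """Lossless 16-bit to 24-bit conversion: append 8 zero bits."""
    return sample16 << 8

def halve(sample24: int) -> int:
    """Reduce by about 6 dB: shift one bit towards the LSB (exact after padding)."""
    return sample24 >> 1

def round_to_bits(sample: int, drop: int) -> int:
    """Round to the nearest multiple of 2**drop, so the lowest `drop` bits become zero."""
    if drop == 0:
        return sample
    return ((sample + (1 << (drop - 1))) >> drop) << drop

def bits24(sample: int) -> str:
    """24-bit two's-complement bit pattern as a string."""
    return format(sample & 0xFFFFFF, "024b")

clipped = 0x7FFF  # positive full scale in 16 bits: 15 ones below the sign bit

# Rounding in place overflows: the result no longer fits in 16 bits
print(round_to_bits(clipped, 4) > 0x7FFF)

# With one bit of headroom the same sample rounds cleanly
quieter = halve(pad_16_to_24(clipped))
print(bits24(quieter))
print(bits24(round_to_bits(quieter, 12)))
```

The first line printed shows why the clipped sample is a problem at full scale, and the last two show the sample before and after rounding once the headroom is there.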

I'll upload some samples next.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-15 11:40:48
Here are some more examples.

It's interesting to compare bitrates, but more important to ABX if you can!

The originals are mostly from here:

http://ff123.net/samples.html
http://gurusamples.free.fr/samples/
http://membres.lycos.fr/guruboolez/AUDIO/samples/

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-15 11:57:49
Here are some more:
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-15 12:46:02
Just tried your variants of furious.
6_furious is terrible of course (10/10), and 5_furious is pretty bad as well (9/10 - guess I was a bit too fast with the last guesses).
But as for the other variants: I can abx none of them, and that's true also for your very first sample 07_furious_lossy which I tried again, but no chance (6/10). (I still had the impression with several guesses that the encoding is somewhat 'slower', but forget it.) Sorry for making you do this extra work.
That's OK - thank you for ABXing.

6 was quantised at +12dB
5 was quantised at +6dB
3 was quantised at 0dB
4 was quantised at -6dB

1 and 2 were -6dB and 0dB respectively, but spread the FFTs over 3 bins instead of 4, which lowered the noise further.

Cheers,
David.

btw, here are the bitrates I have for those latest files:
(YMMV if you use other than default FLAC settings)
Title: Near-lossless / lossy FLAC
Post by: halb27 on 2007-06-15 13:21:18
...
6 was quantised at +12dB
5 was quantised at +6dB
3 was quantised at 0dB
4 was quantised at -6dB

1 and 2 were -6dB and 0dB respectively, but spread the FFTs over 3 bins instead of 4, which lowered the noise further.

In case you're interested in my feelings: I had the impression that 1 was best and I even started to write it in my post, but erased it cause objectively speaking there can't be differences when you call something 'transparent' according to abx results.
Title: Near-lossless / lossy FLAC
Post by: haregoo on 2007-06-15 14:11:36
For those who are too lazy to download samples one by one, here are
- lossy pack (from post#69 and 70) (16MB) (http://www.fileden.com/files/2006/8/5/156176/69-70%20lossy.zip)
- lossless pack (30MB) (http://www.fileden.com/files/2006/8/5/156176/69-70%20lossless.zip)

Thanks 2Bdecided.
Title: Near-lossless / lossy FLAC
Post by: jcoalson on 2007-06-15 18:24:32
It seems that the SoundSimplifier™ method is proving to be useful in use with any lossless codec.
actually not any, some codecs like monkey's audio and I think optimfrog do not have the feature that will take advantage of the static LSBs.

BTW shorten has it too and it also has a somewhat similar lossy mode.
Title: Near-lossless / lossy FLAC
Post by: halb27 on 2007-06-15 18:58:13
Just tried Atem-lied. Couldn't abx it.
Title: Near-lossless / lossy FLAC
Post by: SebastianG on 2007-06-15 19:10:44
Three rather unrelated but still on-topic comments:

(1) I'd like to note that it's not only the "frame size" that should match. This preprocessor and any lossless encoder exploiting zeroed LSBs should be in perfect sync (not only the same frame sizes but the same frame boundary positions).

(2) It's nice to have those isolated tools ("simplifier" and lossless encoder) but this also limits the performance. So one should either go for a combined tool with variable length blocks or a modified lossless encoder which is smart enough to detect varying "wasted_bits" and partitions the stream accordingly.

(3) Here's another technical thought which might be interesting for Thomas in case he wants to add lossy support to TAK:
Selecting "wasted_bits" to be an integer allows an encoder to control the signal-to-noise ratio in steps of 6 dB only. Compared to other lossy codecs (MP3, AAC control the SNR in steps of roughly 1.1 dB = 1.5*(3/4) dB) this 6 dB step size is quite large. This is an old idea of mine of how to get more resolution: Make it probabilistic. You can store in each frame or subframe (you might want to allow changing the resolution within a frame) the information "wasted_bits = x with probability p and x+1 with probability (1-p)" and use the same pseudo-random number generator in encoder and decoder for deciding the "wasted_bits" value per sample. Also you should think about generating the actual "wasted bits" via this RNG instead of zeroing them. This would be equivalent to subtractive dithering and avoids nonlinear distortions. Entropy coding might be a bit more complicated, though.

Per sample coding could be done like this:
Code: [Select]
wbits = minWasted + (RNG.nextFloat() > p ? 1 : 0);  // randomly chosen wasted bits count
waste = RNG.nextIntBits(wbits);                     // randomly generated LSBs
quantized_to_code = round( (float)(current_sample - waste) / (1 << wbits) ); // sample to code
quantized_actual = (quantized_to_code << wbits) + waste; // dequantized sample

Of course, the encoder's RNG state should match the decoder's (ie. same seed).
Good news: Noise shaping doesn't need to be part of the format specification but can later be added to the encoder without breaking anything.
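A runnable version of the per-sample idea might look like the Python sketch below. It is only an illustration: Python's `random.Random` stands in for the shared pseudo-random generator (a real format would have to specify the generator exactly), and `encode`, `decode` and `_draw` are hypothetical names.

```python
import random


def _draw(rng, min_wasted, p):
    """Draw the per-sample wasted-bit count and the pseudo-random LSBs."""
    wbits = min_wasted + (1 if rng.random() > p else 0)
    waste = rng.getrandbits(wbits) if wbits else 0
    return wbits, waste


def encode(samples, min_wasted, p, seed):
    rng = random.Random(seed)
    codes = []
    for s in samples:
        wbits, waste = _draw(rng, min_wasted, p)
        half = (1 << wbits) >> 1
        codes.append((s - waste + half) >> wbits)   # rounded division by 2**wbits
    return codes


def decode(codes, min_wasted, p, seed):
    rng = random.Random(seed)                        # same seed, same draws
    out = []
    for c in codes:
        wbits, waste = _draw(rng, min_wasted, p)
        out.append((c << wbits) + waste)
    return out


samples = list(range(-1000, 1000, 37))
codes = encode(samples, min_wasted=3, p=0.5, seed=42)
decoded = decode(codes, min_wasted=3, p=0.5, seed=42)
worst = max(abs(a - b) for a, b in zip(samples, decoded))
print(worst <= 1 << 3)   # True: error never exceeds half the larger step
```

Because both sides make the same draws in the same order, the decoder needs only `min_wasted`, `p` and the seed to undo the scaling and put the pseudo-random LSBs back.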

Cheers!
SG
Title: Near-lossless / lossy FLAC
Post by: TBeck on 2007-06-15 19:54:41
(3) Here's another technical thought which might be interesting for Thomas in case he wants to add lossy support to TAK:
Selecting "wasted_bits" to be an integer allows an encoder to control the signal-to-noise ratio in steps of 6 dB only. Compared to other lossy codecs (MP3, AAC control the SNR in steps of roughly 1.1 dB = 1.5*(3/4) dB) this 6 dB step size is quite large. This is an old idea of mine of how to get more resolution: Make it probabilistic. You can store in each frame or subframe (you might want to allow changing the resolution within a frame) the information "wasted_bits = x with probability p and x+1 with probability (1-p)" and use the same pseudo-random number generator in encoder and decoder for deciding the "wasted_bits" value per sample. Also you should think about generating the actual "wasted bits" via this RNG instead of zeroing them. This would be equivalent to subtractive dithering and avoids nonlinear distortions. Entropy coding might be a bit more complicated, though.

That's a very nice idea!

But if i build a dedicated lossy codec, i am -unlike the preprocessor- not restricted regarding the scale factors. I may divide the signal by any value i like and therefore can have a quite high resolution of the signal-to-noise steps. My very old experimental implementation of a lossy codec was using about 2 dB.

  Thomas
Title: Near-lossless / lossy FLAC
Post by: SebastianG on 2007-06-15 20:09:15
Very true. Although, what integer "scalefactor" is between 1 and 2 ? 
BTW, the probabilistic approach can and was initially intended to be used for steganography.

Cheers!
SG
Title: Near-lossless / lossy FLAC
Post by: TBeck on 2007-06-15 20:22:52
Very true. Although, what integer "scalefactor" is between 1 and 2 ? 

As many as the resolution of my fixed-point integer arithmetic (multiplication with a factor 1 / "ScaleFactor") permits? I am not restricted to integer values.
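As a rough illustration of non-integer scale factors applied with fixed-point arithmetic, here is a Python sketch. It is not TAK's code: the 16 fractional bits, the 2 dB step spacing and the names `scale_for_step`, `quantise` and `dequantise` are all assumptions made for the example.

```python
FRAC_BITS = 16   # assumed fixed-point precision for the reciprocal


def scale_for_step(index, step_db=2.0):
    """Scale factor for the given quality index, spaced step_db apart."""
    return 10.0 ** (index * step_db / 20.0)


def quantise(sample, scale):
    """Divide by scale using an integer multiply with a fixed-point 1/scale."""
    inv = round((1 << FRAC_BITS) / scale)
    return (sample * inv + (1 << (FRAC_BITS - 1))) >> FRAC_BITS


def dequantise(code, scale):
    return round(code * scale)


scales = [scale_for_step(i) for i in range(4)]
print([round(s, 3) for s in scales])   # several factors between 1 and 2
print(dequantise(quantise(12345, scales[2]), scales[2]))
```

With 2 dB spacing there are already three usable factors between 1 and 2, compared with the single 6 dB jump of a whole bit.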

But i forgot one thing: If you want to have a correction file, it's much easier and more efficient to use the integer bit removal approach and then i would favour your clever idea.

  Thomas
Title: Near-lossless / lossy FLAC
Post by: SebastianG on 2007-06-15 22:59:11
As many as the resolution of my fixed-point integer arithmetic (multiplication with a factor 1 / "ScaleFactor") permits? I am not restricted to integer values.

Hmm... Haven't thought of that, tbh.

But i forgot one thing: If you want to have a correction file, it's much easier and more efficient to use the integer bit removal approach

Hmm... I see what you mean. However, this correction file stuff is not easily combined with noise shaping, no matter how you do the correction (coding of either unfiltered or filtered error samples). I wonder how MPEG4-SLS is going to solve that problem. Prior to menno's comment (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498380) I believed SLS to be some kind of AAC + IntMDCT mix.

Cheers!
SG
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-16 22:48:10
I like the subtractive dither idea (though of course it's useless with a pre-processor approach), but I doubt you'll find distortion without dither in this application, so it's more a case of "nice to have, just in case" rather than essential.

I don't like the idea of noise shaping in this application (though it could depend what you mean by noise shaping). I guess you can draw the line between pure lossless, and lowest possible transparent bitrate lossy anywhere you like - but the cleverer you get, the closer you get to mp3 etc - one of the points of this is that, with nothing "clever" going on, there's nothing there to unexpectedly interfere with something (anything) downstream. It could be a great format for transcoding.

You could add more clever stuff as an option, but I think it would be extremely useful to keep the minimalist approach I've outlined as an option too.


I know it's a tough call, but I was hoping for some ABX results. IMO there's not much point proceeding further until the best ears at HA have commented!

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: TBeck on 2007-06-17 15:29:58
I know it's a tough call, but I was hoping for some ABX results. IMO there's not much point proceeding further until the best ears at HA have commented!

Oh yes, please...

Unfortunately i have lost my "Golden Ears" in the last years (that's why i have dropped TAK's earlier lossy mode), therefore i can't help with this although i would be very excited to do so.

I am really thinking about the possibility to add a lossy mode to TAK. But i would only like to do it, if transparency can be achieved at bitrates i myself would regard as small enough to be useful. Therefore i am very interested in some ABX results.

Later i may also provide an alternative preprocessor application (in another thread, no hijacking) which is based upon my earlier lossy approach. It works quite differently from your preprocessor. It would be interesting to compare the results.

  Thomas
Title: Near-lossless / lossy FLAC
Post by: halb27 on 2007-06-17 18:09:19
... I am really thinking about the possibility to add a lossy mode to TAK. ...

Wonderful.
IMO extremely high quality lossy variants of lossless codecs are most attractive today, now that mass storage is so cheap. If a robust extremely high quality is achievable at ~ 350 kbps this is still very attractive compared to lossless. This is especially true if such a codec is available for DAPs (for mere PC use we're already in a state where many people can use a lossless codec). At the moment availability for DAPs de facto means: it's available with Rockbox firmware.
Title: Near-lossless / lossy FLAC
Post by: TBeck on 2007-06-18 01:54:16
If a robust extremely high quality is achievable at ~ 350 kbps this is still very attractive compared to lossless.

I doubt, that this is possible. I would expect 400 to 450 kbps (on average) to be sufficient.

But we will never know, if nobody evaluates 2Bdecided's files... 

BTW: I spent this day building a quick and dirty preprocessor which implements a variation of my old lossy approach:

- Preprocessing of files with output to a wave file. Hi 2Bdecided, your idea is really clever!
- Shows you, which bit rate TAK's default setting would achieve when compressing the output file.
- Select a quality level.
- Choose between 2 different filters.

Unfortunately it doesn't make sense to release the preprocessor, if nobody likes to evaluate the results of such software...

  Thomas
Title: Near-lossless / lossy FLAC
Post by: halb27 on 2007-06-18 07:11:56

If a robust extremely high quality is achievable at ~ 350 kbps this is still very attractive compared to lossless.

I doubt, that this is possible. I would expect 400 to 450 kbps (on average) to be sufficient.
But we will never know, if nobody evaluates 2Bdecided's files... 

BTW: I spent this day building a quick and dirty preprocessor which implements a variation of my old lossy approach:
....
Unfortunately it doesn't make sense to release the preprocessor, if nobody likes to evaluate the results of such software...

'~ 350 kbps' was not meant to be achieved with 2Bdecided's preprocessor exactly though that would be great. When I wrote that I had a lossy mode of TAK in mind you were thinking about.
shadowking has done listening tests, and me too spent quite some time with listening tests on various samples 2Bdecided gave us. So 'nobody likes to evaluate ...' is not totally correct though sure a lot more testing would be welcome. I have a lot of hope that Porcupine will join us. He has a very good feeling what to look at to find the weaknesses of these kind of codecs.
I'd love to learn to know your preprocessor, and I'm willing to do some listening tests.
Title: Near-lossless / lossy FLAC
Post by: shadowking on 2007-06-18 07:30:10
I don't expect problems at 550k. Even shorten and rkau with lossy were fine at those bitrates when I played with them, but not good at 350k. If some data reduction is needed and one is content with 450~550k bitrate then this preprocessor will work fine. If that is its goal I don't see a problem. The goals of Dualstream and to an extent Wavpack lossy are different. Can this preprocessor be ported to meet those goals ?
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-18 09:52:42
I don't expect problems at 550k. Even shorten and rkau with lossy were fine at those bitrates when I played with them, but not good at 350k. If some data reduction is needed and one is content with 450~550k bitrate then this preprocessor will work fine. If that is its goal I don't see a problem. The goals of Dualstream and to an extent Wavpack lossy are different. Can this preprocessor be ported to meet those goals ?


The important thing about my approach is that it's pure VBR.

If it works, the bitrate will be whatever the bitrate will be. You will have no control over it - it'll be completely content dependent.

You can shift the threshold upwards to decrease the bitrate, but then you'll introduce audible noise. Simple as that.

(More usefully, you can increase the bitrate by shifting the threshold downwards - this could be useful to allow multiple generations of coding. Also, if I've set the threshold in the wrong place to start with, it will have to be lowered by default - a reason to ABX!)


If my approach doesn't work, then it will be ABXable sometimes. I don't expect problems, but I don't think shadowking you can imply there's no need for people to ABX just because the bitrate is usually high. You can have a high bitrate file with audible artefacts!


If my answer hasn't covered what you had in mind, let me know what you meant by "the goals of Dualstream and to an extent Wavpack lossy".

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-18 10:28:43
Unfortunately it doesn't make sense to release the preprocessor, if nobody likes to evaluate the results of such software...


I'll have a play, if you can release it.

The things you've already discussed (taking my basic idea and enhancing it when implementing it within TAK) sound quite exciting. If you could include more ideas from your own lossy work that could be even better.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: shadowking on 2007-06-18 10:32:27
Ok. I understand now. Goal is transparent pure vbr without bitrate control.

Can you try these:

http://64.41.69.21/technical/reference/keys_1644ds.wav

http://64.41.69.21/technical/reference/keys_2496.wav


Old artificial sample. Seems to be an exclusive optimfrog / wavpack problem. Noise at 350 k .. I abxed at 450k but fail at 512k. Advanced noise shaping (up) works wonders for both encoders.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-06-18 10:51:15
As the basic premise of reducing bitdepth "transparently" will apply to all lossless codecs, I would very much like to see a standalone implementation of the method.
Title: Near-lossless / lossy FLAC
Post by: menno on 2007-06-18 10:59:09
Hmm... I see what you mean. However, this correction file stuff is not easily combined with noise shaping, no matter how you do the correction (coding of either unfiltered or filtered error samples). I wonder how MPEG4-SLS is going to solve that problem. Prior to menno's comment (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498380) I believed SLS to be some kind of AAC + IntMDCT mix.


In SLS the correction/scalability has nothing to do with the AAC core (except defining the lower limit). The quantised AAC spectrum serves as a starting point for SLS, but there is nothing to scale in that. The scalability in SLS comes from the fact that the frames are entropy coded in bitplanes. So first all MSBs in the frame are encoded, and so on, down to the LSB. Scaling can then be achieved by simply removing some bytes from the end of the frame. Correction files (tracks) can be made by copying bytes from the end of each frame to another file or track.
The trick proposed in this thread can basically be done as post-processing in SLS instead of pre-processing.
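The bitplane ordering can be pictured with the toy Python sketch below. It is emphatically not SLS: it works on plain time-domain samples with sign and magnitude, has no entropy coding, and `to_bitplanes` / `from_bitplanes` are hypothetical names. It only shows why cutting the tail of a frame removes the least significant bits first, and why that tail can serve as a correction track.

```python
def to_bitplanes(samples, bit_depth):
    """Split samples into signs plus magnitude bit planes, MSB plane first."""
    signs = [s < 0 for s in samples]
    mags = [abs(s) for s in samples]
    planes = [[(m >> b) & 1 for m in mags] for b in range(bit_depth - 1, -1, -1)]
    return signs, planes


def from_bitplanes(signs, planes, bit_depth):
    """Rebuild samples from however many leading planes are available."""
    mags = [0] * len(signs)
    for i, plane in enumerate(planes):
        shift = bit_depth - 1 - i
        for j, bit in enumerate(plane):
            mags[j] |= bit << shift
    return [-m if neg else m for neg, m in zip(signs, mags)]


frame = [1234, -5678, 17, -32000]
signs, planes = to_bitplanes(frame, 16)
print(from_bitplanes(signs, planes, 16) == frame)   # True: all planes, lossless
print(from_bitplanes(signs, planes[:12], 16))       # low 4 bits of each magnitude dropped
correction = planes[12:]                            # what a correction track would carry
```

Appending `correction` back onto the truncated planes restores the original frame exactly.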
Title: Near-lossless / lossy FLAC
Post by: menno on 2007-06-18 11:09:43
Funny is also that SLS does not manage to take any advantage of this pre-processor
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-18 12:00:08
Ok. I understand now. Goal is transparent pure vbr without bitrate control.
Exactly.

Quote
Can you try these:

http://64.41.69.21/technical/reference/keys_1644ds.wav

http://64.41.69.21/technical/reference/keys_2496.wav


Old artificial sample. Seems to be an exclusive optimfrog / wavpack problem. Noise at 350 k .. I abxed at 450k but fail at 512k. Advanced noise shaping (up) works wonders for both encoders.


Wow! They're killer samples for this algorithm, and FLAC itself. I think they're still transparent (can you try ABX please?) but look at the bitrates (all FLAC)...

keys_1644ds:
lossless: 1078kbps (ratio=0.764)
lossy: 829kbps (ratio=0.587)

keys_2496:
lossless: 4587kbps (ratio=0.995!!!)
lossy: 1742kbps (ratio=0.378)
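For anyone wanting to check the ratios, they are just the FLAC bitrate divided by the uncompressed PCM bitrate. The small Python sketch below assumes both files are stereo (the posts don't say so explicitly, but the numbers only work out that way); `pcm_kbps` and `ratio` are hypothetical helper names.

```python
def pcm_kbps(sample_rate_hz, bit_depth, channels=2):
    """Uncompressed PCM bitrate in kbps."""
    return sample_rate_hz * bit_depth * channels / 1000.0


def ratio(flac_kbps, sample_rate_hz, bit_depth, channels=2):
    """Compressed size as a fraction of the uncompressed size."""
    return flac_kbps / pcm_kbps(sample_rate_hz, bit_depth, channels)


files = [
    # name, sample rate, bit depth, lossless kbps, lossy kbps
    ("keys_1644ds", 44100, 16, 1078, 829),
    ("keys_2496", 96000, 24, 4587, 1742),
]
for name, rate, bits, lossless, lossy in files:
    print(name, round(ratio(lossless, rate, bits), 3), round(ratio(lossy, rate, bits), 3))
```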


Note: There's a mistake in the MATLAB script I've posted when the FFT size is larger than the lossless block size (as it is at 96kHz sampling with the parameters I was using). I believe I've fixed it for your sample, but I'll check the script more thoroughly with other sample rates before I post an update.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: shadowking on 2007-06-18 12:47:42
More info and warnings about these samples here:

http://64.41.69.21/technical/sample_rates/index.htm

It seems they are transparent ! - bitrate is insane but that's unlimited VBR for you. Dualstream quality 5 was where I couldn't get good results in the past - these 'samples' may be tricking the optimfrog model. Bitrate is 458 k.

Optimfrog Wavpack has much better compression on these. That's why vbr is very important on lower compression codecs and modes like FLAC. With optimfrog, MAC, TAK and Wavpacks -hx modes, even the 'end to all' cases will do fine with ABR 400~500k simply because they will compress well. In flac (or wavpack fast / normal modes) such a 'fixed' bitrate will trigger strong noise.
Title: Near-lossless / lossy FLAC
Post by: TBeck on 2007-06-18 13:00:21

....
Unfortunately it doesn't make sense to release the preprocessor, if nobody likes to evaluate the results of such software...

...
shadowking has done listening tests, and me too spent quite some time with listening tests on various samples 2Bdecided gave us. So 'nobody likes to evaluate ...' is not totally correct though sure a lot more testing would be welcome. I have a lot of hope that Porcupine will join us. He has a very good feeling what to look at to find the weaknesses of these kind of codecs.
I'd love to learn to know your preprocessor, and I'm willing to do some listening tests.

Big sorry halb27!

I should have split the post. My complaining about the lack of testers was not directed to you!


Unfortunately it doesn't make sense to release the preprocessor, if nobody likes to evaluate the results of such software...

I'll have a play, if you can release it.

Fine!

The things you've already discussed (taking my basic idea and enhancing it when implementing it within TAK) sound quite exciting.

For now i took your great preprocessor idea to test my own, very simple approach for the determination of the wasted bit count. My approach has possibly more in common with the method described by SebastianG (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498376).  I would have liked to add your code too, but unfortunately i know nothing about MATLAB and not very much about DSP either, so this was beyond the scope of a rainy afternoon's work.

If you could include more ideas from your own lossy work that could be even better.

I am not sure if you are already doing something like this: Because 1024 samples are still quite a lot, i am partitioning the frame into blocks of 128 or 256 samples. Each block is being analyzed and i am using the lowest wasted bit count result for the whole frame. Safety first.

  Thomas
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-18 13:51:31
I am not sure if you are already doing something like this: Because 1024 samples are still quite a lot, i am partitioning the frame into blocks of 128 or 256 samples. Each block is being analyzed and i am using the lowest wasted bit count result for the whole frame. Safety first.
I'm doing the same, but with 2 FFT sizes...

My analysis is (ignoring bugs in the code!) independent of lossless frame size. The thresholds are calculated using 20ms and 1.5ms FFTs. Looking at all the results, the lowest wasted_bits requirement within the frame is the one that is chosen.
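The "lowest requirement wins" rule can be sketched in a few lines of Python. This is only an illustration of the selection step, not the MATLAB script: the per-window bit counts are made-up numbers standing in for the FFT analysis results, the window lengths are approximate (about 20 ms and 1.5 ms at 44.1 kHz), and `frame_wasted_bits` is a hypothetical name.

```python
def frame_wasted_bits(frame_start, frame_len, analyses):
    """Pick the smallest allowed bit count among windows overlapping the frame.

    analyses: iterable of (window_start, window_len, allowed_bits) tuples,
    which need not line up with the lossless frame boundaries.
    """
    frame_end = frame_start + frame_len
    overlapping = [
        bits
        for start, length, bits in analyses
        if start < frame_end and start + length > frame_start
    ]
    return min(overlapping, default=0)   # no analysis available: remove nothing


analyses = [
    (0, 882, 5), (882, 882, 7),                 # long (~20 ms at 44.1 kHz) windows
    (0, 66, 6), (66, 66, 3), (1000, 66, 2),     # short (~1.5 ms) windows
]
print(frame_wasted_bits(0, 1024, analyses))      # 2
print(frame_wasted_bits(2048, 1024, analyses))   # 0
```

Since any window touching the frame can veto a higher bit count, one quiet short window is enough to pull the whole frame down, which is the "safety first" behaviour described above.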

I'm sure you could follow the MATLAB  - it's only BASIC with some clever array handling. I'll try to add some more comments to the code when I get time.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-06-18 13:54:57
A bit O/T, but I'm a reformed Pascal / Assembler hobby programmer and find the whole concept of playing with audio quite appealing - however, not £1350 appealing (commercial Matlab licence cost). I found Scilab and FreeMAT almost immediately - which would you recommend as a being easier to port the Matlab code to?
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-18 14:51:32
I haven't tried myself, but GNU Octave is supposed to be good.

EDIT: The first problem is getting audio data in. I use my own routines, hacked from the basic MATLAB routines wavread and wavwrite, to add the ability to handle various sample rates, bit depths, and numbers of channels. Since they're hacks of copyrighted code, I don't feel comfortable sharing them, but wavread and wavwrite are good for 44.1kHz 16-bit stereo.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: TBeck on 2007-06-18 15:19:12
Would it be ok to discuss my preprocessor in this thread?

I don't want to perform some kind of hijacking, but it seems to be the right context.

  Thomas

P.S.: It seems to be possible to attach files to ordinary threads, but how do i do it? Do i have to belong to another member group to be allowed to?
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-18 17:10:23
I think "Developers" can upload files anywhere. "Members" can only upload in the uploads forum. I guess you can PM a moderator to become a "Developer".

Of course you can discuss your pre-processor here, though if we're both going to get people to ABX, it might get a bit confusing. I guess it depends if you expect them to merge, or not.

I am not an open source zealot, but for a useful discussion, you'll have to share pretty much exactly what it's doing - otherwise there's little hope of finding relevant problem samples without doing an exhaustive test.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: TBeck on 2007-06-18 19:34:03
I think "Developers" can upload files anywhere. "Members" can only upload in the uploads forum. I guess you can PM a moderator to become a "Developer".

Thanks for the info.

Of course you can discuss your pre-processor here, though if we're both going to get people to ABX, it might get a bit confusing. I guess it depends if you expect them to merge, or not.

I am not an open source zealot, but for a useful discussion, you'll have to share pretty much exactly what it's doing - otherwise there's little hope of finding relevant problem samples without doing an exhaustive test.

I understand.

Currently i only want to see, if my approach is useful at all. It's quite possible that your method is superior. If so, then it does not make sense to put more effort into my method. It's really very very simple. And it had to be simple in 1997 when my Pentium 133 provided very limited processing power...

What it does: It performs a very simple estimation of the expected residuals (what is likely to remain after sending the signal through the linear predictor?). Then it calculates Log2 of the mean of the residuals, subtracts the quality threshold (expressed as bits) and uses the difference as shift value. That's all.

It's some kind of CBR compression with some variation introduced by the estimation error of the residuals.
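Read literally, that description could be sketched in Python as below. This is a guess at the shape of the method, not TAK's actual code: a first-order difference stands in for the unspecified residual estimator, the rounding in `apply_shift` is an assumption, and both function names are hypothetical.

```python
import math


def estimate_shift(block, threshold_bits):
    """Shift value = floor(log2(mean |residual|)) - threshold_bits, never negative."""
    # Assumption: a first-order difference stands in for the real predictor.
    residuals = [abs(b - a) for a, b in zip(block, block[1:])]
    if not residuals:
        return 0
    mean = sum(residuals) / len(residuals)
    if mean < 1:
        return 0
    return max(0, math.floor(math.log2(mean)) - threshold_bits)


def apply_shift(block, shift):
    """Zero the lowest `shift` bits of every sample (rounding to nearest)."""
    if shift == 0:
        return list(block)
    half = 1 << (shift - 1)
    return [((s + half) >> shift) << shift for s in block]


noisy = [0, 300, -250, 410, -90, 275]      # large sample-to-sample changes
smooth = [100, 101, 102, 103, 104, 105]    # almost perfectly predictable
print(estimate_shift(noisy, threshold_bits=5))    # 3
print(estimate_shift(smooth, threshold_bits=5))   # 0: nothing can be removed
```

Keeping roughly `threshold_bits` of residual per sample regardless of content is what gives the scheme its CBR-like behaviour.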

Probably you are right: It's better to open a new thread to avoid confusion. But it would be nice to compare the results to check, if my approach does make any sense.

If so, then i can think about an implementation as part of a preprocessor or integrated into TAK. Currently i am just curious and maybe a bit lazy: Actually i should rather be working on the far less interesting tasks for the next TAK lossless release...

  Thomas
Title: Near-lossless / lossy FLAC
Post by: TBeck on 2007-06-18 20:57:14
I just have opened a new thread (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55648&view=findpost&p=499404) to discuss my preprocessor implementation.

  Thomas
Title: Near-lossless / lossy FLAC
Post by: halb27 on 2007-06-18 21:05:22
... Can you try these:
http://64.41.69.21/technical/reference/keys_1644ds.wav (http://64.41.69.21/technical/reference/keys_1644ds.wav)...

Just tried keys with wavPack lossy @ 350 kbps fast mode: terrible. triangle-2 from that page is very ugly too.
I was afraid after having heard furious that there may be real life sound that makes wavPack lossy look pretty bad.
Setting for my DAP (see signature) btw is excellent in comparison to plain wavPack usage - 32 kHz sampling frequency and s0.4 is a bit of an anti-killer setting. I did a lot of listening tests with it this weekend and I'm very content.

Anyway this shows that a good quality control would be very much welcome for wavPack lossy. Usually a pretty moderate bitrate yields excellent results, but it's not always the case. Maybe this thread encourages David Bryant to go along this way.
Title: Near-lossless / lossy FLAC
Post by: halb27 on 2007-06-18 21:34:56
Wow! They're killer samples for this algorithm, and FLAC itself. I think they're still transparent (can you try ABX please?) but look at the bitrates (all FLAC)...

keys_1644ds:
lossless: 1078kbps (ratio=0.764)
lossy: 829kbps (ratio=0.587)

....

To me keys_1644ds_lossy.flac is transparent too.
I wouldn't care much about bitrate bloat of such samples. A robust extremely high quality is what counts, as well as average bitrate of different genres.
Title: Near-lossless / lossy FLAC
Post by: Porcupine on 2007-06-18 21:55:09
I agree with halb27. A humongous lossy file in those cases is actually what a good VBR is supposed to do, to maintain the sound quality.

In any case, here's a sample I'd like you to try, 2Bdecided.

Very "Easy" Sample (http://www.hydrogenaudio.org/forums/index.php?showtopic=55649)

This is a typical example of the kind of obnoxious music that I think compresses best with things such as WavPack lossy. It requires 1200+ kbps to be mathematically lossless, but even at 200 kbps and below it still sounds transparent to me. Maybe shadowking or others can try to ABX it at WavPack/Optimfrog lowest quality settings, I think it's transparent.

The reason to test this sample would be to test how dynamic the range of 2Bdecided's VBR algorithm is. Will it choose a very low bitrate, and if it does will it still be transparent? By the way, this sample isn't clipped I made sure. Upon extraction, EAC reported the song's normalization as 98.8% amplitude.

I plan to install foobar2k next weekend and I can try to ABX some of the samples, sorry I didn't have enough time this past weekend. I figure I need to dedicate about a day to play with foobar first, and another day to do actual listening tests. But just to let people know, never from the start did I consider myself to have terrific hearing. I think I probably only have fairly good hearing, and it isn't as good as it used to be either, I'm in my late 20's. And even when I was in my teens, I think I had perfect undamaged hearing but when my friends and I did "single" blind-tests back in the early days of mp3, one of my friends easily defeated me in being able to differentiate certain things.
Title: Near-lossless / lossy FLAC
Post by: shadowking on 2007-06-19 01:41:53
Simple tonal classical music should be tested. Corelli and bruhns samples from Guruboolez come to mind.


... Can you try these:
http://64.41.69.21/technical/reference/keys_1644ds.wav (http://64.41.69.21/technical/reference/keys_1644ds.wav)...

Just tried keys with wavPack lossy @ 350 kbps fast mode: terrible. triangle-2 from that page is very ugly too.
I was afraid after having heard furious that there may be real life sound that makes wavPack lossy look pretty bad.
Setting for my DAP (see signature) btw is excellent in comparison to plain wavPack usage - 32 kHz sampling frequency and s0.4 is a bit of an anti-killer setting. I did a lot of listening tests with it this weekend and I'm very content.

Anyway this shows that a good quality control would be very much welcome for wavPack lossy. Usually a pretty moderate bitrate yields excellent results, but it's not always the case. Maybe this thread encourages David Bryant to go along this way.


True. These cases will never happen on CD though. They are good reference for tuning. The thing is that the preprocessor is averaging 450~550k on vbr. If I matched wavpack abr bitrate then all results are also transparent. It would be very impressive to have a wavpack preset averaging 320k that can adapt to these cases. So far this isn't the case - not even dualstream, but it's close - and I don't know if it's possible without noise shaping (vbr + shaping). The wavpack 4.x encoder should handle these fine.
Title: Near-lossless / lossy FLAC
Post by: halb27 on 2007-06-19 06:20:10
... These cases will never happen on CD though. ...

Why not? That's exactly what I am afraid of.
I am not so much afraid of killer samples originating from certain kind of electronic music (cause that's not 'my' music) - but an isolated loud triangle or similar might well reach my musical horizon.
Sure things are to be set into relation. We can't expect a full frequency encoding with 350 kbps fast mode to be good or acceptable at everything. 'My' 350 kbps fast mode 32 kHz resampling result isn't transparent as well though acceptable.
Title: Near-lossless / lossy FLAC
Post by: shadowking on 2007-06-19 07:45:54
It's nothing but test signal stuff, if you read the pcabx page and all its warnings of possible equipment and hearing damage.

I agree though that some cd content can be similar, but the effect will be a much more subtle noise. The fast and even normal modes were never designed for robust compression and quality. I would go for 350k high modes and even -ans. I never liked too many hacks. Actually I think at these middle bitrates -ans will be an advantage, as you will get gains in these situations and you have enough masking to not hear the -ans 'working'.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-19 10:25:02
I agree with halb27. A humongous lossy file in those cases is actually what a good VBR is supposed to do, to maintain the sound quality.

In any case, here's a sample I'd like you to try, 2Bdecided.

Very "Easy" Sample (http://www.hydrogenaudio.org/forums/index.php?showtopic=55649)

This is a typical example of the kind of obnoxious music that I think compresses best with things such as WavPack lossy. It requires 1200+ kbps to be mathematically lossless, but even at 200 kbps and below it still sounds transparent to me. Maybe shadowking or others can try to ABX it at WavPack/Optimfrog lowest quality settings, I think it's transparent.

The reason to test this sample would be to test how dynamic the range of 2Bdecided's VBR algorithm is. Will it choose a very low bitrate, and if it does, will it still be transparent? By the way, this sample isn't clipped; I made sure. Upon extraction, EAC reported the song's normalization as 98.8% amplitude.
This one is very interesting!

Guess how many "bits of resolution per sample" lossy FLAC thinks this needs?

On average, 5-6, with some moments only needing 4.

In other words, lossy FLAC is sometimes dropping 12 of the original 16 bits!

This means, despite the original sample not clipping, the resulting file often clips. For this reason, I've done two versions - one at full amplitude (which clips) and one at 50% amplitude 24-bit resolution (though most of these bits are subsequently dumped: typically 17-18!).

If you want to ABX the reduced amplitude version, you'll have to enable ReplayGain, or use the "lossless half amplitude" version I've included below as a reference.

This is the one that clips:
[attachment=3366:attachment]444kbps

This is the one that's 6dB quieter, and doesn't clip
[attachment=3367:attachment]342kbps

This is a lossless 6dB quieter file for ABX comparison with the above
[attachment=3368:attachment]1269kbps


Given the aggressive processing this has received by lossy FLAC, I would really appreciate it if people could try to ABX.


Quote
I plan to install foobar2k next weekend and I can try to ABX some of the samples, sorry I didn't have enough time this past weekend. I figure I need to dedicate about a day to play with foobar first, and another day to do actual listening tests. But just to let people know, never from the start did I consider myself to have terrific hearing. I think I probably only have fairly good hearing, and it isn't as good as it used to be either, I'm in my late 20's. And even when I was in my teens, I think I had perfect undamaged hearing but when my friends and I did "single" blind-tests back in the early days of mp3, one of my friends easily defeated me in being able to differentiate certain things.
Well, you can try. Foobar2k ABX is easy enough to use - I don't think you'll need a day! More like one minute...

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-06-19 11:34:35
To my untrained ear, 342kbps sounds *very* nice indeed - using fb2k & earbuds on my laptop - not a semi-pro ABX, just comparable to how I would actually listen to it.

However, I can't play it on my iPAQ because GSPFlac.dll does not seem to handle this type of FLAC. Converted to WAV and GSPlayer still fails - hmmmmm....... something's wrong with my hardware. Nope, nothing wrong with hardware - just a 16bit limitation - it falls over with >16bit samples. Played all the other comparator samples in 69/79 and was pleased not to notice any degradation (caveat: on earbuds on an iPAQ). Very pleased.
Title: Near-lossless / lossy FLAC
Post by: halb27 on 2007-06-19 12:41:24
...This means, despite the original sample not clipping, the resulting file often clips. ...

I must have a misconception about your preprocessor.
I thought you're just zeroing a certain number of least significant bits of each sample, according to what your machinery thinks it can safely do. But then clipping couldn't occur with your preprocessor, and shouldn't occur with FLAC of course.
What's wrong with my imagination?
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-19 13:00:49
I zero by rounding, not truncation.

Maybe I'll try truncation instead.

The problem is that it will introduce a DC bias which will accumulate if you go through many generations of processing. However, it should solve the clipping problem.
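
To put a rough number on that DC bias, here is a minimal Python sketch (not the MATLAB script used for the files in this thread). The helper names `requantise` and `mean_error` are made up for the illustration, and it assumes plain integer samples with "school" rounding (ties away from zero).

```python
def requantise(sample, bits_removed, mode):
    """Zero the lowest `bits_removed` bits of an integer sample."""
    step = 1 << bits_removed
    if mode == "truncate":
        # Floor division: every sample moves down (towards minus infinity).
        return (sample // step) * step
    if mode == "round":
        # Nearest multiple of step, ties away from zero.
        magnitude = (abs(sample) + step // 2) // step * step
        return magnitude if sample >= 0 else -magnitude
    raise ValueError(mode)


def mean_error(mode, bits_removed=4):
    """Average of (output - input) over a range symmetric around zero."""
    samples = range(-1024, 1024)
    errors = [requantise(s, bits_removed, mode) - s for s in samples]
    return sum(errors) / len(errors)


print(mean_error("truncate"))  # -7.5: about half a step, always downwards
print(mean_error("round"))     # 0.0: the errors cancel on average
```

With 4 bits removed the step is 16, so truncation shifts the average level down by roughly half a step, while rounding leaves it where it was.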

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-06-19 13:26:07
Okay, trying to understand the maths - you have a 16 bit signed number between -32768 and +32767  so if (not clear on the nomenclature here) rounding / truncating was to 10 significant bits (-512 to 511), you might get:

0101101011101000 > 0101101011000000

or am I missing the point? Or, do you merely create a bitstream of signed 10bit elements?
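
For what it's worth, the bit pattern above can be checked with a small Python sketch. It assumes (as described in the first post) that the sample stays a 16-bit number with its low bits set to zero, rather than becoming a stream of 10-bit elements; `keep_top_bits` is a hypothetical helper, and it does no overflow check.

```python
def keep_top_bits(sample, bits_kept, total_bits=16, rounding=False):
    """Zero the low (total_bits - bits_kept) bits of an integer sample."""
    drop = total_bits - bits_kept
    if rounding:
        # Adding half a step before dropping bits rounds to the nearest step.
        sample += (1 << drop) >> 1
    return (sample >> drop) << drop


x = 0b0101101011101000
print(format(keep_top_bits(x, 10), "016b"))                 # 0101101011000000
print(format(keep_top_bits(x, 10, rounding=True), "016b"))  # 0101101100000000
```

So the example in the post is what truncation gives; rounding the same sample goes up to the next step instead, because the six dropped bits (101000) are more than half a step.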
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-19 13:29:20
Here is the version with truncation (round down, if you like) instead of standard rounding...

[attachment=3369:attachment]This is 16bits

I don't think this file will sound any different, but the errors are all one way (down), which means the DC level has been shifted by up to 6.5% in places. EDIT: I can imagine a file where the DC jump would be audible, so I don't like this method, but see what you think.


Re Clipping: If you are going to round or truncate samples, some of the results will hit digital full scale (0dB FS). This isn't strictly "clipping", even though many audio editors will report it as such. Clipping is strictly where the (analogue) signal should be larger than 0dB FS, but was clipped at 0dB FS because it couldn't go any higher in the digital domain. In comparison, 0dB FS is exactly where lossy FLAC wanted to put those samples: no higher, no lower. So there's never clipping, but there can be digital full scale samples which weren't at digital full scale before.

This isn't a problem. The problem is that digital waveforms have positive and negative peaks (obviously!). The limits, i.e. negative and positive digital full scale, are not equal because we have an even number of values available (2 to the power n values, where n is the bitdepth), but must include zero in the middle, which leaves an odd number of values to split between positive and negative - hence we cannot have the same number of positive and negative values available. e.g. for 16-bit audio, 0 is zero(!), negative full scale is -32768, positive full scale is +32767.

With rounding, you could head towards positive digital full scale. This would be +32768, which you can't have. That's the problem. In this case, lossy FLAC was hitting +32767, which is binary all ones, i.e. no wasted bits. In contrast, truncation always works: you'll never hit +32768 because you're always rounding down, and the largest number you started with was only +32767. Since you can have -32768, truncation (rounding down) is always OK.
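
The full-scale case can be sketched in a few lines of Python. This is only an illustration with made-up helper names, assuming 16-bit integer samples and 4 bits being removed.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def round_to_step(sample, bits_removed):
    """Round to the nearest multiple of 2**bits_removed (bits_removed >= 1)."""
    half = 1 << (bits_removed - 1)
    return ((sample + half) >> bits_removed) << bits_removed


def truncate_to_step(sample, bits_removed):
    """Round down to a multiple of 2**bits_removed."""
    return (sample >> bits_removed) << bits_removed


def fits_int16(value):
    return INT16_MIN <= value <= INT16_MAX


rounded = round_to_step(INT16_MAX, 4)
print(rounded, fits_int16(rounded))        # 32768 False: one past full scale
print(truncate_to_step(INT16_MAX, 4))      # 32752: still in range
print(truncate_to_step(INT16_MIN, 4))      # -32768: still in range

# Clamping the rounded value back to +32767 keeps it legal, but the low
# bits are then all ones, so that frame has no wasted bits any more.
clamped = min(rounded, INT16_MAX)
print(clamped & 0xF)                       # 15
```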

Hope this makes sense.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-19 14:14:44
To my untrained ear, 342kbps sounds *very* nice indeed - using fb2k & earbuds on my laptop - not a semi-pro ABX, just comparable to how I would actually listen to it.

However, I can't play it on my iPAQ because GSPFlac.dll does not seem to handle this type of FLAC  . Converted to WAV and GSPlayer still fails - hmmmmm....... something's wrong with my hardware. Nope, nothing wrong with hardware - just a 16bit limitation - it falls over with >16bit samples. Played all the other comparator samples in 69/79 and was pleased not to notice any degradation (caveat: on earbuds on an iPAQ). Very pleased.


You can have that file in 16-bits.

You're not losing anything in this case, since lossy FLAC has already dumped most of the bits!

If lossy FLAC wanted to keep all 16-bits, then staying at 16-bits but reducing the amplitude will effectively raise the noise floor by 6dB, which is why I don't do it by default. However, given the 16-bit hardware limitation and the other issues with clipping, I think this should certainly be an option, as with the file I've attached here.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-19 14:27:45
Truncating can be bad.

I have a killer test sample where truncating is ABXable.

It's also an interesting sample for checking lossy FLAC's block boundaries. Despite visible glitches in this killer test sample, I don't think they are audible.


This is the original file:
[attachment=3371:attachment]

This is the truncated version:
[attachment=3372:attachment]I can ABX this

This is the rounded (hence clipped) version:
[attachment=3373:attachment]I can't ABX this, but maybe someone can?

This is how it should be done (24 bits, half amplitude):
[attachment=3374:attachment]


This is a half amplitude 16-bit version for Nick:
[attachment=3377:attachment]


There must be a similar test case which can be bad for rounding. I'm going to give it more thought.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: halb27 on 2007-06-19 14:41:40
What about rounding towards zero? Should prevent clipping and avoid a systematic DC offset.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-19 16:17:13
That would put a kink in the transfer function, a bit like a bad class B amp. This would introduce harmonic (and aliased inharmonic) distortion that may or may not be audible. It would also reduce the peak to peak and RMS amplitude, which could, in extreme cases, be perceived as a slight reduction in loudness.
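
One way to see the kink is to count how many input values land on each output level. The Python sketch below is an illustration only; `toward_zero` is a hypothetical helper and the step of 16 is an arbitrary choice.

```python
from collections import Counter


def toward_zero(sample, bits_removed):
    """Zero the low bits by shrinking the magnitude (round towards zero)."""
    step = 1 << bits_removed
    magnitude = (abs(sample) // step) * step
    return magnitude if sample >= 0 else -magnitude


# How many inputs in -64..63 map to each output level with a step of 16?
counts = Counter(toward_zero(s, 4) for s in range(-64, 64))
print(counts[-16], counts[0], counts[16])   # 16 31 16
```

The zero level swallows 31 input values (-15 to +15) while every other level covers 16, so the staircase has a double-width flat spot around zero, much like crossover distortion. The magnitude also never increases, which is where the small loss of amplitude comes from.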

There's no perfect solution to this, so I guess it's a case of finding the least imperfect. The problem is, which one is least imperfect probably depends on the application, and it would be nice to avoid that complexity.


btw, the issue in my previous post is an example of noise modulation. This is an effect of having no dither, or the wrong dither. I didn't expect to find a real world sample where this would be audible, but to solve my own killer sample I might have to add a dither option.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: TBeck on 2007-06-19 16:34:55
Truncating can be bad.

I second that!

My lossy approach is currently truncating. I tried the problem sample (for TAK-Lossy) "keys_1644ds" (reported by halb27) with rounding and it sounded considerably better. With truncating i would have to reduce the wasted sample bits count by about 0.7 (on average) to achieve a similar quality.
Title: Near-lossless / lossy FLAC
Post by: robert on 2007-06-19 17:31:12
I zero by rounding, not truncation.

Maybe I'll try truncation instead.

The problem is that it will introduce a DC bias which will accumulate if you go through many generations of processing. However, it should solve the clipping problem.

Cheers,
David.

I am only curious. How does the MATLAB rounding-function work? Round to even?
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-19 20:52:11
I am only curious. How does the MATLAB rounding-function work? Round to even?
By default, MATLAB works in double precision floats, and stores audio data over the range +/-1. It's only converted back to integers when writing to .wav. To round, I'm multiplying by 2^n, using round(), and then dividing by 2^n. The round function itself just rounds like you would at school.
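
For anyone porting this: MATLAB's round() sends ties away from zero, whereas Python's built-in round() sends ties to the nearest even number, so the multiply / round / divide step needs a small helper. A minimal Python sketch, with made-up function names, and with `n` simply meaning the number of fractional bits kept:

```python
import math


def round_half_away(x):
    """Round to the nearest integer, ties away from zero (school rounding)."""
    return math.copysign(math.floor(abs(x) + 0.5), x)


def quantise(x, n):
    """Multiply by 2^n, round, divide by 2^n (x is a float in the range +/-1)."""
    scale = 2.0 ** n
    return round_half_away(x * scale) / scale


print(round_half_away(2.5), round_half_away(-2.5))  # 3.0 -3.0
print(round(2.5))                                   # 2 (Python rounds ties to even)
print(quantise(-0.3125, 3))                         # -0.375
```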

Gotta go - serious lightning storm here!
Title: Near-lossless / lossy FLAC
Post by: halb27 on 2007-06-19 21:20:08
So it looks like real rounding is necessary.
For a general procedure that avoids clipping, this seems to mean always working in 24 bit and shifting the input down by 1 bit. Since this reduces the final bitrate (judging by your sample), a higher precision seems to be necessary in order to arrive at the same final resolution.
Or is my understanding wrong?
Title: Near-lossless / lossy FLAC
Post by: halb27 on 2007-06-19 21:41:38
...
This is the one that's 6dB quieter, and doesn't clip
[attachment=3367:attachment]342kbps

This is a lossless 6dB quieter file for ABX comparison with the above
[attachment=3368:attachment]1269kbps
...
Given the aggressive processing this has received by lossy FLAC, I would really appreciate it if people could try to ABX.

Can't abx annoyingloudsong, neither the half amplitude version nor the "clipped" version.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-06-20 09:04:20
Becoming more concerned about the capabilities of the iPAQ - it plays most FLAC files I've tried with pretty good reproduction, however......

I tried the 4 noise test samples but got weird harmonics running throughout each and all quite different - more worryingly, harmonics were not the same on consecutive repeats of the same file. Played them on my laptop (2GHz T2500, 1GB) and couldn't tell them apart.

Like the way the discussion is going...... Oh, just a thought - instead of bitshifting right by 1, why not divide by Root-2 then round? Would this help the clipping?

Ditto inability to ABX annoyingloudsong (Laptop / iPAQ).
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-20 09:45:46
Thanks for all the testing and input halb27, Nick.C, shadowking, TBeck, SebG, smok3, and for the file Porcupine. Special thanks to Josh Coalson and David Bryant.


From the testing so far, I'm thinking the default behaviour for a pre-processor should be this:

fix clipping = If file clips extensively, attenuate by 31/32 and retain at least 5 bits*
output bitdepth = input bitdepth
dither = none
threshold shift = 0dB
bit reduction method = round
frame_size = dynamic* or 1024 (lossless target format dependent)


For advanced users, other options should be available:

fix clipping: do nothing / default* / default+dither* / For 16-bit files: change to 24-bits and 50% amplitude / For 24-bit files: 50% amplitude with dither
dither: default / rectangular* / triangular* / noise shaped* (maybe!)
threshold shift = anything
frame_size = default* / fixed: 1024 / 2048 / 4096 / 8192 etc (lossless codec and sample rate dependent)

* = not tried / implemented yet.
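
One possible reading of why 31/32 goes together with "at least 5 bits" (this is an interpretation, not something spelled out above): keeping 5 of 16 bits means a step of 2048, and rounding only overflows for samples at or above 32768 - 1024 = 31744, which is exactly 32768 x 31/32. A full-scale sample scaled by 31/32 lands just below that. A small Python check, with made-up helper names:

```python
import math

FULL_SCALE = 32767  # largest positive 16-bit sample


def round_to_step(value, step):
    """Nearest multiple of step (ties upwards)."""
    return math.floor(value / step + 0.5) * step


def peak_after_processing(gain, bits_kept, total_bits=16):
    """Where a full-scale sample ends up after attenuation and rounding."""
    step = 1 << (total_bits - bits_kept)
    return round_to_step(FULL_SCALE * gain, step)


print(peak_after_processing(1.0, 5))       # 32768: does not fit in 16 bits
print(peak_after_processing(31 / 32, 5))   # 30720: fits
```

Keeping more than 5 bits only makes the step smaller, so under this reading the same 31/32 attenuation stays safe for every frame.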


If this was integrated properly into a lossless codec, there could be three changes/improvements:

1. I like SebG's suggestions of subtractive dither.
2. The clipping handling should be transparent to the user, efficient, and free from confusing options; e.g. internally it could be 50% amplitude 24-bit, but it should be bounced back to full amplitude 16-bit by the decoder (if the original was 16-bit).
3. The dynamic frame size should be tightly integrated / optimised with the codec.

EDIT: either implementation (pre-processor or integrated) could have a hybrid mode added, so you have a lossless correction file too. Obviously in the pre-processor version, you'd need a post-processor to stitch them back together, so this isn't very useful for listening unless support becomes widespread, but it works for "archiving", whatever people mean by that.


I'll implement some of the pre-processor behaviour described above when I get chance.

In the mean time, I'm still looking for problem samples.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: TBeck on 2007-06-20 13:26:44
Thanks for all the testing and input halb27, Nick.C, shadowking, TBeck, SebG, smok3, and for the file Porcupine. Special thanks to Josh Coalson and David Bryant.

Thanks for your great idea and all the effort you are putting into it!

frame_size = dynamic* or 1024 (lossless target format dependent)
...
frame_size = default* / fixed: 1024 / 2048 / 4096 / 8192 etc (lossless codec and sample rate dependent)
...
* = not tried / implemented yet.

Can you define a maximum resolution (granularity?) for dynamic frame sizes? For example: variable frame sizes are always an integer multiple of maybe 256 or 128 samples? I assume this would make later TAK support for dynamic frame sizes easier.

TAK is always using fixed frame sizes, but then partitioning those frames into an appropriate number of sub frames of variable size. If one of your dynamic frames crosses a frame border (of a frame containing for instance 4096 samples), encoding will be more efficient if not only a handful of samples are falling into the next frame but at least 128 or 256 samples.

Well, i hope i could make it clear despite my bad english...

  Thomas
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-06-21 08:11:26
David, could you possibly link to a revised Matlab source? I am going to try to convert to Scilab and would prefer to start with the "latest" version.

Thanks,

Nick.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-21 09:59:56
Nick,

This is the latest version. It doesn't include any of the algorithm refinements I discussed in my last post.

It's also missing disc buffering. It loads the whole file into memory, which limits the file length to what your PC's memory can hold, which is much less than you might expect with MATLAB.

It's also missing an adjustment to the block boundaries which I need to add.

However, it generated all the files (apart from the truncated ones) I've uploaded recently, so here it is:

[attachment=3387:attachment]

Good luck with Scilab!

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-06-21 10:03:23
Many thanks - this will be a challenge to the rusty cogs in my ageing grey matter.......

[edit]
Oh, and I decided to download GNU Octave rather than Scilab.
[/edit]
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-21 10:45:14
Quote

frame_size = dynamic* or 1024 (lossless target format dependent)
...
frame_size = default* / fixed: 1024 / 2048 / 4096 / 8192 etc (lossless codec and sample rate dependent)
...
* = not tried / implemented yet.

Can you define a maximum resolution (granularity?) for dynamic frame sizes? For example: variable frame sizes are always an integer multiple of maybe 256 or 128 samples? I assume this would make later TAK support for dynamic frame sizes easier.

TAK is always using fixed frame sizes, but then partitioning those frames into an appropriate number of sub frames of variable size. If one of your dynamic frames crosses a frame border (of a frame containing for instance 4096 samples), encoding will be more efficient if not only a handful of samples are falling into the next frame but at least 128 or 256 samples.

Well, i hope i could make it clear despite my bad english...



It's clear.


To take advantage of the most possible "wasted bits", you need a small block size, probably equivalent to half the smallest FFT length (currently that's 64/2=32 at 44.1kHz sampling). Then, if the smaller FFT size is the limiting factor at a given moment, and it allows more wasted bits for a very short time, you can take advantage of this, whereas you couldn't with a longer block size.

This only brings compression advantages if the lossless codec can take advantage of the smaller block size. Otherwise, it doesn't change the compression at all, since the lossless codec will only take advantage of the lowest number of wasted_bits across whatever block size it's working with. Also, it means you're adding more noise than you need to - you're adding noise for no benefit (though it should be inaudible).
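
The "lowest number of wasted_bits across the block" point can be shown with a short Python sketch. `wasted_bits` is a hypothetical helper, not FLAC's own code, and returning 0 for an all-zero block is a simplification made for this example.

```python
def wasted_bits(block):
    """Number of low-order bits that are zero in every sample of the block."""
    combined = 0
    for sample in block:
        combined |= sample          # a bit survives if any sample has it set
    if combined == 0:
        return 0                    # digital silence: simplified to 0 here
    # Isolate the lowest set bit and convert it to a bit position.
    return (combined & -combined).bit_length() - 1


quiet_part = [64, -128, 192, 256]   # all multiples of 64
busy_part = [8, 24, -40, 16]        # only multiples of 8

print(wasted_bits(quiet_part))               # 6
print(wasted_bits(busy_part))                # 3
print(wasted_bits(quiet_part + busy_part))   # 3: the longer block gets the minimum
```

Coded as two short blocks, the first one benefits from 6 wasted bits; coded as one long block, the extra zeroed bits in the first half bring nothing except added noise.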

So, there's 3 possibilities

1) pre-processor completely independent from lossless codec (static block size).
2) pre-processor separate from lossless codec, but can run lossless codec with (e.g.) -blocksize n command and check resulting filesize / bitrate (dynamic block size). Can choose best blocksize depending on file size. (clunky!)
3) "pre-processor" integrated completely into lossless codec, meaning that the optimal block size can be decided jointly between lossy and lossless parts of algorithm without clunky calling of separate code.

(1) and (3) are obvious.

(2) means you aggressively (minimum block size) pre-process a block or file of audio, and then run it through the lossless codec at a variety of block sizes. Whichever one gives the best compression is the one you use - and you go back, and re-pre-process the file to ensure you only remove the wasted bits that can be taken advantage of with that block size. In other words, you avoid the problem of adding more noise than will bring you benefit.
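
The search part of (2) could look something like the Python sketch below. Everything in it is an assumption made for illustration: `encoded_size` stands in for "run the lossless codec at this block size and measure the result", and `toy_size` is a crude made-up estimate (a fixed 64-bit header per block plus the bits that are not wasted), not a model of any real codec. The re-pre-processing step afterwards is not shown.

```python
def wasted_bits(block):
    """Low-order bits that are zero in every sample (0 for an all-zero block)."""
    combined = 0
    for sample in block:
        combined |= sample
    if combined == 0:
        return 0
    return (combined & -combined).bit_length() - 1


def toy_size(samples, block_size, header_bits=64, bit_depth=16):
    """Made-up size estimate in bits; a real test would run the codec itself."""
    total = 0
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        total += header_bits + len(block) * (bit_depth - wasted_bits(block))
    return total


def best_block_size(samples, candidates, encoded_size):
    """Return the candidate block size with the smallest encoded size."""
    return min(candidates, key=lambda size: encoded_size(samples, size))


# 256 samples with 8 wasted bits, followed by 256 samples with none.
samples = [256 * ((i % 100) + 1) for i in range(256)] + [2 * i + 1 for i in range(256)]
print(best_block_size(samples, [128, 256, 512], toy_size))   # 256
```

In this toy case 512 loses because the busy half drags the whole block down to zero wasted bits, and 128 loses because it pays for extra headers without gaining any more wasted bits.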

I don't know whether it's worth doing this - I haven't tried.


The block sizes can be anything that the lossless codec can exploit. I need to check the code is robust for block sizes which aren't related to the FFT size, and there's little benefit to using these from the pre-processor point of view, but no good reason not to if it makes the lossless codec more efficient.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: TBeck on 2007-06-21 11:38:24
To take advantage of the most possible "wasted bits", you need a small block size, probably equivalent to half the smallest FFT length (currently that's 64/2=32 at 44.1kHz sampling). Then, if the smaller FFT size is the limiting factor at a given moment, and it allows more wasted bits for a very short time, you can take advantage of this, whereas you couldn't with a longer block size.

Well, my question was unnecessarily complicated. Indeed it should have been as simple as "What is the minimum (theoretical) blocksize?" And the Answer is: 32. Thank you.

(2) means you aggressively (minimum block size) pre-process a block or file of audio, and then run it through the lossless codec at a variety of block sizes. Whichever one gives the best compression is the one you use - and you go back, and re-pre-process the file to ensure you only remove the wasted bits that can be taken advantage of with that block size. In other words, you avoid the problem of adding more noise than will bring you benefit.

I don't know whether it's worth doing this - I haven't tried.

I suppose that later tests will show us useful lower limits for the blocksize. Then possibly even such an exhaustive approach will be practicable.  My intuition is telling me, that blocks of 128 or even 256 samples are the minimum for current codec implementations, which are not specifically prepared for the preprocessor.

Some experiments with my own simple preprocessor provided hints for some significant interaction between bit count reduction and predictor efficiency. If many bits had been removed, the predictor lost most of its efficiency. Depending on the predictability of the signal, the loss can be equivalent to 1 bit per sample. It's pure speculation but possibly it's sometimes even advantageous to take less bits away. But this could be most efficiently evaluated with an integrated solution 3).

You have opened a very interesting field for later research...
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-21 14:43:44
Some experiments with my own simple preprocessor provided hints for some significant interaction between bit count reduction and predictor efficiency. If many bits had been removed, the predictor lost most of its efficiency. Depending on the predictability of the signal, the loss can be equivalent to 1 bit per sample. It's pure speculation but possibly it's sometimes even advantageous to take less bits away.
I saw that too in some of my early testing, where I was simply removing bits without caring about the audible consequences to check compression ratios. I haven't seen it yet with the pre-processor, but then I haven't really looked. I thought it happened with "annoyingly loud sample", but when I went back to check, it hadn't. As you say, it's difficult to handle this properly unless integrated within the lossless codec. Just removing all inaudible bits with a 1024 block size seems to work well enough most of the time, though it will be interesting to see how much more efficiency you can squeeze out with a more careful method.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Porcupine on 2007-06-21 22:56:34
Terrific performance of 2Bdecided's VBR pre-processor on annoyingloudsong, that's great. 1269 kbps --> 342 kbps transparent reduction is very dynamic, so the VBR pre-processor is meeting the goal of true VBR I would say. I didn't realize that this sample would introduce clipping problems (not necessarily audible in this case, but just existing), so that is interesting too I guess. I agree with everything that 2Bdecided said regarding possible solutions to the clipping. In any case, a perfect solution could be integrated perfectly within the codec itself like he said, so I don't really consider it a true problem just a nuisance.

I guess the main thing left to do is to incorporate this kind of VBR algorithm directly into the encoder(s) which might increase overall efficiency of everything (in regards to how much compression can be achieved transparently). Although the VBR is working near-perfectly as is, right now I think the filesize might be a little bit larger on everything than it needs to be, which might require the pre-processor to be incorporated into the encoder for optimal results.

I suppose the VBR algorithm itself could also be improved slightly by using spreading functions and more accurate psychoacoustics rather than just picking the lowest coefficients out of a FFT, but to me it's probably good enough as is. Doing that would only make the VBR algorithm smarter but it's already smart enough. More important I think would be to try to increase the efficiency and reduce the bitrates, without sacrificing transparency. We can know the limits of what can be achieved by comparing to things like WavPack lossy mode at bitrates that we manually check against.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-06-25 13:09:55
So, I got the source to work in GNU Octave - quite pleased. Now, does it *really* take about 1 hour to process just over 5MB of WAV file? (Core2 T2500 @ 2.0GHz, 1GB DDR2-667). Using the 41_30sec.flac converted to WAV then processed, I get a 1951kB FLAC (-8) for 5169kB of WAV - Delighted!

The best possible way forward would be a generic transcoder model but using the preprocessor to process the WAV file created from the input file prior to recoding to the output file, preserving tags, metadata, etc.

<goes looking for the relevant PASCAL fft libraries and digs out freepascal>
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-25 13:39:03
So, I got the source to work in GNU Octave - quite pleased. Now, does it *really* take about 1 hour to process just over 5MB of WAV file? (Core2 T2500 @ 2.0GHz, 1GB DDR2-667).


Heck, no. It's slow, but not that slow.

It takes about 45 seconds for a ~4MB (~25 seconds) wav file on my PC (2GHz P4 windows XP).

I remember the initial ReplayGain implementation in MATLAB was painfully slow on my 300MHz P2(!), whereas the implementations in mp3gain and foobar2k are a joy to use these days!

So, properly coded, I don't see why lossy FLAC should be any slower than a half-decent mp3 encoder, i.e. much faster than real time. Even in MATLAB, it's not optimised at all.


Anyway, congratulations on getting it working!

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-06-25 14:00:13
Another question: I take it that the lossy_variables file is specific to the WAV file being processed?

[edit] And I think that GNU Octave is using a whole lot of swapfile rather than RAM so that may be the slowdown explained immediately - will try on a machine with 2GB...... [/edit]
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-25 14:35:10
The lossy_variables file stores the fft noise thresholds for each bit, which will depend on the frequency limits, fft size, sample frequency etc.

If you don't change any parameters, it'll always be the same for all 44.1kHz 16-bit stereo files.


If you're hitting virtual memory, then it'll take forever. Try a smaller wavefile to see if this is the issue.


Did you need to make many/any alterations to make it run under Octave? If so, can I have a look please? Apparently it's possible to make code which works with both Octave and MATLAB.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Josef Pohm on 2007-06-25 16:08:50
If you're hitting virtual memory, then it'll take forever. Try a smaller wavefile to see if this is the issue.
Did you need to make many/any alterations to make it run under Octave? If so, can I have a look please? Apparently it's possible to make code which works with both Octave and MATLAB.


In my case, using Octave:

- I had to replace backslashes with slashes;
- I couldn't get file collections to work (though I didn't try that hard), I went for processing samples one by one;
- I had a serious problem with a free version of Wavwrite I downloaded from the web which corrupts last bit when saving to 16 bit;

Apart from that it works quite impressively under Octave (great idea and promising implementation, by the way), though it's painfully slow:

- About 12 minutes for a 20 seconds sample on a Prescott 2.8ghz (around 30 times slower than realtime);
- About  5 minutes for a 20 seconds sample on a Orleans  2.2ghz (around 15 times slower than realtime);

That is with file sizes carefully selected so as not to fill the PC's physical RAM; otherwise, I dare say virtual memory swapping can make the process take too long to be worthwhile.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-25 17:08:32
Thanks for the feedback Josef.

My own hack of wavread has the option of adding dither, but that has to be switched off for this script because it would mess up the least significant bit.

Do you have a link to a legal free version of wavread/write?


In MATLAB, there's a function called profile which lets you see which parts of a script/function are taking all the time.

In mine, it's currently the hanning() call, the result of which can be trivially stored rather than re-generated every time, so I'll change that!
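
The idea of storing the window instead of re-generating it is easy to sketch in Python (this is not the MATLAB change itself). `hann_window` is a made-up name, and the formula is the one I believe MATLAB's hanning(N) uses, i.e. without the zero end points; treat that as an assumption.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def hann_window(length):
    """Hann window without zero end points, computed once per length."""
    return tuple(
        0.5 - 0.5 * math.cos(2 * math.pi * (n + 1) / (length + 1))
        for n in range(length)
    )


first = hann_window(64)    # computed
second = hann_window(64)   # fetched from the cache, no cosines evaluated
print(first is second)     # True
```

Since only a handful of FFT lengths are ever used, the cache stays tiny and every analysis frame after the first reuses the stored window.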

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-25 17:19:38
This is quicker...

(though it could be quicker still!)

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-06-26 10:43:54
I've just got your latest code, and have found Octave wavread & write code. I've modified the wav handling code *not* to convert to ±1 as we just convert back again! (although the code I found does not read 24-bit WAV (yet)).

[edit] Been looking at Task Manager while Octave is running the process. There seem to be a v.large number of "I/O Other Bytes" (Read 3293097, Write 23038, Other 135,247,986 and climbing.....). Does this include read / write to the swapfile? [/edit]
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-06-26 22:16:18
Been playing about at home (C2D @ 3.0GHz, 2GB) and it's not really that much faster - maybe GNU Octave 2.1.73 is not really that fast. Anyway, while playing around, I got to messing about with the spreading function as follows, to see what effect weighting the middle values had.

Code:
% Weight the two middle bins more heavily as choose_spread grows.
% The four weights always sum to 1, since 2*(tcs+tcl) = 1.
if any(choose_spread == 1:5)
    tcs = 0.5/(choose_spread+1);   % outer-bin weight
    tcl = 0.5 - tcs;               % inner-bin weight
    spreading_function{1} = [tcs, tcl, tcl, tcs];
    spreading_function{2} = [tcs, tcl, tcl, tcs];
else
    % Flat spreading (the same weights that choose_spread==1 gives)
    spreading_function{1} = [0.250, 0.250, 0.250, 0.250];
    spreading_function{2} = [0.250, 0.250, 0.250, 0.250];
end


Basically, it makes the files a little bit bigger, although not much real testing - as I said, it's s.....l.....o.....w.....
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-27 10:13:33
If you're using the lossy_variables file to store the noise thresholds, you might need to re-calculate for different spreading functions. On average, if they still sum to 1, it shouldn't matter.

If you do re-calculate, they don't have to sum to 1, since re-calculating will self normalise.


Proper psychoacoustic based spreading functions aren't consistent on a linear scale, since they're approximately log spaced. I don't know if they'd help or hinder.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-06-27 10:49:31
In accordance with David's wishes that any changes be shared....... 

Code: [Select]
%==========================================================
% lossyFLAC.m
%==========================================================
% David Robinson, 2007
% This code is open source. I'll pick a licence later (need advice), but:
% Any changes to this MATLAB code must be shared freely
% Any changes to the algorithm must be shared freely
% Any other implementations / interpretations must be shared freely
% You are free to include any implementation in commercial and/or closed source code, but must give an acknowledgement in help/about or similar.

% No warranty / guarantee. This is work in progress and not debugged.

% Things to add:
% dither
% ms checking
% ignore digital silence
% work on .wav files in blocks
% look one before and after for short FFT
% Add 31/32 + retain 5 bits declipping option - done
% Lossless correction file
% optimise, inc:
%  put 20log10 outside of min()
%  buffer hanning

% Set the source path
%==========================================================
source_path='c:/data_nic/octave/wav/';

% external variables (i.e. 'switches' when implemented properly!)
%==========================================================

noise_threshold_shift=0;
% noise threshold shift. average level of added quantisation
% noise relative to the lowest amplitude frequency bin (default=0=equal!)

low_frequency_limit=20;
high_frequency_limit=16000;
% Frequency range over which to check for lowest amplitude signal

minimum_bits_to_keep=0;

choose_spread=1;

fix_clipped=0;
% 0 = do nothing;
% 1 - 8 = reduce by (2^n-1)/(2^n), stay at 16-bits, keep at least 5 bits;
% 9 = reduce by 6dB and switch to 24-bit;

flac_blocksize=1024;
% blocksize of lossless codec

% internal constants (i.e. no need to change them)
%==========================================================

noise_averages=1000;
inaudible=1E-10;    % small number to add to audio to prevent log(0)

% Get a list of all .wav files in that source path
%==========================================================
filenamelist=dir([source_path "*.wav"]);

% Loop through them all, creating lossy versions
%==========================================================
for loop=1:length(filenamelist),
    filename=filenamelist(loop).name;
   
    % load file
    [inaudio,fs,bs]=wavread_raw([filename]); % Octave puts path & filename into "filename".

    % This reduces the amplitude by (2^n-1)/(2^n) and keeps at least 5 bits
    % (1 = 1/2, 2 = 3/4, 3 = 7/8, 4 = 15/16, 5 = 31/32, 6 = 63/64, 7 = 127/128, 8 = 255/256)
    if (fix_clipped>=1) & (fix_clipped<=8),
        inaudio=inaudio.*((2^(fix_clipped)-1)/2^fix_clipped);
        minimum_bits_to_keep=max(minimum_bits_to_keep,5);
    end
    % This pads it to 23 bits: It's treated as 24 bit data at the end, effectively dropping it by 6dB - Same as fix_clipped=1 but saves last bit.
    if (fix_clipped==9),
        inaudio = inaudio.*(2^(23-bs));
        bs=23;
    end

    [samples channels]=size(inaudio);

    % create integer copy (MATLAB wav(e)read loads audio with the range +/-1) - removed, now dealing with integer values.
    % inaudio_int=inaudio.*(2^(bs-1))+inaudible;

    % Set up the FFT analysis lengths - define these in time (seconds)
    % (will then use the nearest power of two based on sampling frequency)
    clear analysis_time spreading_function fft_length low_frequency_bin high_frequency_bin reference_thresholds reference_threshold min_bin bits_to_remove bits_to_remove_table;
    analysis_time(1)=2.0E-2;  % 20ms
    analysis_time(2)=1.5E-3;  % 1.5ms
    number_of_analyses=length(analysis_time);

    % spreading function to apply to FFT before determining lowest amplitude. Keep peak at centre, even if it means padding with zeros
    if (choose_spread >= 1) & (choose_spread <= 10),
        tcs=(0.5/(choose_spread+1));
    else
        tcs=(0.25);
    end
    spreading_function{1}=[tcs,0.5-tcs,0.5-tcs,tcs];
    spreading_function{2}=[tcs,0.5-tcs,0.5-tcs,tcs];
   
    % Loop through each analysis length (typically only two) and set the FFT values
    for analysis_number=1:number_of_analyses,
        % Calculate the closest FFT length
        fft_length(analysis_number)=2^round(log10(analysis_time(analysis_number)*fs)/log10(2));

        % Generate window function
        window_function{analysis_number}=hanning(fft_length(analysis_number));
       
        % Calculate which FFT bin corresponds to the low frequency limit
        % (spreading_function is a cell array, so index it to get the kernel length for the centring offset)
        low_frequency_bin(analysis_number)=round(fft_length(analysis_number)*low_frequency_limit/fs+((length(spreading_function{analysis_number})-1)/2));
        if low_frequency_bin(analysis_number)<2, low_frequency_bin(analysis_number)=2; end;
        if low_frequency_bin(analysis_number)>fft_length(analysis_number)/2, error('low frequency too high'); end;

        % Calculate which FFT bin corresponds to the high frequency limit
        high_frequency_bin(analysis_number)=round(fft_length(analysis_number)*high_frequency_limit/fs+((length(spreading_function{analysis_number})-1)/2));
        if high_frequency_bin(analysis_number)<2, error('high frequency too low'); end;
        if high_frequency_bin(analysis_number)>fft_length(analysis_number)/2, high_frequency_bin(analysis_number)=fft_length(analysis_number)/2; end;
    end

    variables_filename=['lossy_variables__fs' num2str(fs) '_bs' num2str(bs) '_noa' num2str(number_of_analyses) '_fft' num2str(fft_length) '_lfb' num2str(low_frequency_bin) '_hfb' num2str(high_frequency_bin) '_nts' num2str(noise_threshold_shift) '_sprfunc' num2str(choose_spread) '.mat'];

    % Find out if we've stored the quantisation noise thresholds before
    if exist(variables_filename,'file'),
        load(variables_filename);
        % If not, estimate quantisation noise at each bit in these FFTs
    else
        for analysis_number=1:number_of_analyses,
            clear reference_thresholds;
            reference_thresholds(1:noise_averages,1:bs)=zeros;
            for av_number=1:noise_averages,
                noise_sample=rand(fft_length(analysis_number),1);
                for bits_to_remove=1:bs,
                    % This models the quantisation noise introduced by truncating the last "bits_to_remove" bits from the audio data:
                    this_noise_sample=floor(noise_sample.*((2^bits_to_remove)))-(2^(bits_to_remove-1));
                    fft_result=20*log10(conv(abs(fft(this_noise_sample.*window_function{analysis_number})),spreading_function{analysis_number}));
                    reference_thresholds(av_number,bits_to_remove)=mean(fft_result(low_frequency_bin(analysis_number):high_frequency_bin(analysis_number)));
                end
            end
            reference_threshold{analysis_number}=mean(reference_thresholds)-noise_threshold_shift;

            for threshold=1:round(20*log10(2^(bs+4))),
                if isempty(find(reference_threshold{analysis_number}<threshold)),
                    threshold_index{analysis_number}(threshold)=0;
                else
                    threshold_index{analysis_number}(threshold)=max(find(reference_threshold{analysis_number}<threshold));
                end;
            end;
        end;
        save('-mat',variables_filename,'threshold_index');
    end;

    % Loop through each analysis length (typically only two) finding minimum value (min_bin) in each FFT
    for analysis_number=1:number_of_analyses,
        min_bin{analysis_number}(1:floor(samples/(fft_length(analysis_number)/2))-1,1:channels)=zeros;

        % Perform spectral analysis
        for block_start=1:fft_length(analysis_number)/2:samples-fft_length(analysis_number),
            block_number=1+(block_start-1)/(fft_length(analysis_number)/2);
            % On last (partial) block, just do to end of file (better than processing beyond end of file with zero pad, because that would
            % add a hard cut-off transition on gapless files, giving an artificially high spectrum)
            if block_start<samples-fft_length(analysis_number),
                actual_block_start=block_start;
            else
                actual_block_start=samples-fft_length(analysis_number);
            end;
            for channel=1:channels,
                fft_result=conv(abs(fft(window_function{analysis_number}.*inaudio(actual_block_start:actual_block_start+fft_length(analysis_number)-1,channel))),spreading_function{analysis_number});
                min_bin{analysis_number}(block_number,channel)=20*log10(min(fft_result(low_frequency_bin(analysis_number):high_frequency_bin(analysis_number))));
            end;
        end;
        min_bin_length(analysis_number)=length(min_bin{analysis_number}(:,1));
    end;

    clear bits_to_remove;
    bits_to_remove(1:ceil(samples/flac_blocksize))=zeros;

    % loop through flac blocks
    for block_start=1:flac_blocksize:samples,
        block_number=1+round(block_start/flac_blocksize);

        block_end=block_start+flac_blocksize-1;
        if block_end>samples, block_end=samples; end; % Don't jump past end of file!

        for analysis_number=1:number_of_analyses,

            first_block=(block_start-1)/(fft_length(analysis_number)/2);
            last_block=first_block+(flac_blocksize/(fft_length(analysis_number)/2));
            if first_block<1, first_block=1; end; % Don't jump before start of file
            if last_block>min_bin_length(analysis_number), last_block=min_bin_length(analysis_number); end; % Don't jump past end of file!
            if last_block<first_block, first_block=last_block; end;

            for channel=1:channels,
                this_min_bin=round(min(min_bin{analysis_number}(first_block:last_block,channel)));
                if this_min_bin<1, % i.e. if it's quieter than quantisation noise at the least significant bit
                    bits_to_remove_table(analysis_number,channel)=0; % don't remove any bits!
                else
                    bits_to_remove_table(analysis_number,channel)=threshold_index{analysis_number}(this_min_bin);
                end;
            end;
        end;

        bits_to_remove(block_number)=min(min(bits_to_remove_table));

        bits_to_remove(block_number)=bs-max((bs-bits_to_remove(block_number)),minimum_bits_to_keep);

        if bits_to_remove(block_number)>0,
            % round every sample in the block to the nearest multiple of 2^bits_to_remove
            twoval=(2^bits_to_remove(block_number));
            inaudio(block_start:block_end,1:channels)=round(inaudio(block_start:block_end,1:channels)/twoval).*twoval;
        end
    end

    if fix_clipped==9, bs=24; end

    wavwrite_raw([filename(1:length(filename)-4) '.ss.wav'],inaudio,fs,bs)

    % Make a .bat file to call FLAC twice for comparison: lossless and lossy
    % Note comparison might not be fair, since -b1024 itself gives better
    % compression on _some_ samples
    fid=fopen('temp.bat','w');
    dummy=fprintf(fid,'%s\n',['"C:\\Program Files\\FLAC\\flac.exe" -b' num2str(flac_blocksize) ' -f "' filename '"']);
    dummy=fprintf(fid,'%s\n',['"C:\\Program Files\\FLAC\\flac.exe" -b' num2str(flac_blocksize) ' -f "' filename(1:length(filename)-4) '.ss.wav"']);
    dummy=fclose(fid);
    % Run the .bat file
    % !temp.bat

%==========================================================
end
%==========================================================
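
The core step of the script above is the rounding of each block to a multiple of 2^bits_to_remove, which is what lets the lossless encoder see the low bits as wasted. A minimal Python sketch of just that step follows (Python for illustration only; the function name is hypothetical). It rounds halves away from zero, which is how Octave's round() behaves.

```python
import math

def remove_bits(samples, bits_to_remove):
    """Round integer samples to the nearest multiple of 2**bits_to_remove."""
    if bits_to_remove <= 0:
        return list(samples)
    step = 2 ** bits_to_remove
    out = []
    for x in samples:
        # Round half away from zero, like Octave's round(); Python's built-in
        # round() would send exact halves to the nearest even value instead.
        q = math.floor(abs(x) / step + 0.5)
        out.append(int(math.copysign(q, x)) * step)
    return out

print(remove_bits([100, -100, 7, 4, 3], 3))
```

Every output value is a multiple of 8 here, so the bottom three bits are zero. Note that rounding up near full scale can push a sample past the original range, which is presumably what the fix_clipped option is there to guard against.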

Having fun with Octave now - it just seems to be slow on the first processing - subsequent processing is quicker 

 

Nick.
Title: Near-lossless / lossy FLAC
Post by: Josef Pohm on 2007-06-27 11:14:24
I've run a small and definitely NON-DEFINITIVE comparison to see how different codecs cope with the script.

I've chosen FLAC, WV, TAK and ALS as candidates, which are the major codecs that I know to support wasted-bits detection and a customizable frame size. For each codec, the more complex modes did not always turn out to be the most efficient ones.

OFR supports wasted bits, but there's no way to fix the frame size.
LA and APE don't even support wasted bits. Other codecs may easily be added later.

I've chosen 11 non-critical real-world music samples and used FFT sizes of 2048, 1024, 512, 256 and 128. Please note that every step down in FFT size brings a reduction in S/N ratio of about a couple of dB (which, on the other hand, can also be seen as better efficiency of the algorithm).

Compressed samples bitrate varies from around 300kbps to 560kbps. Next table shows average results in kbps.

Code: [Select]
              FLAC -8   ALS -7      WV -x6     TAK -p4m
Avg - 2048      470       426        447         420
Avg - 1024      457       410        442         408
Avg - 0512      444       395        451         399
Avg - 0256      437       382        498          NO
Avg - 0128      447       383        616          NO


Results are promising, though further optimizations may be expected. At least 3:1 compression (470kbps) seems to be generally achievable for all codecs on 44.1kHz 16-bit samples. Obviously, listening tests should clarify whether this is competitive with other available options (WVLossy, DualStream).

- In general, 256 seems to be the best option as a frame size, with the exception of WV, which doesn't seem to work well with smaller frames, but offers an acceptable performance with an FFT of 1024. It may be very interesting to see if this is competitive with proprietary WV lossy mode;
- When a frame size of 256 is used, FLAC offers a slightly better performance than WV;
- TAK's performance is quite impressive, better than both FLAC and WV by over 10% when the same frame sizes are compared;
- ALS's performance is more or less on par with TAK's, slightly worse for bigger frame sizes, but scaling a little better for smaller frame sizes.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-06-27 11:43:34
Josef,

That's very interesting.

What was the lossless bitrate for each codec, for your sample collection?


I'm not sure the script behaves exactly as I would like it to when the frame size is reduced below 1024. More work to do! (No time now.)

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Josef Pohm on 2007-06-27 14:20:07
What was the lossless bitrate for each codec, for your sample collection?


Next table shows detailed results for all samples, frame size=1024 (first column is average bitrate of the four codecs on lossless samples, second column is average bitrate of the four codecs on lossy versions of samples, third column is ratio).

Code: [Select]
       Lsl   Lsy   Lsy/Lsl
F01   1053   374   35,54%
F02    871   490   56,23%
F03    910   398   43,71%
F04    880   374   42,50%
F05    962   462   48,05%
F06    947   419   44,27%
F07    865   351   40,61%
F08    823   430   52,19%
F09    919   358   38,93%
F10    877   521   59,44%
F11    764   546   71,43%
    
AVG.   897   429   47,84%
Title: Near-lossless / lossy FLAC
Post by: bryant on 2007-06-28 02:55:18
- In general, 256 seems to be the best option as a frame size, with the exception of WV, which doesn't seem to work well with smaller frames, but offers an acceptable performance with an FFT of 1024. It may be very interesting to see if this is competitive with proprietary WV lossy mode;

Thanks for the testing! 

Yes, WavPack is not well tuned for the very small block sizes. As you found, it should be okay at 1024, but would really rather be up at 4096. If the smaller blocks turn out to be useful, however, I could make it smart enough to intelligently concatenate blocks with the same bitdepth (and skip very short ones if the savings wasn't enough) because WavPack does not require all blocks in a file to be the same length.

Another interesting note about the clipping discussion above is that WavPack should be happy with blocks that have clipped samples because it looks for any redundancies in the LSBs (not just all zeros). It would definitely be useful to leave the option in to do nothing special for that case.

However, I really believe that the most useful application of this is for FLAC because of the large installed base of hardware players. If it really turns out to be robustly transparent across samples (even artificial ones) then it could be incorporated directly into the WavPack lossy mode (including the correction file) without any format changes. Additionally, it could be made even more efficient because it would not be limited to just the quantization levels that were powers of two.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-06-28 13:36:07
Right, spent a bit of time processing some of the files in 69/70, etc. and got the following:
Code: [Select]
Title                                   WAV FLAC  PP10  PP37
=============================================================
annoyingloudsong.ss                         1106   341   356
birds.ss                                     763   465   450  
E50_PERIOD_ORCHESTRAL_E_trombone_strings.ss  862   467   438
glass_short.ss                               776   635   641
jump_long.ss                                 946   481   464
S30_OTHERS_Accordion_A.ss                    709   709   722!!
S35_OTHERS_Maracas_A.ss                      679   616   539
S53_WIND_Saxophone_A.ss                      598   486   493
=============================================================
Average                                1411  805   525   513
                                       100% 57.0% 37.2% 36.4%
                                             100% 65.2% 63.7%
=============================================================
FLAC = Normal FLAC;
PP10 = PreProcessed, [0.250,0.250,0.250,0.250] spreading, no reduction by multiplication, minimum bits to keep=0;
PP37 = PreProcessed, [0.125,0.375,0.375,0.125] spreading, 127/128 reduction (0.07dB), minimum bits to keep=0.

No audible artifacts, to my ears anyway. Currently praying to the code gods to produce an Octave compiler to make the whole thing quicker...... or, alternatively psyching myself up to port to Pascal (as it's the only compilable language I really know.....)
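
The "127/128 reduction (0.07dB)" figure in the PP37 legend follows from the fix_clipped scaling of (2^n-1)/(2^n). A small Python sketch of that arithmetic (illustration only; the function names are hypothetical):

```python
import math

def clip_fix_gain(n):
    """Linear gain (2**n - 1) / 2**n applied for fix_clipped = n (1..8)."""
    return (2 ** n - 1) / 2 ** n

def clip_fix_gain_db(n):
    """The same gain expressed in dB (negative means attenuation)."""
    return 20 * math.log10(clip_fix_gain(n))

for n in range(1, 9):
    print(n, clip_fix_gain(n), round(clip_fix_gain_db(n), 2))
```

For n=7 the gain is 127/128, an attenuation of roughly 0.07 dB; n=1 halves the signal, which is about 6 dB.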
Title: Near-lossless / lossy FLAC
Post by: shadowking on 2007-06-28 14:10:07
I've uploaded more test samples here:

http://www.megaupload.com/?d=4O4O4JNK
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-06-28 14:27:10
ShadowKing, I take it that those samples are LossLess FLAC?
Title: Near-lossless / lossy FLAC
Post by: shadowking on 2007-06-28 14:31:23
ShadowKing, I take it that those samples are LossLess FLAC?


Yes.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-06-28 16:23:26
ShadowKing's samples.
Code: [Select]
                                            FLAC  PP10
=======================================================
10 - Dungeon - The Birth- The Trauma Begins  919   453
A02_metamorphose                             846   507
aps_Killer_sample                            929   484
Moon_short                                   834   550
velvet                                       957   516
=======================================================
Average                               1411   897   502
                                      100%  64.6% 35.6%
                                             100% 56.0%
=======================================================
No artifacts noticeable.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-07-03 13:09:53
Playing about with the code, I've added a "choose_bits_to_remove" parameter - which is used as follows:

Code: [Select]
        if (choose_bits_to_remove==0),
            bits_to_remove(block_number)=min(min(bits_to_remove_table));
        else
            bits_to_remove(block_number)=floor(mean(mean(bits_to_remove_table)))+(choose_bits_to_remove-1);
        end;
        bits_to_remove(block_number)=min(bits_to_remove(block_number),bs-minimum_bits_to_keep);
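
To make the effect of choose_bits_to_remove easier to see, here is a rough Python restatement of the rule above (illustration only; the function name and the example table are made up, and the table is assumed to be rectangular so that mean(mean(...)) equals the overall mean).

```python
import math

def pick_bits_to_remove(table, choose_bits_to_remove, bs=16, minimum_bits_to_keep=0):
    """table[analysis][channel] holds the per-analysis bits-to-remove estimates."""
    values = [v for row in table for v in row]
    if choose_bits_to_remove == 0:
        bits = min(values)  # original behaviour: most cautious estimate
    else:
        # rounded-down mean, plus (choose_bits_to_remove - 1) extra bits
        bits = math.floor(sum(values) / len(values)) + (choose_bits_to_remove - 1)
    return min(bits, bs - minimum_bits_to_keep)

example = [[2, 4], [3, 5]]  # hypothetical: 2 analyses x 2 channels
for setting in range(0, 5):
    print(setting, pick_bits_to_remove(example, setting, minimum_bits_to_keep=5))
```

Setting 0 keeps the original minimum; setting 1 uses the rounded-down mean, and each further step removes one more bit, capped so that minimum_bits_to_keep bits survive.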


To my ears (combined with minimum_bits_to_keep=5) the transparency threshold is a choose_bits_to_remove (BTR) setting of about 3 or 4. Setting minimum_bits_to_keep (MBTK) to 6 improves the BTR=4 result. The bitrate reduction is fairly significant:

Code: [Select]
Samples: 10 - Dungeon - The Birth- The Trauma Begins, 41_30sec, A02_metamorphose, 
annoyingloudsong, aps_Killer_sample, Atem_lied, ATrain, birds,
E50_PERIOD_ORCHESTRAL_E_trombone_strings, eig, glass_short, jump_long, Moon_short,
rach_original, rawhide, S13_KEYBOARD_Harpsichord_C, S30_OTHERS_Accordion_A,
S34_OTHERS_GlassHarmonica_A, S35_OTHERS_Maracas_A, S53_WIND_Saxophone_A, thewayitis,
VELVET

|=====|=========================|
| WAV | 53,763,880 (1411.2kbps) |
|FLAC | 29,767,971 ( 781.2kbps) |
|=====|=========================|========================|========================|
|     |        MBTK=5           |        MBTK=6          |        MBTK=7          |
|=====|=========================|========================|========================|
|BTR0 | 17,209,767 ( 451.7kbps) | 17,209,767 ( 451.7kbps)| 17,256,277 ( 452.9kbps)|
|BTR1 | 16,052,243 ( 421.3kbps) | 16,052,243 ( 421.3kbps)| 16,110,776 ( 422.9kbps)|
|BTR2 | 13,259,455 ( 348.0kbps) | 13,313,411 ( 349.4kbps)| 13,530,611 ( 355.2kbps)|
|BTR3 | 10,814,615 ( 283.9kbps) | 11,025,396 ( 289.4kbps)| 11,369,979 ( 298.4kbps)|
|BTR4 |  8,959,432 ( 235.1kbps) |  9,288,634 ( 243.9kbps)|  9,732,593 ( 255.5kbps)|
|=====|=========================|========================|========================|
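
The kbps figures in the table can be recomputed from the byte counts by scaling against the WAV total. The sketch below (Python, illustration only; the function name is hypothetical) assumes a reference of 1411 kbps, since that seems to match the posted figures slightly better than 1411.2.

```python
def scaled_kbps(size_bytes, wav_bytes=53763880, wav_kbps=1411.0):
    """Bitrate of a compressed set, scaled from the size of the WAV originals."""
    return wav_kbps * size_bytes / wav_bytes

for size in (29767971, 17209767, 13259455, 13313411, 8959432):
    print(size, round(scaled_kbps(size), 1))
```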
Title: Near-lossless / lossy FLAC
Post by: shadowking on 2007-07-03 14:07:31
You should start to pick up some hiss below 300k. Sometimes turning up the volume reveals it; otherwise these encoders are artifact free.

Dungeon - baby crying  added hiss
Velvet - noise moving around beats (doom-chik-doom-chik)
Atemlied - hissing on the phone ringing part
41 secs -  cymbals 'dusty'
metamorphose - hiss on the HF bits
moon short - slight hiss
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-07-03 14:11:22
Which BTR were you using? MBTK=7 (or maybe 8?) may help. My "testing" is on earbuds at moderate volume - suitable for an office environment at lunch. It also replicates my most likely playback environment.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-07-03 14:15:13
Nick,

My gut feeling (and I haven't tried it yet) is that this will introduce audible problems.

Near the start of this thread, halb27 ABXed some samples with 6dB and 12dB more noise than default. From the bitrates, it looks like you're pushing it even further than that.


I've been working to solve the problem sample I managed to manufacture. It's fixed now with rectangular or triangular dither, which I've finally implemented properly. I still think it's a waste of time for most content, but it's nice to have the option.

I'll upload when I get the chance.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-07-03 14:22:43
Good afternoon David,

In ways, I'm looking for "an acceptable bitrate / quality" balance - my DAP of choice plays FLAC and this method of bitrate reduction feels "cleaner" than moving to a full blown lossy codec. Your original concept has proven itself - how far it can be pushed whilst maintaining "acceptable" quality is another matter. I see this as an analog to the LAME -V0 .. -V9 options.

Looking forward to the revised source to chew on.....
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-07-04 10:16:19
If you want to force the bitrate lower, you can do any or all of the following (with predictable results)...


* Resample to 32kHz
-  (removes frequencies above 16kHz)

* Reduce the bitdepth (e.g. 14-bits, 12-bits) within the 16-bit file
-  (introduces fixed noise)
      (either pre-process, or force "bits_to_remove" to always be above a certain number)

* ReplayGain (or just reduce the volume) before encoding
-  (makes it quieter!)

* Use a positive noise_threshold_shift
-  (introduces variable noise)


Part of what you've done is similar to just reducing the bitdepth, but might be less predictable.

I'll post some numbers in a moment...
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-07-04 12:00:17
I grabbed all the files from the Atem_lied to thewayitis test set.

Regular flac: 728kbps
Lossy flac: 524kbps
Lossy flac nts+6dB: 457kbps

Regular flac RG: 756kbps (! didn't help, because most of these files are quiet!)
Regular flac RG 32k: 592kbps
Lossy flac RG 32k: 441kbps
Lossy flac RG 32k nts+6dB: 386kbps
Lossy flac RG 32k nts+12dB: 328kbps
Lossy flac RG 32k nts+24dB: 230kbps

I also tried annoyingloudsong:

Regular flac: 1252 kbps
Lossy flac: 411kbps

Regular flac RG 32kHz: 828kbps
Lossy flac RG 32kHz nts+6dB: 266kbps
Lossy flac RG 32kHz nts+12dB: 211kbps
Lossy flac RG 32kHz nts+24dB: 133kbps


I ran all these tests with triangular dither. With the caveat that the block switching might not be debugged, I've attached my latest script.


Resampling to 32kHz is normally transparent for me, but won't be for people who can hear above 16kHz.

nts+24dB sounds awful - like an FM radio with a very weak signal
nts+12dB sounds OK. The hiss is audible if you listen carefully. It's probably OK for you Nick.
nts+6dB sounds good. It's probably ABXable, but I didn't try.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-07-04 13:35:29
Examples for Nick.

Not transparent.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-07-06 11:10:06
Looking at the analysis times (1.5ms and 20ms) and the corresponding fft_length for each, I was wondering why the times are not chosen so that the power of two needs no rounding when fft_length is determined?

using time=10^(log10(2)*bits-log10(fs)) yields

time (bits=6, fft_length=64) = approx. 1.451ms;
time (bits=10, fft_length=1024) = approx. 23.219ms;

and for the extra analysis:

time (bits=8, fft_length=256) = approx. 5.805ms;

Could there be a benefit in tuning the analysis time exactly to the fft_length?
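
The formula above, 10^(log10(2)*bits-log10(fs)), is just 2^bits/fs. A short Python sketch (illustration only; the function name is hypothetical, and fs=44100 is assumed because the figures quoted match CD audio):

```python
def exact_analysis_time(bits, fs):
    """Duration in seconds of an FFT of 2**bits samples at sample rate fs."""
    return 2 ** bits / fs

for bits in (6, 8, 10):
    ms = exact_analysis_time(bits, 44100) * 1000
    print(bits, 2 ** bits, round(ms, 3), "ms")
```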
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-07-06 12:14:32
Hi Nick,

I kind of picked the times off the top of my head. They seemed like good times.

As you've seen, they're converted into numbers of samples the way they are, so you get something close to those times that's a power of 2, irrespective of sampling frequency. It could be neater (it's "closest" on a log scale, which may or may not be ideal), but I can't see any advantage to picking exact times.

There can't be any times that will convert to exact powers of 2 for 32kHz, 44.1kHz and 48kHz sampling.

If you want to avoid the log calculation, use a look-up table, either to approximate the calculation or to specify sample values directly for common sample rates. However, I think there are other log calculations later in the code that you can't avoid.
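
For illustration, a look-up table of that kind could be built once from the script's own "nearest power of two on a log scale" rule. This is a Python sketch, not part of the Octave script; the names are hypothetical, and the 20ms / 1.5ms times are the ones used in the script.

```python
import math

def nearest_pow2_length(analysis_time, fs):
    """FFT length closest (on a log scale) to analysis_time seconds at rate fs."""
    return 2 ** round(math.log2(analysis_time * fs))

ANALYSIS_TIMES = (20e-3, 1.5e-3)

# Table keyed by sample rate, so the log is only evaluated when building it
FFT_LENGTHS = {
    fs: [nearest_pow2_length(t, fs) for t in ANALYSIS_TIMES]
    for fs in (32000, 44100, 48000)
}

for fs, lengths in FFT_LENGTHS.items():
    print(fs, lengths)
```

At 32kHz the long analysis drops to 512 samples (16ms), while at 44.1kHz and 48kHz it is 1024 samples; the short analysis is 64 samples at all three rates.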

Cheers,
David.


btw, do the 32kHz sampled files play OK on your portable?
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-07-06 12:45:35
Oops - didn't reply to the samples - NTS6 and NTS12 play fine, NTS24 is full of hiss - probably to be expected due to the noise added.

Been playing with the number of analyses and fft_lengths:

5 analyses (4,6,8,10,12 bits) and following BTR variant (btr_type=4)

Code: [Select]
        btr_sum  = sum(sum(bits_to_remove_table));
        btr_min  = min(min(bits_to_remove_table));
        btr_max  = max(max(bits_to_remove_table));
        btr_size = number_of_analyses * channels;
        
        if (btr_type==0),
            bits_to_remove(codec_block_number)=btr_min;
        else
            bits_to_remove(codec_block_number)=max(0,floor((btr_sum-btr_min-btr_max)/(btr_size-2)+(btr_type-1)/2));
        end;

        bits_to_remove(codec_block_number)=bs-max((bs-bits_to_remove(codec_block_number)),minimum_bits_to_keep);


This gave me *really* nice sounding results (got a pair of Sennheiser canal phones for my iPAQ) at 272kbps for the sample set used previously.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-07-06 14:42:21
So let me see if I've got this right...

You're doing FFTs of sizes 2 to the power 4, 6, 8, 10, and 12.

You were taking the mean bits-to-remove across the block, but now you're adding them together, subtracting the highest and lowest values, dividing by something which isn't quite the number of values, and also dropping an extra 1-2 bits.

I'll have to give it a listen. It can't be magic (or, I would think, universally transparent!), but maybe it hides the worst noise where it's least obvious.

For a laugh, tell me how long it takes to run your five analysis version in Octave.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-07-06 14:58:02
Basically I'm calculating the mean of all the values (disregarding the highest & lowest) then adding 1.5 bits and finally rounding down.

i.e. bits_to_remove_table=[2,3,4,5,6],[3,4,5,6,6] >> (44-2-6)/(10-2) = 36/8 =  4.5 add 1.5 = 6!
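
The worked example above can be reproduced with a few lines of Python (illustration only; the function name is hypothetical, and the logic is a direct restatement of the btr_type code posted earlier):

```python
import math

def trimmed_bits_to_remove(table, btr_type):
    """Mean of all table entries except one highest and one lowest,
    plus (btr_type - 1) / 2 bits, rounded down and floored at zero."""
    values = [v for row in table for v in row]
    if btr_type == 0:
        return min(values)
    trimmed_sum = sum(values) - min(values) - max(values)
    trimmed_mean = trimmed_sum / (len(values) - 2)
    return max(0, math.floor(trimmed_mean + (btr_type - 1) / 2))

table = [[2, 3, 4, 5, 6], [3, 4, 5, 6, 6]]
print(trimmed_bits_to_remove(table, 4))  # (44-2-6)/(10-2) = 4.5, plus 1.5
```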

Oh, analysis takes a very long time..........

but...... tried 5,7,9 & 11 with btr_type=4 (i.e. add 1.5 bits) and get 292kbps, but with less analysis time.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-07-06 15:39:36
I see. I'm unsure as to why it's not (btr_size-2*channels).

I suspect you'll get more noise (possibly audible) for highly tonal and highly transient signals.

All else being equal, forcing an extra bit to remove is the same as using a +6dB noise threshold shift (except when bits to remove would have been zero with the former).
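
The equivalence comes from each removed bit doubling the quantisation step, i.e. 20*log10(2), or about 6.02 dB, per bit. A tiny Python sketch of the conversion (illustration only; the function name is hypothetical):

```python
import math

def extra_bits_to_db(extra_bits):
    """Rise in quantisation noise level, in dB, from removing extra_bits more bits."""
    return 20 * math.log10(2 ** extra_bits)

for bits in (1, 1.5, 2, 4):
    print(bits, round(extra_bits_to_db(bits), 2), "dB")
```

So the 1.5 bits added by btr_type=4 would correspond to a noise threshold shift of roughly +9 dB.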

It should be fine for what you want it for.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-07-06 20:59:52
2*channels would remove 4 values. I only want to remove the highest and the lowest analysis value (i.e. 2), and take the mean of the rest.

To be perfectly frank, I'm trying lots of permutations and seeing how the results pan out - I have two loops set up so that it loops through number_of_analyses=2:5 and btr_type=0:5 and it already loops through the 21 samples in the format .AxBy.wav where x=number of analyses and y=btr_type - leave simmering for quite a while and you get some results to listen to.

Oh, I had to modify wavread and wavwrite to read / write integer values and modify your script to do the same as 3 copies of the audio data was causing my machine to run out of memory.......

Love the concept - like the fact that I can get good quality at 300 - 350kbps on the sample set.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-07-06 23:30:00
Glad you're having fun with it Nick. For myself, I'd feel more comfortable with mp3 at those bitrates, but I could be convinced.

You mentioned modifying wavread and wavwrite. It sounds like a good idea. I don't have to be so careful with 4GB of RAM, but hopefully I (or someone) will eventually implement disk buffering so it doesn't matter.

It's great that you're playing with it and finding useful ways to get good quality at lower bitrates, but there is a hard ceiling with this approach. I don't want to sound negative, but you're adding flat noise, and experience suggests this becomes audible for problem samples ~300-400kbps, and audible for many things much below this.


For the future, I'm wondering how well psychoacoustic based noise shaping would work with this. Not instead of what's there already, but as an optional alternative. You could obviously throw away more bits, but the peak level would increase (dramatically in some cases) and you must hit a point where FLAC (or whatever) finds it harder to compress.

Bryant has mentioned this before, as has SebG...

http://www.hydrogenaudio.org/forums/index.php?showtopic=11623

It's more complicated than what's in there at present. I might try it just for the fun(!) of it, but I'm off on holiday so it won't be for a while.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-07-07 11:41:36
If it was easy, anyone could do it......

I'm a totally unskilled amateur in audio processing - but having immense fun. Have a good holiday!

[edit]
Had a rethink on the forcing extra bits to be removed and reverted back to the simplistic mean(mean(bits_to_remove_table)) alternative - but still using 4 analyses,  fft_length =2^(5,7,9,11).

Had a look at the triangular dither and found a Gaussian variant in wikipedia and this link http://www.musicdsp.org/showone.php?id=121 (http://www.musicdsp.org/showone.php?id=121). Currently using (sum of 8 separate Rand(block_size,channels)-4)/8.

Planning to do a lot of conversion for DAP use - now to come up with a method of preserving tags.......
[/edit]
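
For anyone following along, here is a minimal sketch of the Gaussian-style dither from the edit above. It assumes Rand() returns uniform values in [0, 1); the function name and the use of Python instead of the MATLAB script are illustrative assumptions, not the actual code.

```python
import random

def gaussian_dither(rng, repeats=8):
    """Approximate Gaussian dither: (sum of `repeats` uniform [0,1) values - repeats/2) / repeats.

    With repeats=8 this is the "(sum of 8 Rand - 4) / 8" expression;
    the result always lies in [-0.5, 0.5).
    """
    total = sum(rng.random() for _ in range(repeats))
    return (total - repeats / 2.0) / repeats

rng = random.Random(1)  # fixed seed only so the sketch is repeatable
dither = [gaussian_dither(rng) for _ in range(10000)]
mean = sum(dither) / len(dither)
variance = sum((d - mean) ** 2 for d in dither) / len(dither)
# Theory: variance = repeats * (1/12) / repeats^2, i.e. 1/96 for repeats=8
print(f"mean={mean:.4f} variance={variance:.5f} (theory {1 / 96:.5f})")
```

Raising the number of repeats narrows the distribution (the variance falls as 1/(12 x repeats)), so it changes the dither level as well as its shape.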
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-07-11 14:20:32
Back to using 2 analyses (6 & 10 bit fft_length), using gaussian dither with 32 repeats and your fix_clipped=2 - forcing bits to be removed again - the dither *seems* to mask the extra bit loss.

Anyway, I can't seem to upload the results (no webspace of my own), so I can't submit for constructive criticism.

Removing up to 2 extra bits (1/3 bit at a time) over the mean I can reduce 63.1MiB of WAV to between 20.3MiB and 11.7MiB of .ss.flac (lossless flac = 36.1MiB) for the 25 files in my sample set.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-07-11 22:00:29
You can upload in the uploads forum here.

("Developers" can upload in normal threads - don't ask me, I found it by accident)

On its own, "forcing some bits to be removed always" just raises the noise floor a little. Most people are more than happy with 14-bits (~FM BBC Radio 3 on very good equipment kind of quality). It's a good strategy to have as an option - I'll certainly merge it into my code when I get back.

btw, even just rectangular dither solved the problem sample I created, but if you're forcing just audible noise, your dither choice would be subjective. EDIT: followed the dither link. Dubious information. IIRC Gaussian isn't proven to remove all harmonic distortion or noise modulation, where triangular is perfect in both regards. Rectangular is only perfect in the former - it can leave noise modulation (though we're adding some of that anyway!).
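
To make the dither terms concrete, a rough sketch (Python, hypothetical helper names, not code from either script): rectangular dither is a single uniform random value spanning one step of the reduced word length, and triangular dither is the sum of two such values, spanning two steps. Clipping is deliberately not handled here.

```python
import random

def rectangular_dither(rng):
    """RPDF: uniform in [-0.5, 0.5) of one quantisation step."""
    return rng.random() - 0.5

def triangular_dither(rng):
    """TPDF: sum of two independent RPDF values, in [-1.0, 1.0) of one step."""
    return rectangular_dither(rng) + rectangular_dither(rng)

def requantise(sample, bits_to_remove, dither_fn, rng):
    """Add dither scaled to the new step size, then round to a multiple of that step."""
    step = 1 << bits_to_remove
    return int(round(sample / step + dither_fn(rng))) * step

rng = random.Random(0)  # fixed seed only for repeatability
print(requantise(1000, 3, rectangular_dither, rng))  # a multiple of 8 near 1000
print(requantise(1000, 3, triangular_dither, rng))   # a multiple of 8 near 1000
```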

(Nice holiday so far, but the weather will probably be poor tomorrow).

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-07-12 08:12:31
Glad the holiday's going well!

I managed to upload some files; see:
http://www.hydrogenaudio.org/forums/index....showtopic=56129 (http://www.hydrogenaudio.org/forums/index.php?showtopic=56129)

- if there are any other samples anyone would wish to be processed, let me know. Samples uploaded for information basically - the bitrate is dramatically reduced in most cases.

Basically, I'm playing with dither now - the gaussian implemented easily, so worth a try at least.

I've processed a few albums now and they typically reduce to about 1/3rd of the lossless FLAC size post processing. So, from a magpie's perspective I can fit 3 times as many on my DAP!
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-07-16 13:11:54
I'm not certain that lossyFLAC will never work at the bitrates you seem to want, but in the meantime, have you heard of mp3?

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-07-16 13:52:26
Mp3 - fuzzat den?  Yes, I could use Lame or aoTuV as one way of doing this, but I'm having much fun playing around with the script and it's costing me nothing.........

On reflection, I'll probably fall back to your original script after learning for myself why I wouldn't want to remove any more bits.

Still playing with dither and another possible variant on conditional fix_clipped.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-07-18 10:15:15
Playing with more samples now and when no bits are "forced" to be removed the size of the sample set compressed in FLAC is almost identical to that of OGG at -q 10.

I managed to successfully implement a variable and conditional reduction to avoid clipping when rounding. However, it always produces a maximum possible amplitude of 2^(bs-1)-2^(bs-minimum_bits_to_keep)-1 i.e. 1023, 2047, 4095 from the absolute peak - for possibly 1 or two samples in the whole set. What would happen if I were to allow the occasional 32768 then reduce it to 32767 (by force)?
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-07-18 10:41:44
Quote
Playing with more samples now and when no bits are "forced" to be removed the size of the sample set compressed in FLAC is almost identical to that of OGG at -q 10.
That's interesting. It would be interesting to compare the added noise - I have no idea what OGG does at q10.
Quote
I managed to successfully implement a variable and conditional reduction to avoid clipping when rounding. However, it always produces a maximum possible amplitude of 2^(bs-1)-2^(bs-minimum_bits_to_keep)-1 i.e. 1023, 2047, 4095 from the absolute peak - for possibly 1 or two samples in the whole set. What would happen if I were to allow the occasional 32768 then reduce it to 32767 (by force)?
Nothing bad. It would just mean that, in that block only, FLAC couldn't take advantage of the wasted bits, and so would encode all 16. At least, that's my understanding. David Bryant has suggested that Wavpack might be able to handle the situation more intelligently.

What you suggest is a useful strategy (can you share it please?). I think I'd want to implement it across albums rather than tracks - both to avoid very slight but possibly audible loudness changes between tracks (4095 from peak is -1.16dB down, which is the point where it might just become audible) and any possible issues on gapless albums. Still, if you limited it to 2047 it would usually be OK on a per-track basis, and any issues on larger changes will still be relatively small.
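
The level figures above can be checked with a few lines of arithmetic. This is only a sketch, assuming 16-bit audio with a positive full scale of 32767; the function name is made up for illustration.

```python
import math

FULL_SCALE = 32767  # largest positive 16-bit sample value (assumption: 16-bit input)

def level_change_db(reduction, full_scale=FULL_SCALE):
    """Level change in dB when the peak is lowered by `reduction` sample values."""
    return 20.0 * math.log10((full_scale - reduction) / full_scale)

for reduction in (1023, 2047, 4095):
    print(f"{reduction} below peak: {level_change_db(reduction):.2f} dB")
```

The 4095 case comes out at about -1.16 dB, matching the figure quoted in the post.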

For my use, I'd apply album ReplayGain before encoding (as long as it was negative - i.e. lower volume) so wouldn't expect to see much clipping.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-07-18 12:43:53
*&*&^%$^&^&$! Editor / user compatibility issues.........

Revised code, changing implementation of fix_clipped and hard limiting to (2^(bs-1)-2^(bits_to_remove(codec_block_number))) for each block.
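
A rough sketch of that rounding plus per-block hard limit, in Python rather than the MATLAB script. `remove_bits` is a hypothetical helper, `bits_per_sample` stands in for bs, and round-half-up via integer shifts is an assumption about how the rounding is done.

```python
def remove_bits(sample, bits_to_remove, bits_per_sample=16):
    """Round `sample` to a multiple of 2^bits_to_remove, hard limited so that
    the result never exceeds 2^(bits_per_sample-1) - 2^bits_to_remove."""
    step = 1 << bits_to_remove
    upper = (1 << (bits_per_sample - 1)) - step   # e.g. 32768 - 16 = 32752
    lower = -(1 << (bits_per_sample - 1))         # e.g. -32768
    # Add half a step, then shift down and back up: the low bits become zero.
    rounded = ((sample + (step >> 1)) >> bits_to_remove) << bits_to_remove
    return max(lower, min(upper, rounded))

print(remove_bits(32767, 4))  # would round up to 32768, so it is limited instead
print(remove_bits(100, 3))
```

Because the limit itself is a multiple of the step, the limited sample still has its low bits at zero, so the block keeps its wasted bits.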

Edit: Codebox removed.  Forum no like.  Code attached instead.


[edit] Synthetic Soul, you're a gentleman and a scholar - cheers! [/edit]
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-07-18 19:35:25
I've figured out a way to work out the optimum fix_clipped ratio based on the actual bits to remove per codec block rather than the maximum_bits_to_remove for the whole WAV file. Will implement and revert.

Script updated - functional blocks of code moved about a bit...... text file is in uploads, Lossy FLAC thread post.

Optimum fix_clipped ratio refined and existing methods removed. Codec_block_size changed to 576 samples as this reduced file size for sample set (lossy flac'ed) from 33.6MiB to 32.3MiB. Code tidied up a fair bit and partially commented.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-07-20 21:04:33
Source modified again - see Lossy FLAC thread in Uploads.
Title: Near-lossless / lossy FLAC
Post by: stel on 2007-07-20 21:45:03
Keep up the good work gents, I'm certainly keeping an eye on this topic.
I've tried using Octave but it doesn't want to install on my PC and I'm too poor to purchase Matlab so I can't try the matlab files. I might have a go at installing Octave on my Linux box.

I have in the past done a bit of C/C++ coding so I'm trying to put something together in C++. I'm not promising anything but so far I've managed to piece together a working WAV read/write and FFT routine in one app; I just need to join them together. Never tried anything like this before, but it's good fun trying.

Steve
Title: Near-lossless / lossy FLAC
Post by: kjoonlee on 2007-08-02 20:38:02
Oh wow. I've only discovered this thread today. Near-lossless / lossy FLAC needs a catchy name so that you can avoid questions like "is it lossy or lossless?"

I propose "Flossy".

(Hat tip to Garf for Floggy.)
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-08-03 10:25:39
Earlier in the thread it was discussed and, basically, the only bit being modified is the WAV file input to the FLAC file. As such, any FLAC file created from the processed WAV file is still a perfectly compatible FLAC file - not any kind of new format.

I totally agree that these processed files should be in some way differentiated from FLAC files created from lossless sources. In the script, 2Bdecided renames the WAV file from ".wav" to ".lossy.wav" to clearly mark which is which.

[edit]
ps. Calling the processor SoundSimplifier was put forward and in my variations on 2Bdecided's script, I name the processed ".wav" files ".ss.wav".
[/edit]
Title: Near-lossless / lossy FLAC
Post by: kjoonlee on 2007-08-03 16:12:30
Very well, but what I had meant was a quick shorthand for "lossless FLAC files made by compressing the lossy output of SoundSimplifier": Flossy files.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-08-03 17:28:44
As a colloquial reference, flossy is suitably amusing and rolls off the tongue - but for file naming, it still needs to end in ".flac"
Title: Near-lossless / lossy FLAC
Post by: Porcus on 2007-08-13 09:15:36
Quote
The idea is simple: lossless codecs use a lot of bits coding the difference between their prediction, and the actual signal. The more complex (hence, unpredictable) the signal, the more bits this takes up. However, the more complex the signal, the more "noise like" it often is. It seems silly spending all these bits carefully coding noise / randomness.

So, why not find the noise floor, and dump everything below it?

This isn't about psychoacoustics. What you can or can't hear doesn't come into it.


I beg to differ. Well, I understand what you want to do, but:
- the "lossy" part of "lossy compression" is really about optimizing "what to remove". There are bits that have higher "listening value" than others, and for each post-compression file size S, you want to keep the "most valuable subset of size S".
- of course it is not as simple as "find the most valuable subset of the raw PCM and compress it", it is "find the most valuable compressed subset" -- cropping and compression are (in principle) interacting. 
- your idea assumes -- possibly implicitly -- that everything buried in the noise floor is of lesser value and can be cropped.

So, the question is:
For a B-bit bitstream with the noise floor occupying the lower b bits, you have one possible lossy compression by cropping at b and applying FLAC. This gives you a file of size S -- is there no lossy compression of file size at most S, sounding at least as good to the human ear? And if there is one better, is this crop+flac procedure reasonably close?

That question is all about psychoacoustics, imho.

Of course, if a lossy encoder sets out not to discriminate too much between musical styles -- including Japanese noise music -- it might (for all that I know) be that they cannot work by detecting and removing the noise floor, and so it might be possible to improve by hand-picking recordings where you don't care about the audible noise (you might even think that filtering it off is a subjective improvement). But in principle a lossy encoder could then add a -toleratenoisefloorremoval flag (with a parameter setting the aggressiveness!) and improve on both your and their software?
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-08-13 10:27:16
Quote
So, the question is:
For a B-bit bitstream with the noise floor occupying the lower b bits, you have one possible lossy compression by cropping at b and applying FLAC. This gives you a file of size S -- is there no lossy compression of file size at most S, sounding at least as good to the human ear?
Of course. Any of the well known psychoacoustic codecs are going to give you smaller file size, more noise, and (usually) a perceptually transparent result.

Quote
That question is all about psychoacoustics, imho.
Only in this respect: if you quantise at (or below) the noise floor (actually the lowest FFT bin), is it audible?

Quote
Of course, if a lossy encoder sets out not to discriminate too much between musical styles -- including Japanese noise music -- it might (for all that I know) be that they cannot work by detecting and removing the noise floor, and so it might be possible to improve by hand-picking recordings where you don't care about the audible noise (you might even think that filtering it off is a subjective improvement). But in principle a lossy encoder could then add a -toleratenoisefloorremoval flag (with a parameter setting the aggressiveness!) and improve on both your and their software?
Quantising at the noise floor doesn't remove noise - by definition it adds noise since the signal is now even less like the original. There's already an option to decide how far below the noise floor to quantise.

lossyFLAC can cope perfectly well with pure noise (it keeps about 5-7 bits for pure white noise), but I'd be interested to hear some Japanese noise music - do you have a sample you could post?

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Porcus on 2007-08-15 10:01:57
Quote
I'd be interested to hear some Japanese noise music - do you have a sample you could post?


From Merzbow's extensive discography (http://en.wikipedia.org/wiki/Merzbow_discography):
http://www.fulldozer.ru/distribution/175 (http://www.fulldozer.ru/distribution/175) (Three WMA samples)
http://zzik.free.fr/dexpress/merzbow.mp3 (http://zzik.free.fr/dexpress/merzbow.mp3)
http://www.artificialmusicmachine.com/mp3/...d_1-excerpt.mp3 (http://www.artificialmusicmachine.com/mp3/merzbow_vs_tamarin/04-Tamarin-Untitled_1-excerpt.mp3)
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-08-15 10:28:11
Thanks. I'm guessing lossyFLAC would be fine, though I bet there would be mp3 problem samples in there somewhere.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-08-23 17:10:48
Nick,

Can you list the exact test set you were using when assessing file size please?

I have something new working.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-08-23 19:22:56
Files as follows:

Code: [Select]
13/07/2007  07:46         1,763,156 06_florida_seq.wav
28/06/2007  14:25         4,207,772 10 - Dungeon - The Birth- The Trauma Begins.wav
13/07/2007  07:46         2,116,844 14_Track03beginning.wav
13/07/2007  07:46         2,249,144 16_Track03entreaty.wav
13/07/2007  07:46         4,233,644 18_Track04cakewithtea.wav
12/07/2007  13:54         2,727,980 34_Gabriela_Robin___Cats_on_Mars.wav
29/06/2007  09:24         5,292,048 41_30sec.wav
28/06/2007  14:25         1,464,340 A02_metamorphose.wav
12/07/2007  13:54         1,058,444 A03_emese.wav
08/08/2007  12:35         1,344,704 Angelic.wav
27/06/2007  10:29         2,822,444 annoyingloudsong.wav
28/06/2007  14:25           886,144 aps_Killer_sample.wav
09/07/2007  16:29         2,145,060 Atem_lied.wav
09/07/2007  16:29         3,377,108 ATrain.wav
09/07/2007  16:29         4,410,076 Bachpsichord.wav
13/07/2007  07:55         4,669,484 badvilbel.wav
09/07/2007  16:29         4,320,784 BigYellow.wav
09/07/2007  16:29           717,072 birds.wav
08/08/2007  18:45         2,428,708 bruhns.wav
18/07/2007  07:49         1,487,684 cricket__insect___edit_.wav
27/06/2007  11:24         1,522,796 E50_PERIOD_ORCHESTRAL_E_trombone_strings.wav
09/07/2007  16:29         2,646,180 eig.wav
08/08/2007  18:45           797,372 Furious.wav
09/07/2007  16:29           562,952 glass_short.wav
13/07/2007  07:55         2,891,112 harp40_1.wav
13/07/2007  07:55         1,986,864 herding_calls.wav
09/07/2007  16:29         1,319,320 jump_long.wav
08/08/2007  12:07           168,104 keys_1644ds.wav
12/07/2007  13:54         1,766,396 ladidada_10s.wav
12/07/2007  13:55         1,845,132 Liebe_so_gut_es_ging.wav
28/06/2007  14:25           663,492 Moon_short.wav
12/07/2007  13:54         1,416,420 Poets_of_the_fall___Shallow.wav
09/07/2007  16:29         5,292,044 rach_original.wav
09/07/2007  16:29         3,130,908 rawhide.wav
12/07/2007  13:54         1,785,436 Rush___Hold_Your_Fire___Turn_the_Page.wav
09/07/2007  16:29         1,697,724 S13_KEYBOARD_Harpsichord_C.wav
27/06/2007  11:24           882,048 S30_OTHERS_Accordion_A.wav
09/07/2007  16:29         3,357,548 S34_OTHERS_GlassHarmonica_A.wav
27/06/2007  11:24         1,170,784 S35_OTHERS_Maracas_A.wav
27/06/2007  11:24         2,292,528 S53_WIND_Saxophone_A.wav
08/08/2007  12:35           486,196 SeriousTrouble.wav
18/07/2007  07:49         3,514,148 swarm_of_wasps__edit_.wav
09/07/2007  16:30         6,218,144 thewayitis.wav
08/08/2007  12:35         7,605,028 the_product.wav
08/08/2007  12:09           317,656 triangle.wav
08/08/2007  19:15           777,516 triangle_2_1644ds.wav
13/07/2007  07:55         1,769,512 trumpet.wav
28/06/2007  14:25         2,095,424 VELVET.wav
11/07/2007  11:33         3,707,200 wait.wav
              49 File(s)    117,408,624 bytes
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-08-24 11:13:44
Thanks.

Do you have links to...

Code: [Select]
28/06/2007  14:25         4,207,772 10 - Dungeon - The Birth- The Trauma Begins.wav
12/07/2007  13:54         2,727,980 34_Gabriela_Robin___Cats_on_Mars.wav
18/07/2007  07:49         1,487,684 cricket__insect___edit_.wav
12/07/2007  13:54         1,416,420 Poets_of_the_fall___Shallow.wav
12/07/2007  13:54         1,785,436 Rush___Hold_Your_Fire___Turn_the_Page.wav
18/07/2007  07:49         3,514,148 swarm_of_wasps__edit_.wav
08/08/2007  12:35         7,605,028 the_product.wav
08/08/2007  12:09           317,656 triangle.wav
11/07/2007  11:33         3,707,200 wait.wav


...please?

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-08-24 11:45:33
David - at work, so no time to find links, however, check your e-mail!

Best regards,

Nick.

As an aside, to allow the (timely) calculation of extreme fft_lengths I employed the following (as the longer fft_bit_length analyses seemed to converge more quickly):

noise_averages_bits=25;

noise_averages=ceil(2^(max(0,(noise_averages_bits-fft_bit_length(analysis_number)))^0.9));
so, for fft_bit_length=17:91 iterations; 14:404; 11:1726; 8:7160 and 5:28979.
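
For reference, the same expression as a small Python sketch (a port for illustration only; the script itself is MATLAB). Note the exponent is (bits difference)^0.9, which is then used as the power of 2.

```python
import math

NOISE_AVERAGES_BITS = 25

def noise_averages(fft_bit_length, noise_averages_bits=NOISE_AVERAGES_BITS):
    """ceil(2^(max(0, noise_averages_bits - fft_bit_length)^0.9))"""
    exponent = max(0, noise_averages_bits - fft_bit_length) ** 0.9
    return math.ceil(2 ** exponent)

for bits in (17, 14, 11, 8, 5):
    print(f"fft_bit_length={bits}: {noise_averages(bits)} iterations")
```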
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-08-24 17:06:31
Thanks Nick.

I've implemented noise shaping, something like...

http://telecom.vub.ac.be/Research/DSSP/Pub.../AES-2002-B.pdf (http://telecom.vub.ac.be/Research/DSSP/Publications/int_conf/AES-2002-B.pdf)
http://telecom.vub.ac.be/Research/DSSP/Pub...ICASSP-2003.pdf (http://telecom.vub.ac.be/Research/DSSP/Publications/int_conf/ICASSP-2003.pdf)

All due credit to SebG - this is almost exactly what he suggested on page 1 of this thread.

I make no guarantee that it's transparent (though I've tried!) - it's been a challenge to stop it going unstable (50dB more noise than you expected isn't great for the audio quality!) but it seems to have settled down now.

I'm getting these bitrates...

wav: 111 MB (117,408,624 bytes) = 1411kbps
lossless FLAC: 64.1 MB (67,304,026 bytes) = 809kbps
lossyFLAC6: 34.0 MB (35,754,414 bytes) = 429kbps
lossyFLAC10: 27.0 MB (28,378,924 bytes) = 341kbps

32kRG:
lossyFLAC6: 32.6 MB (34,209,716 bytes) = 411kbps
lossyFLAC10: 21.6 MB (22,696,905 bytes) = 273kbps

blocksize=576 throughout (may not be optimal for v10 and 32k)

32k = PPHS in foobar2k
RG=ReplayGain-by-track+clipping-prevention-by-peak in foobar2k
- which increases the volume of lots of clips in this test set (i.e. makes it less efficient)

No clipping prevention enabled in lossyFLAC, no dither.
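
The kbps figures above are the encoded size scaled against the WAV size. A quick sketch of that arithmetic, assuming every file in the set is 16-bit / 44.1kHz stereo (1411.2kbps) and ignoring WAV header overhead; the function name is made up and the byte counts are the ones listed above.

```python
WAV_BYTES = 117_408_624   # total size of the test set as WAV
WAV_KBPS = 1411.2         # assumption: 44.1 kHz x 16 bit x 2 channels

def estimated_kbps(encoded_bytes, wav_bytes=WAV_BYTES, wav_kbps=WAV_KBPS):
    """Average bitrate of the encoded set, scaled from the WAV bitrate."""
    return wav_kbps * encoded_bytes / wav_bytes

sizes = {
    "lossless FLAC": 67_304_026,
    "lossyFLAC6": 35_754_414,
    "lossyFLAC10": 28_378_924,
    "lossyFLAC6 32kRG": 34_209_716,
    "lossyFLAC10 32kRG": 22_696_905,
}
for name, size in sizes.items():
    print(f"{name}: about {estimated_kbps(size):.0f} kbps")
```

The results land within about 1kbps of the listed figures; the small differences come down to rounding and the choice of 1411 versus 1411.2.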



The code is a mess for now, and about 10 times slower than the previous version. It uses lpc.m from the MATLAB sig proc toolbox, which means implementing it without this would be some work.


There's a big problem though. I contacted Prof Werner Verhelst to ask if the approach was patented. He said it wasn't - it had been a contract for an audio equipment manufacturer, so he couldn't share the code, but there was nothing to stop me implementing it myself and he'd be interested to hear how I got on.

So far so good. But if you actually read that paper, they make it quite clear that what they're suggesting is very close to (a generalised version of, if you like) Sony's Super Bit Mapping technique. I assume this is patented - US 5,204,677 may be the correct patent.

Does this cover what I'm doing, and what's in that paper? I don't know.

Under UK patent law, you can play with it all you like privately, and also perform research (including commercial or public research) on the algorithm, but not with the algorithm. That's my understanding - it may be wrong. Other countries have other patent exemptions for R&D.


So I don't know what to do. I have no desire to fall foul of any patents.

FWIW the Sony patent can't have that long left - it claims priority from a Japanese patent from 1990. That means you can probably have lossyFLAC10 in 2010!

I welcome suggestions and legal opinions.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-08-25 19:30:43
No lawyers on HA then?
Title: Near-lossless / lossy FLAC
Post by: Dynamic on 2007-08-25 20:36:52
Quote
No lawyers on HA then?


I was sitting this out to see whether you got any bites.

I've only got experience as a patent coordinator within my R&D group at my previous employer, and mostly on physical embodiment types of patents rather than method patents (essentially algorithms). I read the situation pretty much the same as you.

If it's not invalidated by prior art (I didn't look back at the "A" application and look for "X" and "Y" citations in the Search Report) then it would appear to cover the method you're trying to implement. I can't recall where I'd found that on previous occasions, and whether it's a publicly accessible source like uspto.gov or a subscription service.

You might have scope to carry out research alone and publish source code for academic purposes (like the LAME project, which does not distribute the encoder, but publishes the source code). Anyone who then compiled that code or used it for non-research purposes might then be committing a breach of the patent in certain countries, while others in some jurisdictions might be free to share compiled code or even use it commercially. I'm not really au fait with the legality of this, but LAME seems to have been OK, and I didn't think one would be prevented from conducting private/commercial research on the algorithm or with the algorithm. The exception for research "with the algorithm" might be where you use a method not to research the method itself but as a tool for producing a commercial product as part of the production process. Proving that a company made an item using a particular method in court is rather difficult, which is why method claims weren't favoured by my previous employer. I understand that in certain fields, certain novel methods would thus remain secret (only documented internally in case they need to defend against an infringement lawsuit) rather than being publicly disclosed.

If the patent could be considered to be invalid at least for those claims applicable to your techniques, you might be able to prove it, but if you actually infringed it yourself and were sued, could you afford the lawyers to go to court and to pay Sony's lawyers if you lost? That's the kind of decision that companies have to make from time to time. OTOH, if you have meagre financial resources and don't actually impinge on Sony's business, would Sony actually sue you in the first place?

It's a tough one.

I've even heard of companies (possibly in desperate financial states or having farmed out their patent portfolio management to revenue generation firms) attempting to threaten their competitors or "offering the opportunity to license our inventions" with ungranted patent applications for which no claims were without "X" or "Y" citations indicating prior art found in the search report, and which at least some of the competitors hadn't implemented anyway and didn't look like they were about to either.

In summary, the LAME source-code and documentation only approach is worth consideration and investigation. This might also let you publish fair-use excerpts processed by the method for public listening tests in the interest of research and not personal gain, but not let you directly provide anyone with the tools that implement the methods, which would appear to be much more of a grey area.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-08-28 11:51:09
Thanks dynamic.

Yes, I was wondering about posting samples. If it can't be made transparent at a reasonable bitrate (i.e. significantly less than the non noise shaped approach), then it's not much use anyway - so samples are essential, and clearly research on the algorithm itself.

However, there's not much incentive for anyone to test if they can't then use it themselves.

I understand what the Lame project has done to get around the IP issues, and that they have "got away with it" so far. Maybe some people using lame commercially are actually paying mp3 license fees so it makes money for the patent holders.

However, I'm not comfortable with taking that route myself.

I might just ask Sony.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-09-14 16:25:40
Latest version of delphi transcode of David's script is in the LossyFLAC thread in uploads.

It is giving a close approximation to the output from the Matlab script.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-10-02 13:19:00
David,

I have been trying to find a (relatively simple) weighting which is a bit more scientific than the primitive skew I have used in lossyWAV.

I found the formulae for A, B, C & D weighting and also found tabulated values for ITU-R.468 (BBC Research Department noise weighting). Of course, I also have the tabulated values of the equal-loudness curve you use in Replay Gain.

I am wondering as to the applicability of D-Weighting (principally because I have the formula) or ITU-R.468 (as it may be simple to implement) as a substitute for the 20Hz to 3.7kHz skew currently available as an option in lossyWAV.

Best regards,

Nick.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-10-02 15:34:20
Sorry Nick, I don't think they're appropriate for this. SebG's suggestion of testing white noise in vorbis was a good one (see page 1!).

EDIT: Those curves are the audibility, or perceived loudness, of something on its own. Whereas the noise we're adding here is added below something else. So you need masking curves, not absolute threshold / equal loudness curves. Still, I've learnt something - I hadn't heard of D-weighting before - sounds interesting for its intended application.

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-10-02 20:41:54
Quote
If you want a similar preprocessing for FLAC or WavPack you'd do something like this:
- estimate LPC filter coeffs (H(z)) and temporarily filter the block to get the residual
- check the residual's power and select "wasted_bits" accordingly
- quantize original (unfiltered) samples so that the "wasted_bits" least significant bits are zero
- use 1/H(z) as noise shaping filter.
Sebastian,

This is the second time that David has pointed me in the direction of your suggestion - unfortunately, I am unable to take these concepts and convert to code as I have no idea where to start as to the algorithms that are required. If you have any second-hand code which you would be willing to share, I would gratefully receive it and attempt to implement it in the lossyWAV Delphi project.

Best regards,

Nick.
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-10-03 11:16:12
Nick,

Sorry, not that bit. I've already done that bit, but haven't released it due to concern over a Sony patent.

The part I meant was this...

Quote
If you further check what psychoacoustic models usually do you'll notice that they allocate more bits to lower frequencies than to higher frequencies (higher SNR for lower freqs) most of the time. You then can tweak the noise shaping filter to W(z)/H(z) where W(z) is some fixed weighting so that you have a higher SNR for lower freqs.
(I derived W(z) by feeding OggEnc with mono pink noise).

...where you can use that weighting for exactly what you're doing now.

So it's "just": feed noise into Ogg, subtract input from output, check noise (implies SNR) at given frequencies using, say, spectral view in Cool Edit, and simulate that rough spectral shape in your code.

Just an idea. I keep meaning to try it but have other things to do!

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-10-03 11:24:09
Doh! <slaps forehead> That sounds like a plan to me.... I'll get onto it tonight.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-10-04 12:27:03
Didn't get round to OGG noise analysis last night, however reading the arstechnica MP3 explanation, it struck me that there may be some merit in the following:

Instead of a spreading function where values are averaged (in the default case over 4 bins), why not take the max of (last_bin,this_bin,next_bin) values, progressively along the fft bin results.

I have made a test implementation and the difference in bits_to_remove (average) between 4 bin average and this 3 bin max seems to be small.

[edit] Well, that was my impression, but when I ran my 52 sample set at default quality, 4 bin averaging = 39.48MB, 3 bin max = 38.83MB; [/edit]

[edit2] For Guru's 150 sample set at default quality, 4 bin averaging = 89.56MB, 3 bin max = 87.99MB;

Maybe, averaging the two highest values, disregarding the minimum value would be better - I'll try it. [/edit2]

[edit3] For Guru's 150 sample set, at default quality, 2-highest-of-3-average = 90.86MB; [/edit3]

[edit4] For my 52 sample set, at default quality, 2-highest-of-3-average = 40.23MB; [/edit4]
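
For anyone wanting to compare the three spreading variants side by side, here is a rough Python sketch. The function names are made up, the edge handling (dropping bins without a full window) is an assumption, and it says nothing about whether lossyWAV applies the spreading to linear or dB values.

```python
def spread_average(bins, width=4):
    """Mean of each run of `width` consecutive bins (the default 4-bin averaging)."""
    return [sum(bins[i:i + width]) / width for i in range(len(bins) - width + 1)]

def spread_max3(bins):
    """Maximum of (last_bin, this_bin, next_bin) for every interior bin."""
    return [max(bins[i - 1], bins[i], bins[i + 1]) for i in range(1, len(bins) - 1)]

def spread_top2_of_3(bins):
    """Average of the two highest of three adjacent bins, discarding the minimum."""
    out = []
    for i in range(1, len(bins) - 1):
        window = sorted(bins[i - 1:i + 2])
        out.append((window[1] + window[2]) / 2.0)
    return out

example = [1, 4, 2, 8, 3]
print(spread_average(example))
print(spread_max3(example))
print(spread_top2_of_3(example))
```

Since the lowest spread value is what sets the noise floor estimate, a variant that lifts the minimum tends to allow more bits to be removed, and one that lowers it tends to allow fewer.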
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-10-08 12:39:03
Looking at the way that bit reduction / dither noise is calculated for each of the dither options, it appears that I neglected to ensure that the rounded value remained within the permissible sample limits when calculating the noise from rounding and dithering. I have re-written my noise calculation subroutine and will revise the constants used in the code to recreate the dither noise surfaces (1..32 bits x 6..15 bit fft length x 3 dither options).

On the experimental spreading function front, I am looking at a spreading function which changes from averaging at small fft lengths to simple maximum at long fft lengths as follows:

Code: [Select]
  begin
    // Bin limits for this analysis.
    pcll:=low_frequency_bin[analysis_number]-1;
    pchl:=high_frequency_bin[analysis_number]-1;

    for pci:=0 to pchl-pcll+1 do
    Begin
      // Take three adjacent bins; the spread value is written to the middle one.
      v1:=fft_result[pci];
      v2:=fft_result[pci+1];
      v3:=fft_result[pci+2];

      vMax:=max(v1,max(v2,v3));
      vMin:=min(v1,min(v2,v3));
      vTot:=v1+v2+v3;
      vMid:=vTot-vMax-vMin;  // the remaining (median) value
      vAvg:=vTot/3;

      // The weights on (vMax, vMid, vMin) always sum to 3: plain average for
      // short ffts, moving towards the plain maximum as the fft gets longer.
      Case fft_bit_length[analysis_number] of
         0.. 6 : fft_result2[pci+1]:=(vAvg);
         7     : fft_result2[pci+1]:=(vMax*1.50+vMid+vMin*0.5)/3;
         8     : fft_result2[pci+1]:=(vMax*2.00+vMid)/3;
         9     : fft_result2[pci+1]:=(vMax*2.50+vMid*0.5)/3;
        10..15 : fft_result2[pci+1]:=(vMax);
      End;
    End;
  end;
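
For readers who do not use Pascal, here is a rough Python rendering of the same blend. It is a sketch, not a port of lossyWAV: it ignores the low/high bin limits, works on a plain list, and leaves the two edge bins unchanged (an assumption, since the snippet does not show how the edges are treated).

```python
def spread(fft_result, fft_bit_length):
    """Blend average and maximum of each 3-bin window, weighted by fft length."""
    # (w_max, w_mid, w_min) per fft bit length; the weights always sum to 3.
    if fft_bit_length <= 6:
        weights = (1.0, 1.0, 1.0)      # plain average
    elif fft_bit_length == 7:
        weights = (1.5, 1.0, 0.5)
    elif fft_bit_length == 8:
        weights = (2.0, 1.0, 0.0)
    elif fft_bit_length == 9:
        weights = (2.5, 0.5, 0.0)
    else:
        weights = (3.0, 0.0, 0.0)      # plain maximum
    out = list(fft_result)             # assumption: edge bins are copied through
    for i in range(1, len(fft_result) - 1):
        v_min, v_mid, v_max = sorted(fft_result[i - 1:i + 2])
        out[i] = (weights[0] * v_max + weights[1] * v_mid + weights[2] * v_min) / 3
    return out
```

For the window (3, 6, 0) the centre value moves from 3.0 (average) at an fft bit length of 6 up to 6.0 (maximum) at 10 or more.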
Title: Near-lossless / lossy FLAC
Post by: SebastianG on 2007-10-08 15:21:04
Hi, Nick, David!

From what I understand you are looking for some kind of weighting to determine the wasted_bits count, right? I'm not sure whether the weighting trick I described is appropriate here since I used this filter for noise shaping. I calculated the amount of bits to use for steganography (in your case wasted_bits) solely based on the power of the linear prediction residual. Combined with the fixed non-recursive part of the noise shaper the effect was quantization noise with a more or less constant (constant over time) SNR for a specific frequency region.

To be honest I really don't understand why you guys insist on introducing white-only noise. It's like travelling from A (lossless) to B (perceptual lossy) and stopping right in the middle where both disadvantages are combined: lossy encoding (B) + high bitrate (A) necessary due to lack of noise shaping.

IMHO the best thing to do here is following Edler, Faller and Schuller: Perceptual Audio Coding Using a Time-Varying Linear Pre- and Post-Filter (http://infoscience.epfl.ch/getfile.py?docid=5729&name=edler00_AES109&format=pdf&version=1). Their psychoacoustic analysis results in a "pre-filter" and a "post-filter". The post filter acts like a noise shaper. To make it work for lossy FLAC just
  • skip the prefilter, we don't need it.
  • derive wasted_bits according to the first sample of the post-filter's impulse response. This first sample tells you the optimal quantizer step size.
  • use the ("normalized") post-filter as noise shaping filter. (Normalized: A noise shaping filter's impulse response must start with the coefficient '1' and has an average log response of 0 dB on a linear frequency scale.)
About sharing code: I'd have to locate the source code, first. It's been a while since I touched it. Exactly what are you interested in? The "complicated" part of it was the levinson durbin algorithm. I could share a Java version if you like. It's not hard to find other source code for it with the help of Google, I suppose. If you want to follow the "Edler et al type approach" you could borrow a lot of Speex code for handling the filters.
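
A rough Python sketch of the recipe for lossy FLAC: take the first sample of the post-filter's impulse response as the quantizer step size, and use the filter scaled to start with 1 as the noise shaper. This is not code from the paper or from any encoder. It uses an ordinary all-pole filter with z^-1 delays rather than the frequency-warped form, and the mapping from gain to wasted_bits (floor of log2, with the gain measured in LSBs) is an assumption made for illustration only.

```python
import math


def impulse_response(gain, a, length):
    """First `length` samples of H(z) = gain / (1 + a[0] z^-1 + a[1] z^-2 + ...)."""
    h = []
    for n in range(length):
        acc = gain if n == 0 else 0.0
        for k, coeff in enumerate(a, start=1):
            if n - k >= 0:
                acc -= coeff * h[n - k]
        h.append(acc)
    return h


def split_gain_and_shape(h, max_wasted_bits=15):
    """Separate a post-filter response into wasted_bits and a normalized shaper."""
    gain = h[0]                         # must be positive
    shaper = [v / gain for v in h]      # normalized: first coefficient is 1
    # Assumption: gain is in LSBs, and the step 2**wasted_bits must not exceed it.
    wasted_bits = max(0, min(max_wasted_bits, int(math.floor(math.log2(gain)))))
    return wasted_bits, shaper
```

For example, `split_gain_and_shape(impulse_response(8.0, [-0.5], 4))` separates a gain of 8 (3 wasted bits under the assumption above) from the shape 1, 0.5, 0.25, 0.125.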

Cheers!
SG
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-10-09 12:59:47
Hi Seb,

Quote
To be honest I really don't understand why you guys insist on introducing white-only noise.


1. It works.
2. I didn't stop there. I've done a noise shaping version. See the previous page!
(It's not truly psychoacoustic though)


Quote
It's like travelling from A (lossless) to B (perceptual lossy) and stopping right in the middle where both disadvantages are combined: lossy encoding (B) + high bitrate (A) necessary due to lack of noise shaping.


I see both advantages being combined: no problem samples, little or no transcoding issues, lower bitrate than lossless.

You could probably use Vorbis at high bitrates instead, with possibly slightly more transcoding worries. Also I'm not sure you could be so confident with multi-generation coding; set the threshold correctly, and lossyFLAC seems to go many generations (e.g. 50) without issue.


You could, of course, make this a proper psychoacoustic codec, but I'd only do this for fun - what would be the practical point? You'd be forcing the underlying issues of FLAC onto a psychoacoustic codec - why would you do that? Surely it would be much better to use Vorbis or something without these issues? I don't think myself or Nick are up for designing a new psychoacoustic model(!), though I guess we could "borrow" one.


Quote
IMHO the best thing to do here is following Edler, Faller and Schuller: Perceptual Audio Coding Using a Time-Varying Linear Pre- and Post-Filter (http://infoscience.epfl.ch/getfile.py?docid=5729&name=edler00_AES109&format=pdf&version=1). Their psychoacoustic analysis results in a "pre-filter" and a "post-filter". The post filter acts like a noise shaper. To make it work for lossy FLAC just
  • skip the prefilter, we don't need it.
  • derive wasted_bits according to the first sample of the post-filter's impulse response. This first sample tells you the optimal quantizer step size.
  • use the ("normalized") post-filter as noise shaping filter. (Normalized: A noise shaping filter's impulse response must start with the coefficient '1' and has an average log response of 0 dB on a linear frequency scale.)
About sharing code: I'd have to locate the source code, first. It's been a while since I touched it. Exactly what are you interested in? The "complicated" part of it was the levinson durbin algorithm. I could share a Java version if you like. It's not hard to find other source code for it with the help of Google, I suppose. If you want to follow the "Edler et al type approach" you could borrow a lot of Speex code for handling the filters.


Thank you for this. All pointers gratefully received!

Does it have any IP attached?
What form is the "post-filter" in?

The reason for the first question is obvious! I ask the second because I know what a noise shaping filter should be like (you missed minimum phase off your list) and it's not trivial getting exactly what you want - the LPC-based method delivers filters which check all the boxes - does this one? If not, is "normalization"/conversion easy?

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: SebastianG on 2007-10-09 16:24:48
Hi Dave,

Quote
2. I didn't stop there. I've done a noise shaping version. See the previous page!

Sorry, I wasn't aware of that.

Quote
You could, of course, make this a proper psychoacoustic codec, but I'd only do this for fun - what would be the practical point? You'd be forcing the underlying issues of FLAC onto a psychoacoustic codec - why would you do that? Surely it would be much better to use Vorbis or something without these issues?

It depends. Why were you tackling "lossy FLAC" again?
I just wanted to mention the benefits of noise shaping. Given a specific target bitrate one can maximize the lowest MNR (mask-to-noise ratio) via noise shaping. This means higher quality at the same rate. To guarantee a certain minimum MNR if only white quantization noise is introduced you have to raise the bitrate -- sometimes by a great amount.

Quote
I don't think myself or Nick are up for designing a new psychoacoustic model(!), though I guess we could "borrow" one.

Me, neither.

Quote
Thank you for this. All pointers gratefully received!

Does it have any IP attached?
What form is the "post-filter" in?

The reason for the first question is obvious! I ask the second because I know what a noise shaping filter should be like (you missed minimum phase off your list) and it's not trivial getting exactly what you want - the LPC-based method delivers filters which check all the boxes - does this one? If not, is "normalization"/conversion easy?

I don't know about the IP issue. The pre- and post filters are minimum-phase IIR filters and each other's inverse. They are "just a frequency warped" version of the LPC-based/autocorrelation method where the autocorrelation coefficients are determined by the output of the psychoacoustic model. Frequency warping is used to match the varying bandwidths of the critical bands.

Regarding the missing "minimum phase" property: It may not be obvious but it follows from both properties I mentioned. If a filter's impulse response starts with the sample X and the average log response is log(X) then your filter is also a minimum phase filter. By normalizing I just meant scaling the impulse response so X=1.

The difference between what Edler et al did and how it can be applied to FLAC is that the varying "post filter" does both, shaping in frequency and shaping in time, whereas the noise shaping filter for FLAC can only shape in frequency, and shaping in time is done by varying the wasted_bits count. To isolate these you have to extract the "gain" of the post filter which in this case is equal to the first sample. The post filter (including gain) is supposed to represent the masking curve, so it makes sense to use it as noise shaper.

Edit: You asked about the form of the post filter:
H(z) = 1 / [1 + a1 D(z) + a2 D^2(z) + a3 D^3(z) + ... + an D^n(z) ]  (frequency warped all-pole filter)
where D(z) is a non-linear phase all-pass used as a replacement for the z^-1.
To use it as noise shaper is not more difficult than to use it as synthesis filter for linear prediction coding. However, it is a bit tricky because this form includes a delay-free loop in general. Edler et al point to another paper that describes how to resolve that.
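
The link between "starts with 1", "average log response of 0 dB" and "minimum phase" can be checked numerically. The Python sketch below is an illustration with made-up coefficients, using plain FIR polynomials in z^-1 rather than the warped filter: a polynomial whose zeros all lie inside the unit circle averages 0 dB over a linear frequency grid, while moving one zero outside the circle raises the average.

```python
import cmath
import math


def mean_log_magnitude_db(coeffs, points=4096):
    """Average of 20*log10|A(e^jw)| over a linear frequency grid.

    A(z) = coeffs[0] + coeffs[1] z^-1 + coeffs[2] z^-2 + ...
    """
    total = 0.0
    for n in range(points):
        w = 2 * math.pi * n / points
        a = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(coeffs))
        total += 20 * math.log10(abs(a))
    return total / points


# A(z) = (1 - 0.5 z^-1)(1 + 0.8 z^-1): both zeros inside the unit circle.
minimum_phase = [1.0, 0.3, -0.4]
# A(z) = (1 - 2 z^-1)(1 + 0.8 z^-1): one zero outside the unit circle.
non_minimum_phase = [1.0, -1.2, -1.6]

print(round(mean_log_magnitude_db(minimum_phase), 3))
print(round(mean_log_magnitude_db(non_minimum_phase), 3))
```

Both polynomials start with the coefficient 1, but only the first averages 0 dB. The second averages about 6 dB, which is 20*log10 of the magnitude of the zero outside the circle.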

Cheers,
SG
Title: Near-lossless / lossy FLAC
Post by: jmvalin on 2007-10-11 00:15:48
Quote
The idea is simple: lossless codecs use a lot of bits coding the difference between their prediction, and the actual signal. The more complex (hence, unpredictable) the signal, the more bits this takes up. However, the more complex the signal, the more "noise like" it often is. It seems silly spending all these bits carefully coding noise / randomness.

So, why not find the noise floor, and dump everything below it?

This isn't about psychoacoustics. What you can or can't hear doesn't come into it. Instead, you perform a spectrum analysis of the signal, note what the lowest spectrum level is, and throw away everything below it. (If this seems a little harsh, you can throw in an offset to this calculation, e.g. -6dB to make it more careful, or +6dB to make it more aggressive!).


Sounds like you're trying to get the worst of standard lossy and lossless codecs. What you have now is a *lossy* codec that just uses a really crappy psychoacoustic model *and* is stuck with time-domain linear prediction instead of frequency transforms. BTW, the main reason why lossless codecs use time-domain linear prediction is not because it's better. It's only because that's the only sane way of getting back *exactly* what you encoded without numerical errors or having to code irrelevant information. By going lossy anyway, that advantage of LP no longer applies. I can't see any advantage of your idea compared to a lossy codec at very high rate (e.g. Vorbis q10 or something like that).
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-10-11 11:09:45
That's what I like - real encouragement!

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: halb27 on 2007-10-11 11:37:07
Please don't feel discouraged.
I think it's okay if somebody thinks there is no use in this approach.
Purely practically minded people won't consider using it anyway. It's a way to encode for perfectionists or near-perfectionists. And even these are free to prefer a transform codec with a high quality setting if they like. Ask 5 perfectionists about what they prefer and you'll get (nearly) 5 different answers, possibly with strong underlying emotions. BTW it's the same thing with lossless codecs, where differences between many codecs are very small. And for the practically minded it's no different: everybody loves his champion, though in an overall sense differences between codecs and encoders may be rather small (looking for instance at AAC, Vorbis, and MPC, but at least at high bitrate even MP3 is competitive most of the time).
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-10-11 11:43:51
Ran my 52 sample set through OGG aoTuv 4.51 @ 10 and lossyWAV -2 -spread. lossyWAV output is smaller when compressed to FLAC with the corresponding codec_block_size (in this case 1152 samples): 485 kbps for lossyWAV vs 488 kbps for OGG.

Using fb2k bit compare as a quick way to "see" differences, lossyWAV has fewer samples which are different to the lossless original than OGG and a smaller maximum magnitude of difference than OGG.
Title: Near-lossless / lossy FLAC
Post by: j7n on 2007-10-11 11:52:07
Quote
Using fb2k bit compare as a quick way to "see" differences, lossyWAV has fewer samples which are different to the lossless original than OGG and a smaller maximum magnitude of difference than OGG.

What happened to the strong argument that audio quality should not be "seen" and codecs not evaluated by subtracting...
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-10-11 11:57:29
Quote
Using fb2k bit compare as a quick way to "see" differences, lossyWAV has fewer samples which are different to the lossless original than OGG and a smaller maximum magnitude of difference than OGG.
What happened to the strong argument that audio quality should not be "seen" and codecs not evaluated by subtracting...

Yes, I know, sorry, I won't do it again. However, as lossyWAV only ever rounds a sample to fewer bits the sample value barely changes. Surely fewer changed samples has some merit?
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-10-11 12:26:19
Quote
Please don't feel discouraged.
I think it's okay if somebody thinks there is no use in this approach.


Thanks halb27. I wasn't discouraged though. It's there for whoever wants to use it, and I'm fully aware of the strengths and weaknesses.

FWIW there are circumstances where a real psychoacoustic model (even backed off from the assumed threshold of audibility by several dB via the use of "insane" quality settings), is still inferior to having no psychoacoustic model at all. The places should be obvious: where the psychoacoustic model is wrong, where the psychoacoustic model is crippled by the format, and where the psychoacoustic model will interact (unpredictably) with something down stream.

LossyFLAC is there for those instances, and for those people who would like to use lossless, but recognise that sometimes you're wasting 1000kbps+ on making a "perfect" copy of something that has been smashed to pieces before it reached you.


It still surprises me that LossyFLAC works as well as it does. I'm very grateful (we should all be very grateful!) to Nick for all the experimenting he's done. He probably felt deflated with positive ABX results to some of his changes, but what it showed was that lossyFLAC is hitting more or less exactly the right bitrate for the technique to work. A higher bitrate doesn't add anything, and a lower bitrate rapidly falls apart. It seems to have a very sharp "sweet spot".

I don't think for one minute all possible issues are ironed out. This low frequency thing has to be nailed properly in a way that makes some sense, so it'll fix problem samples we haven't found yet! Then there is the question of what happens with M/S (surround) decoding. It's easy to add something to prevent problems - but no one has even looked for problems here yet AFAIK. Finally, there are times when dither is necessary, but the vast majority of the time it isn't. I'm wondering if there could be a check for this? It would probably slow encoding down, but I'll think about it anyway.

Anyway, thank you programmers for all your hard work, and thank you Nick too for spotting some bugs and implementing genuine improvements.

Cheers,
David.


Quote
Using fb2k bit compare as a quick way to "see" differences, lossyWAV has fewer samples which are different to the lossless original than OGG and a smaller maximum magnitude of difference than OGG.
What happened to the strong argument that audio quality should not be "seen" and codecs not evaluated by subtracting...
Yes, I know, sorry, I won't do it again. However, as lossyWAV only ever rounds a sample to fewer bits the sample value barely changes. Surely fewer changed samples has some merit?


You can't draw any conclusions about perceived audio quality from this, but there are obvious reasons to test and report this behaviour, e.g. to understand something (not everything) about what the algorithm is doing.

It tells you what you already know though: Ogg makes no attempt to preserve the original samples numerically, while lossyFLAC will, on average, keep exactly the original value in 1 in 2^bits_removed samples. This doesn't tell you anything about what it sounds like. Neither does the maximum difference.
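
The "1 in 2^bits_removed" figure can be checked by brute force. The Python sketch below assumes 16-bit samples whose values are uniformly distributed, and plain rounding to the nearest multiple of 2^bits_removed without dither or clamping. The function names are hypothetical.

```python
def round_to_bits(sample, bits_removed):
    """Round to the nearest multiple of 2**bits_removed (ties round up); no clamping."""
    step = 1 << bits_removed
    return ((sample + step // 2) // step) * step


def unchanged_fraction(bits_removed, lo=-32768, hi=32767):
    """Fraction of all sample values in [lo, hi] that survive rounding unchanged."""
    values = range(lo, hi + 1)
    unchanged = sum(1 for s in values if round_to_bits(s, bits_removed) == s)
    return unchanged / len(values)


print(unchanged_fraction(3))   # 0.125, i.e. 1 in 2**3
```

Only samples that are already multiples of the step survive, which is why the count says something about the rounding and nothing about audibility.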

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: Nick.C on 2007-10-11 23:43:09
I've got to the pre-alpha test stage of the Bark related bin averaging - I haven't managed to listen to anything yet at a high enough volume (everyone else in the house is sleeping!) but on size of output alone, this is an interesting development.

My 52 sample set: WAV: 121.5 MB; FLAC: 68.2MB; lossyWAV -2: 39.5MB; lossyWAV -2 -spread: 35.3MB.

Late now, must sleep - will listen to the samples in the morning.

[edit] Sounds promising (pardon the pun!) Will post as alpha v0.3.7 [/edit]


Development of Bark related bin averaging has stopped in favour of frequency dependent variable length spreading function.
Title: Near-lossless / lossy FLAC
Post by: SebastianG on 2007-10-21 11:53:45
Quote
Sounds like you're trying to get the worst of standard lossy and lossless codecs. What you have now is a *lossy* codec that just uses a really crappy psychoacoustic model *and* is stuck with time-domain linear prediction instead of frequency transforms. [...] I can't see any advantage of your idea compared to a lossy codec at very high rate (e.g. Vorbis q10 or something like that).


There ARE some advantages, though: LPC based methods for perceptual lossy coding can't compete with AAC/MP3 at low bitrates, on that we agree. But at higher bitrates the advantages of MP3/AAC-like methods are probably close to insignificant and outweighed by the LPC method's decoding simplicity, I suppose.

Quote
FWIW there are circumstances where a real psychoacoustic model (even backed off from the assumed threshold of audibility by several dB via the use of "insane" quality settings), is still inferior to having no psychoacoustic model at all.

I totally disagree. Having no model at all is for sure inferior to having a model that's a bit off. Also, even if you don't trust the raw output of a psy model you can still enforce some safety conditions like it's possible with MusePack (--minSMR so_and_so).

Maybe we interpret "having no/some psychoacoustic model" differently. Let's say we do 2-pass VBR to achieve some target bitrate. How can an encoder without an idea of how we perceive things perform better than an encoder who knows about psychoacoustics?

Cheers!
SG
Title: Near-lossless / lossy FLAC
Post by: 2Bdecided on 2007-10-24 17:14:58

Quote
FWIW there are circumstances where a real psychoacoustic model (even backed off from the assumed threshold of audibility by several dB via the use of "insane" quality settings), is still inferior to having no psychoacoustic model at all.

I totally disagree. Having no model at all is for sure inferior to having a model that's a bit off. Also, even if you don't trust the raw output of a psy model you can still enforce some safety conditions like it's possible with MusePack (--minSMR so_and_so).

Maybe we interpret "having no/some psychoacoustic model" differently. Let's say we do 2-pass VBR to achieve some target bitrate. How can an encoder without an idea of how we perceive things perform better than an encoder who knows about psychoacoustics?


You can't shoot for a given bitrate (CBR or VBR) with lossyFLAC. You can only shoot for a given quality. Even there, options are limited!

As for "backing off a psychoacoustic model" - well, yes, and at some point you will hit/match lossyFLAC. The idea here is to have a codec which delivers transparency, or transparency plus resilience to anything upstream/downstream. What settings should people use to get that with Vorbis or MPC? I have some ideas, but with lossyFLAC it will be -2 and -1 - that's it. If it works!

Cheers,
David.
Title: Near-lossless / lossy FLAC
Post by: SebastianG on 2007-11-05 17:32:58
(*)
Quote
FWIW there are circumstances where a real psychoacoustic model (even backed off from the assumed threshold of audibility by several dB via the use of "insane" quality settings), is still inferior to having no psychoacoustic model at all.

Let's assume it's true. How would you explain it?

Quote
You can't shoot for a given bitrate (CBR or VBR) with lossyFLAC. You can only shoot for a given quality. Even there, options are limited!

I know. I was just being hypothetical. In any case (2pass VBR with target bitrate or quality-controlled VBR) an encoder would benefit from a component that estimates the optimal distribution of distortions in the time/frequency plane. Without such a component you'll get highly varying MNRs. What good is a high mask-noise-ratio in some frequency/time region when in another time/frequency region it's too low? The goal needs to be to maximize the minimum mask-noise-ratio.
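
A toy Python illustration of the "maximize the minimum MNR" argument, with made-up numbers. It assumes equal-width bands on a linear frequency scale and treats "same mean noise level in dB" as a rough stand-in for "same bitrate", in the spirit of a shaping filter whose average log response is 0 dB. The masking thresholds are hypothetical and do not come from any psychoacoustic model.

```python
def min_mnr_white(mask_db, noise_db):
    """Lowest mask-to-noise ratio when the same noise level lands in every band."""
    return min(m - noise_db for m in mask_db)


def min_mnr_shaped(mask_db, noise_db):
    """Lowest MNR when the noise follows the mask but keeps the same mean level in dB."""
    mean_mask = sum(mask_db) / len(mask_db)
    # Shift the noise band by band; the offsets sum to zero, so the mean stays noise_db.
    shaped = [noise_db + (m - mean_mask) for m in mask_db]
    return min(m - n for m, n in zip(mask_db, shaped))


mask_db = [40.0, 30.0, 10.0, 20.0]    # hypothetical masking thresholds per band
print(min_mnr_white(mask_db, 5.0))    # 5.0
print(min_mnr_shaped(mask_db, 5.0))   # 20.0
```

With white noise the worst band sets the result. With shaped noise every band ends up at the same MNR, which equals the mean of the mask minus the noise level.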

By saying (*) aren't you implying that the benefit of a psychoacoustic model is outweighed by its uncertainty? I don't think current models are that bad.

Quote
The idea here is to have a codec which delivers transparency, or transparency plus resilience to anything upstream/downstream.

<=> high min(MNR).


Cheers!
SG