Topic: Near-lossless / lossy FLAC

Near-lossless / lossy FLAC

Reply #175
Playing with more samples now and when no bits are "forced" to be removed the size of the sample set compressed in FLAC is almost identical to that of OGG at -q 10.

I managed to successfully implement a variable and conditional reduction to avoid clipping when rounding. However, it always produces a maximum possible amplitude of 2^(bs-1)-2^(bs-minimum_bits_to_keep)-1 i.e. 1023, 2047, 4095 from the absolute peak - for possibly 1 or two samples in the whole set. What would happen if I were to allow the occasional 32768 then reduce it to 32767 (by force)?
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #176
Quote
Playing with more samples now and when no bits are "forced" to be removed the size of the sample set compressed in FLAC is almost identical to that of OGG at -q 10.
That's interesting. It would be interesting to compare the added noise - I have no idea what OGG does at q10.
Quote
I managed to successfully implement a variable and conditional reduction to avoid clipping when rounding. However, it always produces a maximum possible amplitude of 2^(bs-1)-2^(bs-minimum_bits_to_keep)-1 i.e. 1023, 2047, 4095 from the absolute peak - for possibly 1 or two samples in the whole set. What would happen if I were to allow the occasional 32768 then reduce it to 32767 (by force)?
Nothing bad. It would just mean that, in that block only, FLAC couldn't take advantage of the wasted bits, and so would encode all 16. At least, that's my understanding. David Bryant has suggested that Wavpack might be able to handle the situation more intelligently.
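
To illustrate the point about wasted bits, here is a minimal Python sketch. It assumes FLAC's "wasted bits" are simply the low-order zero bits shared by every sample in a block; `wasted_bits` is a hypothetical helper written for this illustration, not code from the FLAC encoder or from the script discussed here.

```python
def wasted_bits(block):
    """Count the low-order zero bits common to every sample in the block."""
    combined = 0
    for sample in block:
        combined |= sample  # any set low bit in any sample survives the OR
    if combined == 0:
        return 0  # all-zero block: treated as "no wasted bits" here (assumption)
    count = 0
    while combined & 1 == 0:
        combined >>= 1
        count += 1
    return count


# A block already rounded to multiples of 16 (4 bits removed).
rounded = [x * 16 for x in (-2048, -5, 0, 7, 2047)]
# The same block with one sample forced to 32767.
forced = rounded[:-1] + [32767]

print(wasted_bits(rounded))  # 4
print(wasted_bits(forced))   # 0
```

A single odd-valued sample such as 32767 is enough to bring the shared zero bits down to none, which is why that one block would be coded at the full 16 bits.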

What you suggest is a useful strategy (can you share it please?). I think I'd want to implement it across albums rather than tracks - both to avoid very slight but possibly audible loudness changes between tracks (4095 from peak is -1.16dB down, which is the point where it might just become audible) and any possible issues on gapless albums. Still, if you limited it to 2047 it would usually be OK on a per-track basis, and any issues on larger changes will still be relatively small.
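
The -1.16dB figure can be checked with a short Python sketch. It assumes 16-bit audio with a positive full scale of 32767; `peak_loss_db` is a name made up for this example.

```python
import math


def peak_loss_db(reduction, full_scale=32767):
    """Level change in dB when the peak is lowered by `reduction` steps."""
    return 20 * math.log10((full_scale - reduction) / full_scale)


for reduction in (1023, 2047, 4095):
    print(reduction, round(peak_loss_db(reduction), 2))
# 4095 comes out at about -1.16 dB, 2047 at about -0.56 dB
```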

For my use, I'd apply album ReplayGain before encoding (as long as it was negative - i.e. lower volume) so wouldn't expect to see much clipping.

Cheers,
David.

Reply #177
*&*&^%$^&^&$! Editor / user compatibility issues.........

Revised code, changing implementation of fix_clipped and hard limiting to (2^(bs-1)-2^(bits_to_remove(codec_block_number))) for each block.
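
For readers without the attachment, here is a minimal Python sketch of a per-block hard limit of 2^(bs-1)-2^(bits_to_remove). This is not the attached code: the function name, the round-half-up rule and the lower bound of -2^(bs-1) are all assumptions made for illustration.

```python
def round_and_limit(block, bits_to_remove, bits_per_sample=16):
    """Round samples to multiples of 2**bits_to_remove, then hard limit."""
    step = 1 << bits_to_remove
    upper = (1 << (bits_per_sample - 1)) - step  # e.g. 32768 - 16 = 32752
    lower = -(1 << (bits_per_sample - 1))        # assumed lower bound: -32768
    out = []
    for sample in block:
        # Round half up to the nearest multiple of step (assumed rounding rule).
        rounded = ((sample + step // 2) >> bits_to_remove) << bits_to_remove
        out.append(max(lower, min(upper, rounded)))
    return out


print(round_and_limit([32767, -32768, 100, 104], 4))
# [32752, -32768, 96, 112]
```

The upper limit is the largest multiple of the step that still fits in the sample width, so the limited sample keeps its low bits at zero and the block's wasted bits are preserved.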

Edit: Codebox removed.  Forum no like.  Code attached instead.


[edit] Synthetic Soul, you're a gentleman and a scholar - cheers! [/edit]

Reply #178
I've figured out a way to work out the optimum fix_clipped ratio based on the actual bits to remove per codec block rather than the maximum_bits_to_remove for the whole WAV file. Will implement and revert.

Script updated - functional blocks of code moved about a bit...... text file is in uploads, Lossy FLAC thread post.

Optimum fix_clipped ratio refined and existing methods removed. Codec_block_size changed to 576 samples as this reduced file size for sample set (lossy flac'ed) from 33.6MiB to 32.3MiB. Code tidied up a fair bit and partially commented.

Reply #179
Source modified again - see Lossy FLAC thread in Uploads.

Reply #180
Keep up the good work gents, I'm certainly keeping an eye on this topic.
I've tried using Octave but it doesn't want to install on my PC, and I'm too poor to purchase Matlab, so I can't try the matlab files. I might have a go at installing Octave on my Linux box.

I have in the past done a bit of C/C++ coding so I'm trying to put something together in C++. I'm not promising anything, but so far I've managed to piece together a working WAV read/write and an FFT routine in one app; I just need to join them together. Never tried anything like this before, but it's good fun trying.

Steve

Reply #181
Oh wow. I've only discovered this thread today. Near-lossless / lossy FLAC needs a catchy name so that you can avoid questions like "is it lossy or lossless?"

I propose "Flossy".

(Hat tip to Garf for Floggy.)

Reply #182
Earlier in the thread it was discussed and, basically, the only bit being modified is the WAV file input to the FLAC file. As such, any FLAC file created from the processed WAV file is still a perfectly compatible FLAC file - not any kind of new format.

I totally agree that these processed files should be in some way differentiated from FLAC files created from lossless sources. In the script, 2Bdecided renames the WAV file from ".wav" to ".lossy.wav" to clearly mark which is which.

[edit]
ps. Calling the processor SoundSimplifier was put forward and in my variations on 2Bdecided's script, I name the processed ".wav" files ".ss.wav".
[/edit]

Reply #184
As a colloquial reference, flossy is suitably amusing and rolls off the tongue - but for file naming, it still needs to end in ".flac"

Reply #185
Quote
The idea is simple: lossless codecs use a lot of bits coding the difference between their prediction and the actual signal. The more complex (hence, unpredictable) the signal, the more bits this takes up. However, the more complex the signal, the more "noise like" it often is. It seems silly spending all these bits carefully coding noise / randomness.

So, why not find the noise floor, and dump everything below it?

This isn't about psychoacoustics. What you can or can't hear doesn't come into it.


I beg to differ. Well, I understand what you want to do, but:
- the "lossy" part of "lossy compression" is really about optimizing "what to remove". There are bits that have higher "listening value" than others, and for each post-compression file size S, you want to keep the "most valuable subset of size S".
- of course it is not as simple as "find the most valuable subset of the raw PCM and compress it", it is "find the most valuable compressed subset" -- cropping and compression are (in principle) interacting. 
- your idea assumes -- possibly implicitly -- that everything buried in the noise floor is of lesser value and can be cropped.

So, the question is:
For a B-bit bitstream with the noise floor occupying the lower b bits, you have one possible lossy compression by cropping at b and applying FLAC. This gives you a file of size S -- is there no lossy compression of file size at most S, sounding at least as good to the human ear? And if there is one better, is this crop+flac procedure reasonably close?

That question is all about psychoacoustics, imho.

Of course, if a lossy encoder sets out not to discriminate too much between musical styles -- including Japanese noise music -- it might (for all that I know) be that they cannot work by detecting and removing the noise floor, and so it might be possible to improve by hand-picking recordings where you don't care about the audible noise (you might even think that filtering it off is a subjective improvement). But in principle a lossy encoder could then add a -toleratenoisefloorremoval flag (with a parameter setting the aggressiveness!) and improve on both your and their software?

Reply #186
Quote
So, the question is:
For a B-bit bitstream with the noise floor occupying the lower b bits, you have one possible lossy compression by cropping at b and applying FLAC. This gives you a file of size S -- is there no lossy compression of file size at most S, sounding at least as good to the human ear?
Of course. Any of the well known psychoacoustic codecs are going to give you smaller file size, more noise, and (usually) a perceptually transparent result.

Quote
That question is all about psychoacoustics, imho.
Only in this respect: if you quantise at (or below) the noise floor (actually the lowest FFT bin), is it audible?

Quote
Of course, if a lossy encoder sets out not to discriminate too much between musical styles -- including Japanese noise music -- it might (for all that I know) be that they cannot work by detecting and removing the noise floor, and so it might be possible to improve by hand-picking recordings where you don't care about the audible noise (you might even think that filtering it off is a subjective improvement). But in principle a lossy encoder could then add a -toleratenoisefloorremoval flag (with a parameter setting the aggressiveness!) and improve on both your and their software?
Quantising at the noise floor doesn't remove noise - by definition it adds noise, since the signal is now even less like the original. There's already an option to decide how far below the noise floor to quantise.

lossyFLAC can cope perfectly well with pure noise (it keeps about 5-7 bits for pure white noise), but I'd be interested to hear some Japanese noise music - do you have a sample you could post?

Cheers,
David.

Reply #187
Quote
I'd be interested to hear some Japanese noise music - do you have a sample you could post?


From Merzbow's extensive discography:
http://www.fulldozer.ru/distribution/175 (Three WMA samples)
http://zzik.free.fr/dexpress/merzbow.mp3
http://www.artificialmusicmachine.com/mp3/...d_1-excerpt.mp3

Reply #188
Thanks. I'm guessing lossyFLAC would be fine, though I bet there would be mp3 problem samples in there somewhere.

Cheers,
David.

Reply #189
Nick,

Can you list the exact test set you were using when assessing file size please?

I have something new working.

Cheers,
David.

Reply #190
Files as follows:

Code: [Select]
13/07/2007  07:46         1,763,156 06_florida_seq.wav
28/06/2007  14:25         4,207,772 10 - Dungeon - The Birth- The Trauma Begins.wav
13/07/2007  07:46         2,116,844 14_Track03beginning.wav
13/07/2007  07:46         2,249,144 16_Track03entreaty.wav
13/07/2007  07:46         4,233,644 18_Track04cakewithtea.wav
12/07/2007  13:54         2,727,980 34_Gabriela_Robin___Cats_on_Mars.wav
29/06/2007  09:24         5,292,048 41_30sec.wav
28/06/2007  14:25         1,464,340 A02_metamorphose.wav
12/07/2007  13:54         1,058,444 A03_emese.wav
08/08/2007  12:35         1,344,704 Angelic.wav
27/06/2007  10:29         2,822,444 annoyingloudsong.wav
28/06/2007  14:25           886,144 aps_Killer_sample.wav
09/07/2007  16:29         2,145,060 Atem_lied.wav
09/07/2007  16:29         3,377,108 ATrain.wav
09/07/2007  16:29         4,410,076 Bachpsichord.wav
13/07/2007  07:55         4,669,484 badvilbel.wav
09/07/2007  16:29         4,320,784 BigYellow.wav
09/07/2007  16:29           717,072 birds.wav
08/08/2007  18:45         2,428,708 bruhns.wav
18/07/2007  07:49         1,487,684 cricket__insect___edit_.wav
27/06/2007  11:24         1,522,796 E50_PERIOD_ORCHESTRAL_E_trombone_strings.wav
09/07/2007  16:29         2,646,180 eig.wav
08/08/2007  18:45           797,372 Furious.wav
09/07/2007  16:29           562,952 glass_short.wav
13/07/2007  07:55         2,891,112 harp40_1.wav
13/07/2007  07:55         1,986,864 herding_calls.wav
09/07/2007  16:29         1,319,320 jump_long.wav
08/08/2007  12:07           168,104 keys_1644ds.wav
12/07/2007  13:54         1,766,396 ladidada_10s.wav
12/07/2007  13:55         1,845,132 Liebe_so_gut_es_ging.wav
28/06/2007  14:25           663,492 Moon_short.wav
12/07/2007  13:54         1,416,420 Poets_of_the_fall___Shallow.wav
09/07/2007  16:29         5,292,044 rach_original.wav
09/07/2007  16:29         3,130,908 rawhide.wav
12/07/2007  13:54         1,785,436 Rush___Hold_Your_Fire___Turn_the_Page.wav
09/07/2007  16:29         1,697,724 S13_KEYBOARD_Harpsichord_C.wav
27/06/2007  11:24           882,048 S30_OTHERS_Accordion_A.wav
09/07/2007  16:29         3,357,548 S34_OTHERS_GlassHarmonica_A.wav
27/06/2007  11:24         1,170,784 S35_OTHERS_Maracas_A.wav
27/06/2007  11:24         2,292,528 S53_WIND_Saxophone_A.wav
08/08/2007  12:35           486,196 SeriousTrouble.wav
18/07/2007  07:49         3,514,148 swarm_of_wasps__edit_.wav
09/07/2007  16:30         6,218,144 thewayitis.wav
08/08/2007  12:35         7,605,028 the_product.wav
08/08/2007  12:09           317,656 triangle.wav
08/08/2007  19:15           777,516 triangle_2_1644ds.wav
13/07/2007  07:55         1,769,512 trumpet.wav
28/06/2007  14:25         2,095,424 VELVET.wav
11/07/2007  11:33         3,707,200 wait.wav
              49 File(s)    117,408,624 bytes

Reply #191
Thanks.

Do you have links to...

Code: [Select]
28/06/2007  14:25         4,207,772 10 - Dungeon - The Birth- The Trauma Begins.wav
12/07/2007  13:54         2,727,980 34_Gabriela_Robin___Cats_on_Mars.wav
18/07/2007  07:49         1,487,684 cricket__insect___edit_.wav
12/07/2007  13:54         1,416,420 Poets_of_the_fall___Shallow.wav
12/07/2007  13:54         1,785,436 Rush___Hold_Your_Fire___Turn_the_Page.wav
18/07/2007  07:49         3,514,148 swarm_of_wasps__edit_.wav
08/08/2007  12:35         7,605,028 the_product.wav
08/08/2007  12:09           317,656 triangle.wav
11/07/2007  11:33         3,707,200 wait.wav


...please?

Cheers,
David.

Reply #192
David - at work, so no time to find links, however, check your e-mail!

Best regards,

Nick.

As an aside, to allow the (timely) calculation of extreme fft_lengths I employed the following (as the longer fft_bit_length analyses seemed to converge more quickly):

noise_averages_bits=25;

noise_averages=ceil(2^(max(0,(noise_averages_bits-fft_bit_length(analysis_number)))^0.9));
So, for fft_bit_length = 17 this gives 91 iterations; for 14, 404; for 11, 1726; for 8, 7160; and for 5, 28979.
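
The same calculation as a Python sketch, for anyone without Matlab or Octave. It mirrors the expression above, with the 0.9 power applied to the exponent rather than to the result; the function name is just a label for this example.

```python
import math

NOISE_AVERAGES_BITS = 25


def noise_averages(fft_bit_length, noise_averages_bits=NOISE_AVERAGES_BITS):
    """Number of averaging iterations for a given FFT length (in bits)."""
    exponent = max(0, noise_averages_bits - fft_bit_length) ** 0.9
    return math.ceil(2 ** exponent)


for bits in (17, 14, 11, 8, 5):
    print(bits, noise_averages(bits))
# 17 91
# 14 404
# 11 1726
# 8 7160
# 5 28979
```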

Reply #193
Thanks Nick.

I've implemented noise shaping, something like...

http://telecom.vub.ac.be/Research/DSSP/Pub.../AES-2002-B.pdf
http://telecom.vub.ac.be/Research/DSSP/Pub...ICASSP-2003.pdf

All due credit to SebG - this is almost exactly what he suggested on page 1 of this thread.

I make no guarantee that it's transparent (though I've tried!) - it's been a challenge to stop it going unstable (50dB more noise than you expected isn't great for the audio quality!) but it seems to have settled down now.
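
For anyone unfamiliar with noise shaping, here is a minimal Python sketch of the general idea only. It is a fixed first-order error-feedback quantiser, which is much simpler than the adaptive, LPC-derived filter described in the linked papers and is not the method used in the script; the function name and the `coeff` parameter are assumptions made for this illustration.

```python
def quantise_with_error_feedback(samples, bits_to_remove, coeff=1.0):
    """Round samples to multiples of 2**bits_to_remove, feeding each
    rounding error back into the next sample (first-order shaping)."""
    step = 1 << bits_to_remove
    out = []
    prev_error = 0.0
    for x in samples:
        target = x - coeff * prev_error   # compensate for the previous error
        q = step * round(target / step)   # nearest multiple of step
        prev_error = q - target           # error made on this sample
        out.append(int(q))
    return out


samples = [100, 37, -250, 1023, 5, -77, 300, 12]
shaped = quantise_with_error_feedback(samples, 4)
print(shaped)
print(sum(shaped) - sum(samples))  # stays within half a step of zero
```

With coeff=1.0 the added noise is e[n] - e[n-1], so the errors cancel on average and the noise is pushed towards high frequencies. The papers instead shape it to follow a masking estimate derived from the signal.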

I'm getting these bitrates...

wav: 111 MB (117,408,624 bytes) = 1411kbps
lossless FLAC: 64.1 MB (67,304,026 bytes) = 809kbps
lossyFLAC6: 34.0 MB (35,754,414 bytes) = 429kbps
lossyFLAC10: 27.0 MB (28,378,924 bytes) = 341kbps

32kRG:
lossyFLAC6: 32.6 MB (34,209,716 bytes) = 411kbps
lossyFLAC10: 21.6 MB (22,696,905 bytes) = 273kbps

blocksize=576 throughout (may not be optimal for v10 and 32k)

32k = PPHS in foobar2k
RG=ReplayGain-by-track+clipping-prevention-by-peak in foobar2k
- which increases the volume of lots of clips in this test set (i.e. makes it less efficient)

No clipping prevention enabled in lossyFLAC, no dither.
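
The bitrates can be approximated from the byte counts with a short Python sketch. It assumes every file in the set is 44.1kHz 16-bit stereo (1411.2kbps) and ignores WAV header overhead, so a result can differ from the quoted figures by up to 1kbps depending on rounding.

```python
WAV_BYTES = 117_408_624
WAV_KBPS = 44_100 * 16 * 2 / 1000  # 1411.2 kbps, assumed for the whole set


def kbps(encoded_bytes, wav_bytes=WAV_BYTES):
    """Average bitrate of the encoded set, scaled from the WAV size."""
    return WAV_KBPS * encoded_bytes / wav_bytes


sizes = {
    "lossless FLAC": 67_304_026,
    "lossyFLAC6": 35_754_414,
    "lossyFLAC10": 28_378_924,
    "lossyFLAC6 32kRG": 34_209_716,
    "lossyFLAC10 32kRG": 22_696_905,
}
for name, size in sizes.items():
    print(f"{name}: {kbps(size):.1f} kbps")
```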



The code is a mess for now, and about 10 times slower than the previous version. It uses lpc.m from the MATLAB sig proc toolbox, which means implementing it without this would be some work.


There's a big problem though. I contacted Prof Werner Verhelst to ask if the approach was patented. He said it wasn't - it had been a contract for an audio equipment manufacturer, so he couldn't share the code, but there was nothing to stop me implementing it myself and he'd be interested to hear how I got on.

So far so good. But if you actually read that paper, they make it quite clear that what they're suggesting is very close to (a generalised version of, if you like) Sony's Super Bit Mapping technique. I assume this is patented - US 5,204,677 may be the correct patent.

Does this cover what I'm doing, and what's in that paper? I don't know.

Under UK patent law, you can play with it all you like privately, and also perform research (including commercial or public research) on the algorithm, but not with the algorithm. That's my understanding - it may be wrong. Other countries have other patent exemptions for R&D.


So I don't know what to do. I have no desire to fall foul of any patents.

FWIW the Sony patent can't have that long left - it claims priority from a Japanese patent from 1990. That means you can probably have lossyFLAC10 in 2010!

I welcome suggestions and legal opinions.

Cheers,
David.

Reply #194
No lawyers on HA then?

Reply #195
Quote
No lawyers on HA then?


I was sitting this out to see whether you got any bites.

I've only got experience as a patent coordinator within my R&D group at my previous employer, and mostly on physical embodiment types of patents rather than method patents (essentially algorithms). I read the situation pretty much the same as you.

If it's not invalidated by prior art (I didn't look back at the "A" application and look for "X" and "Y" citations in the Search Report) then it would appear to cover the method you're trying to implement. I can't recall where I'd found that on previous occasions, and whether it's a publicly accessible source like uspto.gov or a subscription service.

You might have scope to carry out research alone and publish source code for academic purposes (like the LAME project, which does not distribute the encoder, but publishes the source code). Anyone who then compiled that code or used it for non-research purposes might then be committing a breach of the patent in certain countries, while others in some jurisdictions might be free to share compiled code or even use it commercially. I'm not really au fait with the legality of this, but LAME seems to have been OK, and I didn't think one would be prevented from conducting private/commercial research on the algorithm or with the algorithm. The exception for research "with the algorithm" might be where you use a method not to research the method itself but as a tool for producing a commercial product as part of the production process. Proving that a company made an item using a particular method in court is rather difficult, which is why method claims weren't favoured by my previous employer. I understand that in certain fields, certain novel methods would thus remain secret (only documented internally in case they need to defend against an infringement lawsuit) rather than being publicly disclosed.

If the patent could be considered to be invalid at least for those claims applicable to your techniques, you might be able to prove it, but if you actually infringed it yourself and were sued, could you afford the lawyers to go to court, and to pay Sony's lawyers if you lost? That's the kind of decision that companies have to make from time to time. OTOH, if you have meagre financial resources and don't actually impinge on Sony's business, would Sony actually sue you in the first place?

It's a tough one.

I've even heard of companies (possibly in desperate financial states or having farmed out their patent portfolio management to revenue generation firms) attempting to threaten their competitors or "offering the opportunity to license our inventions" with ungranted patent applications for which no claims were without "X" or "Y" citations indicating prior art found in the search report, and which at least some of the competitors hadn't implemented anyway and didn't look like they were about to either.

In summary, the LAME source-code and documentation only approach is worth consideration and investigation. This might also let you publish fair-use excerpts processed by the method for public listening tests in the interest of research and not personal gain, but not let you directly provide anyone with the tools that implement the methods, which would appear to be much more of a grey area.
Dynamic – the artist formerly known as DickD

Reply #196
Thanks dynamic.

Yes, I was wondering about posting samples. If it can't be made transparent at a reasonable bitrate (i.e. significantly less than the non noise shaped approach), then it's not much use anyway - so samples are essential, and clearly research on the algorithm itself.

However, there's not much incentive for anyone to test if they can't then use it themselves.

I understand what the Lame project has done to get around the IP issues, and that they have "got away with it" so far. Maybe some people using lame commercially are actually paying mp3 license fees so it makes money for the patent holders.

However, I'm not comfortable with taking that route myself.

I might just ask Sony.

Cheers,
David.

 

Reply #197
Latest version of the Delphi transcode of David's script is in the LossyFLAC thread in uploads.

It is giving a close approximation to the output from the Matlab script.

Reply #198
David,

I have been trying to find a (relatively simple) weighting which is a bit more scientific than the primitive skew I have used in lossyWAV.

I found the formulae for A, B, C & D weighting and also found tabulated values for ITU-R.468 (BBC Research Department noise weighting). Of course, I also have the tabulated values of the equal-loudness curve you use in Replay Gain.

I am wondering as to the applicability of D-Weighting (principally because I have the formula) or ITU-R.468 (as it may be simple to implement) as a substitute for the 20Hz to 3.7kHz skew currently available as an option in lossyWAV.
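
For reference, this is what one of these weighting formulae looks like in practice. The Python sketch below uses the standard A-weighting curve only (not D-weighting or ITU-R 468) and is not part of lossyWAV; the function name is an assumption for this example.

```python
import math


def a_weighting_db(freq_hz):
    """A-weighting gain in dB at freq_hz (standard analogue formula)."""
    f2 = freq_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20 * math.log10(numerator / denominator) + 2.00


for freq in (100, 1000, 3700):
    print(freq, round(a_weighting_db(freq), 1))
# about -19.1 dB at 100 Hz and 0 dB at 1 kHz
```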

Best regards,

Nick.

Reply #199
Sorry Nick, I don't think they're appropriate for this. SebG's suggestion of testing white noise in vorbis was a good one (see page 1!).

EDIT: Those curves are the audibility, or perceived loudness, of something on its own. Whereas the noise we're adding here is added below something else. So you need masking curves, not absolute threshold / equal loudness curves. Still, I've learnt something - I hadn't heard of D-weighting before - sounds interesting for its intended application.

Cheers,
David.