HydrogenAudio

Knowledgebase Project => Wiki Discussion => Topic started by: UED77 on 2009-01-29 00:32:26

Title: Lossless Comparison article
Post by: UED77 on 2009-01-29 00:32:26
The venerable and useful page Lossless comparison (http://wiki.hydrogenaudio.org/index.php?title=Lossless_comparison) could use a little work. Not wanting to radically overhaul anything without seeking a constructive consensus, I'm wondering what you'd like to see on this page?

A couple preliminary ideas:

The page needs to feature two primary sections: a format comparison table, and a popular setting comparison table.

The format comparison would include strictly information that's applicable to the format itself, like RIFF chunk support, tagging, and hardware support. Compression ratio would be displayed as a range of possible percentages, from the lowest to the highest.
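A minimal sketch of how such a range could be computed, assuming per-file original and compressed sizes in bytes are already available. It expresses each file's compressed size as a percentage of the original (lower is better), which is the assumed definition of "compression ratio" here; the function name and the example sizes are hypothetical.

```python
def compression_range(files):
    """Return (min %, max %, mean %) of compressed size relative to original.

    files: iterable of (original_bytes, compressed_bytes) pairs.
    """
    ratios = [100.0 * comp / orig for orig, comp in files]
    if not ratios:
        raise ValueError("need at least one file")
    return min(ratios), max(ratios), sum(ratios) / len(ratios)


# Illustrative sizes only: one file that compresses well, one that does not.
low, high, mean = compression_range([(1000, 300), (1000, 700)])
print(f"{low:.1f}% to {high:.1f}% (mean {mean:.1f}%)")  # 30.0% to 70.0% (mean 50.0%)
```

Reporting the min/max next to the mean keeps a single headline number from hiding how much the result depends on the material.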

The other table would account for typical usage scenarios by comparing the most popular settings for all included formats. For example, a FLAC settings poll (http://www.hydrogenaudio.org/forums/index.php?showtopic=58731) in 2008 showed that 62.2% of voters use flac -8, 18.7% use flac -5, and 11.2% use flac -6, with the other settings having negligible usage; therefore this table would include flac -8 and flac -5, representing FLAC. We would need to determine (i.e., poll) the most popular settings for WavPack, TAK, Monkey's Audio, and OptimFROG; formats that offer only one option would obviously feature their default settings. Accompanying this table could be a plot showing encoding speed versus compression for all settings.
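One way to turn poll results into table entries is to take the most popular settings until they cover most of the votes. The 80% coverage cutoff below is a hypothetical rule, not something agreed in this thread, but with the 2008 FLAC poll figures above it happens to select exactly flac -8 and flac -5.

```python
def settings_to_include(poll_shares, coverage=80.0):
    """Pick the most popular settings until their combined share reaches `coverage` percent.

    poll_shares: dict mapping a setting name to its share of votes in percent.
    The default 80% cutoff is a hypothetical rule chosen for illustration.
    """
    chosen = []
    total = 0.0
    for setting, share in sorted(poll_shares.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(setting)
        total += share
        if total >= coverage:
            break
    return chosen


# Shares from the 2008 FLAC settings poll quoted above.
print(settings_to_include({"flac -8": 62.2, "flac -5": 18.7, "flac -6": 11.2}))
# ['flac -8', 'flac -5']
```

A coverage rule adapts to each format's poll: a codec whose users cluster on one setting gets one entry, while a codec with split usage gets more.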

Furthermore, the detailed per-format pro/con listing should be removed; most of the truly relevant data would be factored into the table, or into the articles of the formats themselves.

What do you all think?
Post by: vpa on 2009-01-29 18:55:19
I also think it's time for an overhaul. The Wiki looks very pro-FLAC, as it doesn't mention the bad sides of FLAC.
I think it's hard to give a single compression rate, as it depends on the music you encode. You'd need a test corpus with extreme material: a track containing 5 minutes of silence, a track of white noise, and some steps in between (classical, pop, folk, rock, trance, thrash metal, industrial noise, etc.).
Post by: jcoalson on 2009-01-29 19:59:31
The Wiki looks very pro-FLAC, as it doesn't mention the bad sides of FLAC.
which are...
Post by: InspectorGadget on 2009-01-29 20:14:24
Yeah, the presence of positive coverage is not so much indicative of bias as it is of how useful FLAC is.
Post by: UED77 on 2009-01-29 21:03:50
I also think it's time for an overhaul. The Wiki looks very pro-FLAC, as it doesn't mention the bad sides of FLAC.
I think it's hard to give a single compression rate, as it depends on the music you encode. You'd need a test corpus with extreme material: a track containing 5 minutes of silence, a track of white noise, and some steps in between (classical, pop, folk, rock, trance, thrash metal, industrial noise, etc.).


I don't think the Wiki is too pro-FLAC. FLAC is the most-used lossless codec, and the table does a good job covering its strengths and weaknesses.
I disagree about the need for extreme scenarios. The focus should never be on compressing silence or white noise, but rather on typical usage scenarios: lots of rock and pop music, with a little electronica, industrial, and classical thrown in for good measure. But definitely a large test corpus, something like what Synthetic Soul used for his lossless comparison. We might find it convenient to take the numbers from there, though periodically refreshing our data wouldn't hurt.

Would everyone agree with the two-tiered structure I presented in the first post, though?
Post by: HotshotGG on 2009-01-29 22:04:54
Quote
The venerable and useful page Lossless comparison could use a little work. Not wanting to radically overhaul anything without seeking a constructive consensus, I'm wondering what you'd like to see on this page?

A couple preliminary ideas:

The page needs to feature two primary sections: a format comparison table, and a popular setting comparison table.

The format comparison would include strictly information that's applicable to the format itself, like RIFF chunk support, tagging, and hardware support. Compression ratio would be displayed as a range of possible percentages, from the lowest to the highest.

The other table would account for typical usage scenarios by comparing the most popular settings for all included formats. For example, a FLAC settings poll in 2008 showed that 62.2% of voters use flac -8, 18.7% use flac -5, and 11.2% use flac -6, with the other settings having negligible usage; therefore this table would include flac -8 and flac -5, representing FLAC. We would need to determine (i.e., poll) the most popular settings for WavPack, TAK, Monkey's Audio, and OptimFROG; formats that offer only one option would obviously feature their default settings. Accompanying this table could be a plot showing encoding speed versus compression for all settings.

Furthermore, the detailed per-format pro/con listing should be removed; most of the truly relevant data would be factored into the table, or into the articles of the formats themselves.


Normally I would say "yes", it would be fine to update the page with relevant information. Please keep in mind, though (as I was a major contributor to the wiki in the past), that the page is extremely popular. Over 100,000 people have already viewed it! It's exactly like the EAC section of the wiki. I wouldn't make any major changes to it without first consulting the community. If it were a smaller page I would say "go ahead", but I just think you need to be careful with this one. Your ideas seem pretty good; however, I don't feel that results should be compared based upon a "poll"; that seems fairly ridiculous in my opinion. It doesn't really matter what compression level you use, as lossless is lossless. The page could benefit from added information, but overhauling it and adding more information would be a major PITA. That's just my opinion, however.
Post by: UED77 on 2009-01-29 23:52:45
Please keep in mind, though (...), that the page is extremely popular. Over 100,000 people have already viewed it! I wouldn't make any major changes to it without first consulting the community.

Right, hence this thread. I don't just want to start making unannounced changes to the second most popular page.

Quote
(...) I don't feel that results should be compared based upon a "poll"; that seems fairly ridiculous in my opinion.


What do you mean exactly? I was referring to how in the second table, the "popular settings comparison table", we should be including a specific format's most popular settings. So the second table would have entries like:

flac -5
flac -8
wavpack [default]
wavpack -hx3
tak -p2
tak -p4m
WMA Lossless
Apple Lossless

But in order for us to figure out what "entries" to use, we should create some HA polls. Or is there a better way to do it?
Post by: Synthetic Soul on 2009-01-30 09:19:20
The focus should never be on compressing silence or white noise, but rather on typical usage scenarios: lots of rock and pop music, with a little electronica, industrial, and classical thrown in for good measure. But definitely a large test corpus, something like what Synthetic Soul used for his lossless comparison.
Thanks for the mention; however, as I think you are implying, I believe that my test corpus could really do with a little more variety.

I have previously considered trying to organise a mass test of the most popular lossless codecs, which would collate compression ratios and relative speeds for a variety of music on a variety of systems.  I am increasingly aware that my comparison uses music from a narrow range of genres, and one test system.  Wouldn't it be great to have figures for 20 different systems and hundreds of files from a broad range of genres?

Disclaimer: I am not saying I can do this, or that it makes sense.
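If such a mass test were run, one way to collate speeds from many machines would be to express each setting's encoding time relative to a reference setting measured on the same machine, then average those relative figures across machines. This sketch assumes `flac -5` as the reference and a geometric mean; both choices, the data layout, and the function name are hypothetical.

```python
import math


def relative_speeds(results, reference="flac -5"):
    """Average per-system speed of each setting relative to a reference setting.

    results: {system_name: {setting: encode_seconds}}; every system must
    include the reference setting. A value of 2.0 means twice as fast as
    the reference, 0.5 means half as fast.
    """
    per_setting = {}
    for timings in results.values():
        ref_time = timings[reference]
        for setting, seconds in timings.items():
            per_setting.setdefault(setting, []).append(ref_time / seconds)
    # Geometric mean is the usual way to average ratios: 2x and 0.5x average to 1x.
    return {
        setting: math.exp(sum(math.log(x) for x in ratios) / len(ratios))
        for setting, ratios in per_setting.items()
    }
```

For example, `relative_speeds({"pc1": {"flac -5": 10.0, "flac -8": 20.0}, "pc2": {"flac -5": 4.0, "flac -8": 8.0}})` reports `flac -8` at roughly 0.5 on both (hypothetical) machines, even though their absolute times differ a lot.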
Post by: HotshotGG on 2009-01-30 13:56:36
Quote
What do you mean exactly? I was referring to how in the second table, the "popular settings comparison table", we should be including a specific format's most popular settings. So the second table would have entries like:

flac -5
flac -8
wavpack [default]
wavpack -hx3
tak -p2
tak -p4m
WMA Lossless
Apple Lossless

But in order for us to figure out what "entries" to use, we should create some HA polls. Or is there a better way to do it?


I understand what you are saying. I just don't think that is necessary. Take FLAC for example: the codec's settings are already recommended on the FLAC page. Lossless is lossless; it doesn't really matter what setting you use. By default, most people are going to use --compression-level-5. --compression-level-8 doesn't really give you any noticeable gain in compression. I personally use --compression-level-3 on my hard drive, because I have a large amount of space. It really depends upon the needs of the user. Doing what you are asking would also take months. I am sure it was hard enough just to tally the information for that table.


Quote
Disclaimer: I am not saying I can do this, or that it makes sense.


I think you have contributed enough to the community with the great software you have written.
Post by: TBeck on 2009-01-30 16:20:47
First:
Quote
Disclaimer: I am not saying I can do this, or that it makes sense.

I think you have contributed enough to the community with the great software you have written.

Oh yes! And also with your continuous TAK testing!

Nevertheless, I am impudent enough to ask you for a modification of your comparison...

The timing seems good, especially since you may have to perform a lot of retests anyway because of your OS switch. But I am not really sure whether this has already happened.

I have previously considered trying to organise a mass test of the most popular lossless codecs, which would collate compression ratios and relative speeds for a variety of music on a variety of systems.  I am increasingly aware that my comparison uses music from a narrow range of genres, and one test system.  Wouldn't it be great to have figures for 20 different systems and hundreds of files from a broad range of genres?

Disclaimer: I am not saying I can do this, or that it makes sense.

Some thoughts:

1) I would like to see a test corpus covering more genres.

From my experience with the evaluation of lossless codecs this comes down to:

a) Loud and/or dynamically compressed music.
b) Quiet and/or dynamic music.

That's the most important general factor affecting the compression results of lossless codecs.

Your test corpus falls into a) and the corpus of the FLAC site into b).

I am convinced a differentiation of these two categories would be totally sufficient if users want to choose a codec based upon their musical preferences.

If you don't want to differentiate, I would recommend a ratio of maybe 0.5/0.5 or 0.7/0.3 (a/b) for the files in your test corpus.
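To see what the suggested a/b mix means in practice, this sketch splits a corpus of a given size into the two categories and computes the blended mean ratio, assuming the mean compression ratio of each category is already known. The function names are hypothetical.

```python
def corpus_split(n_files, share_a):
    """Split n_files into (a: loud/compressed, b: quiet/dynamic) counts."""
    n_a = round(n_files * share_a)
    return n_a, n_files - n_a


def blended_ratio(mean_a, mean_b, share_a):
    """Overall mean compression ratio (percent) of a corpus mixing categories a and b."""
    return share_a * mean_a + (1.0 - share_a) * mean_b


print(corpus_split(50, 0.7))  # (35, 15)
```

Because the blended mean is a weighted average, the headline figure moves toward whichever category dominates the corpus, which is why the a/b ratio should be stated alongside the results.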

2) How large does a test corpus have to be to be reasonably representative?

If your test file selection isn't a very unfortunate one, I would guess 50 or 60 files are enough.

It's always possible you will have one file with rare special properties that will overemphasize weaknesses or strenght (what's the plural here?) of one particular codec. For instance, Joseph Pohm once sent me such a file. It had 1 wasted bit (the low bit of all samples constant), not in the left/right channels but in the difference of those channels. Codecs checking for this extremely rare case could easily achieve 3 percent better compression!
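The wasted-bit case can be illustrated with a small sketch: OR all samples of a channel together and count the trailing zero bits of the result. The sample values are hypothetical, chosen so that neither left nor right has a wasted bit while their difference (the side channel, L - R) has one. This is a generic illustration, not TAK's or FLAC's actual code.

```python
def wasted_bits(samples):
    """Number of low bits that are zero in every sample (0 for an all-zero channel)."""
    acc = 0
    for s in samples:
        acc |= s  # Python ints act like two's complement here, so negatives work too
    if acc == 0:
        return 0  # digital silence: handle separately, not as "infinite" wasted bits
    return (acc & -acc).bit_length() - 1  # index of the lowest set bit


left = [3, 5, 7, 10]
right = [1, 3, 9, 2]
side = [l - r for l, r in zip(left, right)]  # [2, 2, -2, 8]
print(wasted_bits(left), wasted_bits(right), wasted_bits(side))  # 0 0 1
```

An encoder that detects k wasted bits can shift the channel right by k before prediction and store k once, which is roughly where the gain on such a file comes from.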

But with 50 test files, such a misleading (because rare and unrepresentative) single-file result will only shift the mean by 3/50 = 0.06 percentage points.
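Written out, with $r_i$ the compression ratio of file $i$ in a corpus of $N$ files and only file 1 changing by $\Delta r_1$, the shift of the mean is:

```latex
\Delta \bar{r} = \frac{1}{N}\sum_{i=1}^{N} \Delta r_i = \frac{\Delta r_1}{N} = \frac{3}{50} = 0.06 \ \text{percentage points}
```

Doubling the corpus to 100 files would halve that influence again.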

3) System-dependent tests

No need for this!

Speed differences of lossless codecs are mostly related to their general design (for instance symmetric vs. asymmetric) and algorithms.

TAK, for instance, can decode fast because it is an asymmetric codec. It can encode fast because I have found heuristics to estimate relevant properties of the audio signal instead of fully evaluating them. This will not change if you choose another system with a different CPU.

And TAK uses only one set of assembly optimizations for all CPUs. In my experience this works well for any x86 CPU other than the Pentium 4.

I am convinced your Athlon XP will generate representative results. My only advice is to stay away from the crazy P4.

4) Comparison of TAK versions

The initial goal of your comparison was to help me see whether modifications of the codec are advantageous. This was great in the YALAC days, when there were often quite large differences between two versions.

But now not much is happening regarding codec efficiency. Therefore I would be perfectly happy with a modification of your test corpus.


That's all for now.

I really don't want to put pressure on you to modify your comparison! I am so thankful for all your help to improve TAK!

But if you are seriously thinking about an update, those are my recommendations.

  Thomas


Post by: vpa on 2009-01-30 17:05:48
The Wiki looks very pro-FLAC, as it doesn't mention the bad sides of FLAC.
which are...


Like bad compression rates, especially with 24-bit/192 kHz files (http://www.hydrogenaudio.org/forums/index.php?showtopic=64933&st=0)... I know using such files isn't a common scenario, but if you want to archive studio recordings or bootlegs in such high quality and you look at the Wiki, you'll conclude that you can't do any better than FLAC. If you used Monkey's Audio, LA, OptimFROG or most other codecs, you would be able to save some room.
I also think it should be mentioned that there are codecs that compress better at 16-bit/44.1 kHz, too. If you have a close look at the comparison table, you will realise that only Shorten has a worse compression ratio. Lots of people don't want to keep two libraries of their music, and as DAPs/PMPs get more powerful, their batteries last longer, and the players support more than one lossless codec, the page is simply misleading people. They would be able to squeeze a few more songs onto their player if they used Monkey's Audio or WavPack. On the other hand, it should be noted that FLAC is very light on CPU and battery usage. Everything has its pros and cons.
Another thing is that TAK isn't mentioned, and it should also be noted that, at the moment, it is a closed-source, Windows-only codec.
Don't get me wrong, the Wiki ain't bad, but if you only read the pros and cons because you are in a hurry, you could easily be misled.
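The "a few more songs" point is simple arithmetic. This sketch, with a hypothetical function name, counts how many tracks of a given uncompressed size fit on a player when each is stored at a given compression ratio (compressed size as a percent of the original). The player size and the two ratios in the loop are illustrative values, not measurements.

```python
def songs_that_fit(capacity_bytes, raw_song_bytes, ratio_percent):
    """Whole tracks that fit when each track is stored at ratio_percent of its raw size."""
    per_song = raw_song_bytes * ratio_percent / 100.0
    return int(capacity_bytes // per_song)


# A 4-minute CD-quality track: 44,100 Hz x 2 channels x 2 bytes x 240 s.
raw = 44_100 * 2 * 2 * 240
capacity = 8 * 10**9  # a hypothetical 8 GB player, ignoring filesystem overhead
for ratio in (58.0, 56.0):  # two illustrative ratios, two points apart
    print(ratio, songs_that_fit(capacity, raw, ratio))
```

With these assumed numbers, a two-point better ratio gains a few percent more tracks: real, but modest.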

I agree that nobody listens to silence or white noise, but it would be helpful to define the possible maximum and minimum compression rate of a codec. If a single number suggests you'll save 30% with this codec, and someone encodes a very noisy Merzbow track and saves only about 3%, that person would think the Wiki isn't telling the truth. On the other hand, if someone encodes a very quiet classical track and sees that 70% of the space was saved, he might get the impression that something must be wrong and that the encoding wasn't lossless...
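The two extremes are easy to demonstrate. This sketch uses Python's zlib as a stand-in; it is a general-purpose compressor, not a lossless audio codec, so real codecs do much better on ordinary music. At the extremes the behaviour is the same in principle: digital silence shrinks to almost nothing, and full-scale white noise barely compresses at all (it can even grow slightly).

```python
import array
import random
import zlib


def percent_saved(pcm: bytes) -> float:
    """Space saved by zlib on raw PCM bytes (zlib is only a stand-in for an audio codec)."""
    return 100.0 * (1 - len(zlib.compress(pcm, 9)) / len(pcm))


n = 44_100 * 2 * 10  # 10 seconds of 16-bit stereo samples
silence = array.array("h", [0] * n).tobytes()
rng = random.Random(0)  # fixed seed so the noise is reproducible
noise = array.array("h", (rng.randint(-32768, 32767) for _ in range(n))).tobytes()

print(f"silence: {percent_saved(silence):.1f}% saved")
print(f"white noise: {percent_saved(noise):.1f}% saved")
```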

Just my 2 cents
Post by: TechVsLife on 2009-01-30 20:49:16
re flac:
fwiw, I ended up going with ALAC, because it's very similar to FLAC in performance/efficiency, maybe only slightly worse, but has iPod/iPhone/iTunes support and the same tags as M4A/AAC. (However, I actually avoid iTunes because I find it irritating: too far removed from the filesystem, too many layers. foobar2000 is great, with a clean and direct interface.)
I don't see much difference between the best three or four lossless formats, but then I don't see much difference between the lossy formats either (other than that MP3 has the most compatibility, and AAC is the format of the iTunes Store and is, by design, slightly better than the older MP3 format).
I'm tempted just to use lossless only, but so much less music is available in lossless and solid-state portable players are not quite there yet for storage.

I agree the pro/con section is redundant and should be removed or changed to convey only what's not already in the table.
Post by: UED77 on 2009-01-31 00:23:58
Oh dear. Nice can of worms I opened. I'll probably work on a concrete example page in my userspace, so I have something to show, rather than merely tell.

Synthetic Soul, I think your test corpus is quite all right; in my reply to an earlier post I was trying to explain why we shouldn't focus on atypical scenarios like silence or extremely [dynamic range-]compressed sources, but rather on real-world music over a variety of genres, which I think you handled decently.
Post by: greynol on 2009-01-31 08:11:04
Discussion about H.264 vs. XviD moved to:
http://www.hydrogenaudio.org/forums/index.php?showtopic=69043
Post by: Synthetic Soul on 2009-01-31 09:41:59
Thomas, thanks for your input.  As I am going to have to find time to test all codecs again, I will certainly consider adding some quieter, more dynamic, music to my corpus.  I did some testing for FLAC when Josh was looking at different windows and used a different, smaller, corpus that I hope provided more variety.  IIRC I had compression ratios varying from 30-70%.  I will try to build a larger corpus with a similar range.  It will take me some time though, as the "test machine" is actually the family PC that gets more and more use by my wife and three kids!

... that will overemphasize weaknesses or strenght (what's the plural here?)
"weaknesses or strengths" would be fine, although, curiously, I'd say we predominantly tend to use the phrase "strengths and weaknesses".
Post by: TBeck on 2009-01-31 23:14:25
Thomas, thanks for your input.  As I am going to have to find time to test all codecs again, I will certainly consider adding some quieter, more dynamic, music to my corpus.  I did some testing for FLAC when Josh was looking at different windows and used a different, smaller, corpus that I hope provided more variety.  IIRC I had compression ratios varying from 30-70%.  I will try to build a larger corpus with a similar range.

Sounds very good! You once sent me some results from the FLAC corpus and they looked fine.

... that will overemphasize weaknesses or strenght (what's the plural here?)
"weaknesses or strengths" would be fine, although, curiously, I'd say we predominantly tend to use the phrase "strengths and weaknesses".

Well, we do too... Thank you!

  Thomas
Post by: forum neophyte on 2009-02-01 07:48:02
Off topic perhaps, but can someone advise on the significance of the FLAC compression level?  In CDex, for example, under the encoder configuration tab with FLAC selected, there are eight different compression settings, from 1 through 8.  I am particularly concerned with preserving the music files' data integrity (in case I again have the privilege of listening through $40k of playback equipment at home).
Post by: Synthetic Soul on 2009-02-01 10:16:53
FLAC is lossless, so there is no concern about quality.  The compression setting represents how hard FLAC tries to compress the audio data: -0 will be quicker but compress less; -8 will be slowest but compress slightly more.  Take a look at my comparison (http://www.synthetic-soul.co.uk/comparison/lossless/) for some idea of the differences.  NB: Many people seem to use -8 regardless.  -5 is the default, but -6 always seems like the sweet spot to me.
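To check the size/speed trade-off on your own music, a small script can encode one WAV file at several levels and record the resulting size and time. This sketch assumes the reference `flac` command-line tool is installed and uses only its standard options (`-0` to `-8` for the level, `-V` to verify the output by decoding it and comparing with the input, `-s` for silent, `-f` to overwrite, `-o` for the output name); the function names and the output file naming are hypothetical.

```python
import os
import shutil
import subprocess
import time


def flac_command(level, src, dst):
    """Build a flac command line for one compression level (0-8)."""
    if not 0 <= level <= 8:
        raise ValueError("FLAC compression levels run from 0 to 8")
    # -V makes flac decode its own output and compare it with the input.
    return ["flac", f"-{level}", "-V", "-s", "-f", "-o", dst, src]


def compare_levels(wav_path, levels=(0, 5, 6, 8)):
    """Encode wav_path at each level; return {level: (percent of WAV size, seconds)}."""
    if shutil.which("flac") is None:
        raise RuntimeError("the flac command-line tool is not on PATH")
    wav_size = os.path.getsize(wav_path)
    results = {}
    for level in levels:
        dst = f"{wav_path}.level{level}.flac"
        start = time.perf_counter()
        subprocess.run(flac_command(level, wav_path, dst), check=True)
        elapsed = time.perf_counter() - start
        results[level] = (100.0 * os.path.getsize(dst) / wav_size, elapsed)
    return results
```

Calling `compare_levels("track.wav")` on a hypothetical file would show the pattern described above: every level decodes to identical audio, and only size and encoding time change.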