Topic: Multiformat Listening Test @ 64 kbps - FINISHED (Read 88194 times)

  • kdo
  • [*][*][*][*]
  • Members (Donating)
Reply #75
All of a sudden, I have got a small question -- about the error bars on all the plots.

If we compare the plots for two different samples, the error bars are shorter for the sample with more listeners. This makes sense. (More listeners --> more representative statistics --> less error)  Ok.

But if we look at just one plot (any one of the plots), it seems the error bars of all 5 contenders have exactly the same size. Are they actually exactly the same? Is it how it's supposed to be due to the design of the test?
Are there any circumstances when error bars could have different size for different contenders?

Reply #76
Within a sample plot, all bars should have the same size - always.

  • kdo
  • [*][*][*][*]
  • Members (Donating)
Reply #77
Quote
Within a sample plot, all bars should have the same size - always.

Somehow this feels counter-intuitive.

Imagine an extreme case where one contender is rated 3.0 by ALL listeners (i.e. all of them give exactly the same rating), but another contender gets different ratings between 1.0 and 5.0.
Why should the error bars be equal?

(I don't doubt the results, just want to understand a little deeper.)

Reply #78
Maybe someone with more knowledge in statistics can answer your question.

  • robert
  • [*][*][*][*][*]
  • Developer
Reply #79
Who said all bars should be equal? What do you want the bars to represent?

some boxplot example: http://www.physics.csbsju.edu/stats/box2.html

Reply #80
In my results (and Roberto's, Guru's and ff123's), the bars for the various contenders of the same sample will have the same length.

  • naylor83
  • [*][*][*]
Reply #81
Quote
In my results (and Roberto's, Guru's and ff123's), the bars for the various contenders of the same sample will have the same length.


If the bars are supposed to indicate the quartiles they should vary a bit. But I haven't checked what those bars are supposed to be...
davidnaylor.org

  • ff123
  • [*][*][*][*][*]
  • Developer (Donating)
Reply #82
For this type of analysis, the error bars are all the same size.  Another way you can do the analysis is to have a different confidence range for every comparison.  So  for the 5 codecs (including the anchors), you would have 10 different numbers.  This can be represented well in matrix table format, but not nicely in a graph format.  If you want to get matrix type confidence ranges, download the bootstrap program from my site:

http://ff123.net/bootstrap/

which performs this type of analysis.  In practice, the two types of analyses yield very similar results.
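
As a rough illustration of the matrix-style analysis described above (one confidence range per pair, so 10 ranges for 5 contenders), here is a minimal Python sketch. It is not the bootstrap program from ff123.net, and it does no correction for multiple comparisons. The function name, the percentile method and the ratings are all made up for the example; it only assumes every listener rated every contender.

```python
import random
from itertools import combinations
from statistics import mean

# Invented ratings: one list per contender, one score per listener (same listener order).
EXAMPLE_RATINGS = {
    "codec_a": [4.0, 3.5, 4.2, 3.8, 4.5, 3.9],
    "codec_b": [3.8, 3.6, 4.0, 3.5, 4.4, 4.1],
    "codec_c": [3.0, 2.5, 3.4, 2.8, 3.6, 3.1],
    "high_anchor": [4.6, 4.2, 4.8, 4.5, 5.0, 4.7],
    "low_anchor": [1.5, 1.2, 2.0, 1.4, 1.8, 1.6],
}

def pairwise_bootstrap(ratings, n_boot=2000, alpha=0.05, seed=0):
    """Return {(a, b): (low, high)}, a percentile interval for mean(a) - mean(b)."""
    rng = random.Random(seed)
    codecs = sorted(ratings)
    n = len(ratings[codecs[0]])
    intervals = {}
    for a, b in combinations(codecs, 2):
        # Paired differences, since each listener rated both contenders.
        diffs = [x - y for x, y in zip(ratings[a], ratings[b])]
        # Resample listeners with replacement and record the mean difference.
        boot_means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_boot))
        low = boot_means[int((alpha / 2) * n_boot)]
        high = boot_means[int((1 - alpha / 2) * n_boot) - 1]
        intervals[(a, b)] = (low, high)
    return intervals

if __name__ == "__main__":
    for pair, (low, high) in pairwise_bootstrap(EXAMPLE_RATINGS).items():
        print(pair, round(low, 2), round(high, 2))
```

An interval that excludes zero suggests a difference for that particular pair. Each pair gets its own width, which is why the result fits a table better than a single bar chart.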

  • robert
  • [*][*][*][*][*]
  • Developer
Reply #83
So the bars do not represent the distribution of data collected for each codec, as, for example, you could have one codec rated by all people 5.0 and you'll add bars to it. I find this confusing. What is the meaning of the painted bars? How should I read them?

  • Moguta
  • [*][*][*]
Reply #84
I would've loved to see MP3 involved in this test.  We know that Vorbis, AAC, and WMA are better, but just as a comparison it's always interesting to see how the newer, improved codecs rate nowadays against our friendly ol' MP3 format, to know exactly how much of an improvement there is.

  • ff123
  • [*][*][*][*][*]
  • Developer (Donating)
Reply #85
Quote
So the bars do not represent the distribution of data collected for each codec, as, for example, you could have one codec rated by all people 5.0 and you'll add bars to it. I find this confusing. What is the meaning of the painted bars? How should I read them?


If the bottom of the bar of one codec does not touch the top of the bar of another codec, you can state with at least 95% confidence that the first codec is better than the second one.

The bars being all the same size means that you might lose a bit of power in making statistical distinctions between codecs.  But I think that's more than balanced by having the nice, easy-to-look at pictures instead of tables of numbers.

There are some who assert (and they have a point) that even if there are statistical differences between codecs, it may not make a practical difference if the ratings are relatively close to each other (close being determined by looking at the pictures and making a judgment).
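
To see why one bar length can apply to every codec, here is a hedged Python sketch. The thread does not say which method produced the plots, so this assumes a blocked ANOVA (listeners as blocks) with a Fisher-style least significant difference, and it uses a normal quantile instead of Student's t to stay within the standard library. The function name and the data are invented; the table reproduces the extreme case asked about earlier (one contender rated 3.0 by everyone, another rated anywhere from 1.0 to 5.0).

```python
from statistics import NormalDist, mean

# Rows are listeners, columns are contenders (invented data).
KDO_EXAMPLE = [[3.0, 1.0], [3.0, 2.0], [3.0, 3.0], [3.0, 4.0], [3.0, 5.0]]

def equal_width_bars(table, alpha=0.05):
    """Return (codec_means, half_width); the same half_width is used for every codec."""
    n = len(table)      # number of listeners
    k = len(table[0])   # number of contenders
    grand = mean(x for row in table for x in row)
    codec_means = [mean(row[j] for row in table) for j in range(k)]
    listener_means = [mean(row) for row in table]
    # Residual sum of squares after removing listener and codec effects.
    sse = sum(
        (table[i][j] - listener_means[i] - codec_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    mse = sse / ((n - 1) * (k - 1))  # one pooled error estimate for the whole sample
    # Assumption: normal quantile as a stand-in for the t critical value.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lsd = z * (2 * mse / n) ** 0.5   # smallest difference between two means called significant
    # Bars of +/- lsd/2 fail to overlap exactly when two means differ by more than lsd.
    return codec_means, lsd / 2

if __name__ == "__main__":
    means, half = equal_width_bars(KDO_EXAMPLE)
    print(means, round(half, 3))
```

Under these assumptions the bar describes the pooled uncertainty of a comparison between two means, not the spread of one codec's ratings, so the contender rated 3.0 by everyone gets the same bar as the others.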

  • muaddib
  • [*][*][*][*]
  • Developer
Reply #86
It seems that Itunes at 96 VBR has outscored Itunes 128 CBR from the previous multi-format test.
That's a substantial improvement unless the difficulty of the samples is not comparable.
Different samples, different participants. Just look at how personal results posted here differ from the average.
Results from different listening tests are just not easily comparable.

Sorry for bringing this up again, but I have one more note about this. iTunes 96 kbps VBR was used in this test at 64 kbps and in the previous one at 48 kbps. Some samples were used in both tests, but the score for those samples is not the same (example: Toms Diner 4.70 vs 4.86) even though the decoded sample is the same. Even participants involved in both tests didn't give the same rating (examples: Alex B 4.0 vs 4.2, haregoo 5.0 vs 4.5).
Unfortunately it is not possible to get consistent results.

Reply #87
Yes, this is normal and depends on the mood, the listening-conditions (maybe different headphones or soundcard, possible noise from the neighbors, etc.) and health (maybe the listener just got better from a cold or still has a cold while testing).

Reply #88
Say, are there any plans for doing a new test here?

As I mentioned elsewhere, it's not a good apples-to-apples comparison to pit CBR WMA against quality VBR in other codecs. The WMA family supports quality VBR, as well as 2-pass CBR and bitrate VBR modes.

And for a streaming test, CBR is really the appropriate encoding mode. While fixed quality is an interesting thing to look at, it excludes rate control, which is a very important part of codec design, and a place where a lot of engineering effort goes.

  • benski
  • [*][*][*][*][*]
  • Developer
Reply #89
Quote
Say, are there any plans for doing a new test here?

As I mentioned elsewhere, it's not a good apples-to-apples comparison to pit CBR WMA against quality VBR in other codecs. The WMA family supports quality VBR, as well as 2-pass CBR and bitrate VBR modes.

And for a streaming test, CBR is really the appropriate encoding mode. While fixed quality is an interesting thing to look at, it excludes rate control, which is a very important part of codec design, and a place where a lot of engineering effort goes.


I would agree here.  Streaming is the main use so far for 64kbps.  Low bitrates are interesting for portable devices, but the CPU usage (and hence battery life) of the winners of this test (HE-AAC and WMA Pro) leaves a lot to be desired.
  • Last Edit: 10 April, 2008, 06:30:07 PM by benski

Reply #90

Quote
Say, are there any plans for doing a new test here?

As I mentioned elsewhere, it's not a good apples-to-apples comparison to pit CBR WMA against quality VBR in other codecs. The WMA family supports quality VBR, as well as 2-pass CBR and bitrate VBR modes.

And for a streaming test, CBR is really the appropriate encoding mode. While fixed quality is an interesting thing to look at, it excludes rate control, which is a very important part of codec design, and a place where a lot of engineering effort goes.

Quote
I would agree here.  Streaming is the main use so far for 64kbps.  Low bitrates are interesting for portable devices, but the CPU usage (and hence battery life) of the winners of this test (HE-AAC and WMA Pro) leaves a lot to be desired.

How are you measuring CPU use/battery drain of the codecs? We've done a ton of work for the mobile implementations of WMA Pro to get the CPU hit low enough to make it feasible for phone use. I haven't done any formal testing with recent devices though.

Reply #91
The reason why WMA was tested in CBR mode is that Microsoft seems to recommend CBR over VBR for WMA. Also, IIRC, VBR produced target bitrates that deviated from the average bitrate of the other encoders by more than 10%. 2-pass modes for short samples are also not an option - using 2-pass must be done on complete tracks and then samples have to be extracted out of the encoded full tracks.

A pure CBR test could be interesting for streaming indeed.

Reply #92
Quote
The reason why WMA was tested in CBR mode is that Microsoft seems to recommend CBR over VBR for WMA.

Do we? Do you have a link - I'd like to have that corrected. Speaking for Microsoft, I recommend that content that needs CBR be encoded as 2-pass CBR, and otherwise 2-pass VBR be used. We've done a lot of work around 2-pass audio encoding.

Quote
Also, IIRC, VBR produced target bitrates that deviated from the average bitrate of the other encoders by more than 10%. 2-pass modes for short samples are also not an option - using 2-pass must be done on complete tracks and then samples have to be extracted out of the encoded full tracks.

Hmmm. How short are the clips you're using? If you can give me a reproducible test for this, I'll pass it on to our engineers. In my experience, VBR audio comes out within 1% of the target, but I'm normally encoding at least 60 second clips.

2-pass VBR peak limited might work better in this case. But if you need to use CBR, at least use 2-pass.

Quote
A pure CBR test could be interesting for streaming indeed.

Great, I'd love to see that as well.

For the WMA codecs, the proper mode to use for that (unless it's a test of live encoders) would be 2-pass CBR. We are able to get a meaningful reduction in peak QP with 2-pass CBR.

Reply #93
The test performed by NSTL featured WMA in CBR mode. Since you explicitly instructed NSTL what settings to use, one would assume you had a reason why you did this: obtain best quality results.

If that is not the case, well, sorry. IIRC, WMA did not offer a quality based VBR mode that produced files with the target bitrate.

Could you explain to me what multi-pass CBR is supposed to do? I thought multi-pass encoding was good for ABR only. For CBR you always assign the same number of bits (I don't know if WMA has something like a bit reservoir; if it does, I imagine that could be the only variable thing that could be influenced by multi-pass encoding).
As for bitrate-based VBR (which I call ABR), I would prefer to encode full tracks and then extract the sample from that. Otherwise the test is of little or no use.

Reply #94
Quote
The test performed by NSTL featured WMA in CBR mode. Since you explicitly instructed NSTL what settings to use, one would assume you had a reason why you did this: obtain best quality results.
That test was done before my time, but my understanding is that we used 1-pass CBR in that case as that was the only rate-controlled mode supported by HE AAC, and the goal was to have an apples-to-apples test. It was never meant to be a demonstration of best practices. 1-pass CBR is certainly the most challenging codec mode, so it's interesting to test, but it's not something I use other than for live encoding.

Quote
If that is not the case, well, sorry. IIRC, WMA did not offer a quality based VBR mode that produced files with the target bitrate.
Understood. I just want to help make future tests a more scenario-relevant comparison.

Quote
Could you explain to me what multi-pass CBR is supposed to do? I thought multi-pass encoding was good for ABR only. For CBR you always assign the same number of bits (I don't know if WMA has something like a bit reservoir; if it does, I imagine that could be the only variable thing that could be influenced by multi-pass encoding).
Correct. With 2-pass CBR, you're able to essentially request a bigger bit reservoir in advance of complex audio, to keep worst-case QP lower. With 2-pass VBR, we essentially calculate the QP that will produce closest to the optimum bitrate, and then vary QPs per block a little in order to hit the target. But in essence an unconstrained 2-pass VBR is a lot like a "magic" way to figure out what quality level to use to give a file of the requested size.
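
The "figure out what quality level gives a file of the requested size" idea can be sketched as a search over a single QP. This Python example is a toy and not how WMA's rate control works: the function name, the per-block complexity numbers and the size model (bits = complexity / QP) are all assumptions made up for illustration.

```python
def pick_qp(block_complexity, target_bits, qp_min=1.0, qp_max=100.0, iters=50):
    """Toy second pass: bisect for one QP whose predicted size fits target_bits.

    block_complexity stands in for first-pass measurements, one value per block.
    """
    def predicted_bits(qp):
        # Hypothetical model: a coarser quantizer (higher QP) spends fewer bits.
        return sum(c / qp for c in block_complexity)

    low, high = qp_min, qp_max
    for _ in range(iters):
        mid = (low + high) / 2
        if predicted_bits(mid) > target_bits:
            low = mid    # over budget: quantize more coarsely
        else:
            high = mid   # within budget: try a finer quantizer
    return high          # finest QP found that stays within the budget

if __name__ == "__main__":
    # Invented first-pass data: four blocks, budget of 800 bits.
    print(pick_qp([1000, 4000, 2500, 500], target_bits=800))
```

A real encoder would then nudge the QP per block around this value to land on the target, as the post above describes; that step is left out here.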

Quote
As for bitrate-based VBR (which I call ABR), I would prefer to encode full tracks and then extract the sample from that. Otherwise the test is of little or no use.
Makes sense to me.

Moderation: Fixed quotes.
  • Last Edit: 11 April, 2008, 06:35:47 PM by greynol

  • hellokeith
  • [*][*][*][*]
Reply #95
Quote
Do we? Do you have a link - I'd like to have that corrected. Speaking for Microsoft, I recommend that content that needs CBR be encoded as 2-pass CBR, and otherwise 2-pass VBR be used. We've done a lot of work around 2-pass audio encoding.


Hi Ben,

Nice to see you here at HA.  I think you'll find this place somewhat subdued compared to AVSF..

Interesting you speak of 2-pass VBR WMA.  I have been using -a_codec WMA9STD -a_mode 3 -a_setting 128_44_2 for more than a year with excellent results on my portable.  I think perhaps it is underrated/underused in the lossless community, though it wasn't trivial to get the VBS command line options all sorted out.  The reason I ended up with ~128kb 2-pass VBR WMA was that during my testing, I found it maintained the best stereo imaging during intricate percussion/cymbal passages.

  • IgorC
  • [*][*][*][*][*]
Reply #96
I tried 1-pass and 2-pass CBR wma10 at 64 kbit/s in the past. I didn't share the results here. There were miscellaneous changes but I couldn't ABX the difference.
So maybe 2-pass has a bigger reservoir and some other kind of grass called "magic", but it makes no sense for audio CBR encoding. If anyone disagrees, please provide samples where 2-pass CBR is better than 1-pass for wma10.
  • Last Edit: 12 April, 2008, 07:56:17 AM by IgorC

Reply #97
Quote
Nice to see you here at HA.  I think you'll find this place somewhat subdued compared to AVSF..

Thank goodness !

Quote
Interesting you speak of 2-pass VBR WMA.  I have been using -a_codec WMA9STD -a_mode 3 -a_setting 128_44_2 for more than a year with excellent results on my portable.  I think perhaps it is underrated/underused in the lossless community, though it wasn't trivial to get the VBS command line options all sorted out.  The reason I ended up with ~128kb 2-pass VBR WMA was that during my testing, I found it maintained the best stereo imaging during intricate percussion/cymbal passages.

Cool, glad it's working out for you.

I'd probably recommend using -a_mode 4 and setting a peak bitrate instead of leaving it entirely unconstrained, since devices may have a maximum supported rate. For Zune, it's 320 for audio-only files, and 192 for soundtracks in WMV files, IIRC.

Stuff like stereo separation is a great thing to use VBR for, since it gets you the bits where you need them. I think people spend so much time sweating the hard clips they can miss that most of most full tracks aren't that hard.

  • vinnie97
  • [*][*][*][*]
Reply #98
I'm still anxiously awaiting the forthcoming ~80kbps multiformat test, especially now that Aoyumi has just released beta 5.5 to infuse more life into Vorbis.