New Public Multiformat Listening Test (Jan 2014)

Reply #25
... Bitrates of WMA VBR:
WMA Std Q50: 74 kbps;
WMA Std Q75: 115 kbps;
WMA Pro Q25: 83 kbps;
WMA Pro Q50: 113 kbps.

None of them is close to the target bitrate.

Then CBR @ 96 kbps is the only option.

By the way, you posted here: http://www.hydrogenaudio.org/forums/index....st&p=779933
Quote
+1 for AAC
+1 for Vorbis
+0.5 for MP3
+0.5 for WMA standard (who knows, maybe it isn't very bad...)

IMHO mp3@96kbps cannot compete with aac/vorbis@96kbps. 112 or 128 kbps MP3 is more interesting.


Is it still so?

Reply #26
Doesn't MPC really only shine at settings that are intended to deliver transparent results, which are about 3x what is being proposed for this test?
I only knew about this listening test at 128 kbps, where Musepack tied for the win and beat at least MP3 and WMA (which half of the people here want to see tested again) and a young implementation of AAC. But...

1) Musepack at 96kbps will have lowpass ~14kHz. That's too low IMHO.
...might be more important than preconceptions about the format. I also found another test at 96kbps, where Musepack was outperformed by a huge margin, maybe due to the lowpass mentioned above. With this in mind I'd opt to test only Opus, QTAAC and Vorbis, and maybe use MPC as the low anchor. Looking at the latter listening test, what's the point in testing WMA and MP3 again? It will put unnecessary strain on all listeners by having them test another one or two encoders per sample. Why not try to get smaller error bars on the tested encoders by collecting more results on the relevant modern codecs, instead of discouraging people with a flood of encoders which have already been shown to be outperformed?*

Garf's suggestion of comparing HE/LC AAC implementations is also interesting, but should be done in a different test, similar to the recent MP3 listening test. For this one QTAAC-LC is probably fine.

EDIT: *OK, there are no more recent results than this old listening test, so maybe there were improvements to WMA and MP3 (Helix?) warranting another comparison. Also, the PR effect might be larger if MP3 were included (Slashdot: "MP3 beaten once again by modern codecs").
It's only audiophile if it's inconvenient.

Reply #27
Holding a huge comparison isn't going to work; there just aren't enough people ready to spend the time testing seven codecs on many samples, esp. with modern encoders at 96kbps where differences are anything but obvious to most listeners.

The real priority here is getting a comparison between Opus 1.1 and the best AAC encoder, and getting enough samples and enough participants to feel confident about the result.

I still think that being able to compare to MP3 at the same rate is important to make the results meaningful to a wider audience beyond HA regulars. Yes, it won't win. That's fine. Maybe it could serve as low anchor; maybe it's too good to serve as low anchor.

Rather than tossing in another MP3 rate, I think trying to nail down "modern codecs at X kbps ~= MP3 at Y kbps" should be a separate test, probably one with just one modern codec and around three different MP3 rates.

The last time we talked about this I said we should include Vorbis because it was still somewhat commonly used by end users and because of its HTML5 etc use. But end-user Vorbis use has been slowly but steadily declining, webm never really took off, and Firefox gave in and started supporting using system codecs for MP3 and AAC in HTML5. Vorbis results could still be nice to have but I don't think it's a priority.

Musepack and WMA simply don't garner sufficient interest these days to justify the additional workload on volunteers for this test and the accompanying reduction in how many results actually get submitted and in the statistical meaningfulness of the conclusions. Esp. since musepack is (similar to what greynol said but without the silly exaggeration) generally considered to only be interesting at --quality 4 and up; musepack --quality 3 lost to even same-bitrate LAME ABR by fairly wide margins in tests (e.g. this).

So: either Opus v QT-AAC v LAME 96kbps + some other low anchor, Opus v QT-AAC + LAME 96kbps as the low anchor, or possibly Opus v QT-AAC v one other codec + LAME 96kbps as low anchor.

Reply #28
I started writing my post, left and came back, submitted it, then saw Kohlrabi's post which says some of the same things, esp. re. Musepack and re. MP3 being important to wider audiences.

Garf's suggestion of comparing HE/LC AAC implementations is also interesting, but should be done in a different test, similar to the recent MP3 listening test. For this one QTAAC-LC is probably fine.
Garf's second suggestion there really depends on his first. At 80kbps LC vs HE and FhG vs Apple are questions that may warrant exploring.

Reply #29
Garf's suggestion of comparing HE/LC AAC implementations is also interesting, but should be done in a different test, similar to the recent MP3 listening test. For this one QTAAC-LC is probably fine.
Garf's second suggestion there really depends on his first. At 80kbps LC vs HE and FhG vs Apple are questions that may warrant exploring.


I specifically put 80kbps because I'm reasonably confident of the answer for 96kbps. If we're going with 96kbps we avoid the entire LC/HE question, and I would include both FhG (libfdk) and Apple. So for 96kbps I would do (consider this my serious suggestion):

- Apple 96kbps
- FhG libfdk 96kbps
- Opus 1.1 96kbps
- MP3 96kbps (~low anchor)
- MP3 128kbps (~high anchor, but it may fail at that)

I would exclude Vorbis. It's still used for streaming a lot (e.g. Spotify), but it hasn't evolved since the previous test. I have no real idea whether Apple has evolved, but libfdk is significant in that it's a state-of-the-art open-source encoder, easily available in ffmpeg and used by Android, and AAC is getting enough use nowadays that the two best encoders are interesting to compare. Nobody outside HA uses Musepack, and I have seen no case that it's competitive at 96kbps, so I would exclude it as well. Someone here stated it did well in the last 128kbps test, but check which encoder it lost against and how that one did in the last tests...

LAME or Helix? Helix did actually win the last test many years ago, but hasn't evolved at all since and I'm not sure anyone actually uses it.

If the FhG guys want to submit a codec then it's going to be tricky to decide what to do with it. I think it'd almost need a pretest vs Apple, as I wouldn't want to lose libfdk in the test. (Edit: Eh, actually, IIRC last time we allowed it because they were going to update Winamp with it. Winamp is dead now, so the most logical thing for a new FhG codec would be for them to update libfdk? Or are the FhG encoders available elsewhere?)

Reply #30
I still think that being able to compare to MP3 at the same rate is important to make the results meaningful to a wider audience beyond HA regulars. Yes, it won't win. That's fine. Maybe it could serve as low anchor; maybe it's too good to serve as low anchor.


I agree with what jensend said. We already know 128 kbps MP3 should be transparent to most users anyway.
There needs to be a 96 kbps MP3 even if it doesn't sound too good.
Might we use that as a low anchor instead of FAAC at 96 kbps?

 

Reply #31
I specifically put 80kbps because I'm reasonably confident on the answer for 96kbps.

I totally agree. Is there no interest in lower bit-rates? 48 kbps perhaps?

Quote
If the FhG guys want to submit a codec... Or are the FhG encoders available elsewhere?)

Why would Fraunhofer want to submit a codec? The codec is already out there, you guys decide whether it should be in the test.
The codec has long been available in e.g. some Magix, Sonnox, or Sony software. They should contain the same version as Winamp, especially the Sonnox plug-in. Or are you asking for free-of-charge software?

Chris
If I don't reply to your reply, it means I agree with you.

Reply #32
Quote
LAME or Helix? Helix did actually win the last test many years ago, but hasn't evolved at all since and I'm not sure anyone actually uses it.

What is the latest version of the Helix MP3 encoder? I want to do a quick test to 'hear' if there are any differences between Helix and LAME at 96 kbps.
I only found a binary on the RareWares site.

Reply #33
I think we should increase the number of samples. More samples lead to more statistically valid results.
I also think we should choose the samples so that the average bitrate of the tested samples and the average bitrate of albums are roughly equal, as I did here:
http://www.hydrogenaudio.org/forums/index....howtopic=100896
If the average bitrate of albums is 96k and the average bitrate of the tested samples is 144k, critical samples are overrepresented in the corpus.

More than 20 samples? Hm, maybe, I don't know.
20 is already a high enough number. During the last test we waited a little more than a month to get enough results.


What do others think about it?

In the last Opus test in 2011, the contributors submitted 531 valid results, but there were only 30 samples. (17.7 results/sample)
http://listening-tests.hydrogenaudio.org/igorc/results.html
This is not the most efficient use of the effort. The number of samples, 30, is the statistical bottleneck that prevents drawing further conclusions.
I recommend choosing the number of samples by this formula: 4*sqrt(expected number of valid results/4)
With this, contributors put 50% of their effort into the overall conclusions and the remaining 50% into accurately measuring the quality of each individual sample, which helps developers.
The value used in the overall conclusion is about 2x more accurate than the average quality of one sample.

In the last AAC 96kbps test, there were 280 results, so if we expect the same number of contributions, the proposed number of samples is 33.
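
A minimal sketch of that arithmetic in Python, for anyone who wants to plug in other numbers. The function names are illustrative only and do not come from any tool used in the tests. Note that 4*sqrt(N/4) simplifies to 2*sqrt(N), so the number of samples is always 4 times the number of results per sample.

```python
import math

def proposed_samples(expected_results: int) -> float:
    """Rule of thumb from the post: 4 * sqrt(N / 4), i.e. 2 * sqrt(N)."""
    return 4 * math.sqrt(expected_results / 4)

def results_per_sample(expected_results: int) -> float:
    """Average number of results each sample would receive."""
    return expected_results / proposed_samples(expected_results)

# 280 results: last AAC 96kbps test; 531 results: 2011 Opus test.
for n in (280, 531):
    s = proposed_samples(n)
    r = results_per_sample(n)
    print(f"{n} results -> {s:.1f} samples, {r:.1f} results/sample, ratio {s / r:.1f}")
```

For 280 expected results this gives about 33 samples with roughly 8 results each. One possible reading of the "2x more accurate" remark, assuming the spread between samples and between listeners is similar, is that the samples/results ratio of 4 corresponds to a factor of sqrt(4) = 2 in standard error.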

Reply #34
Instead of asking for the desired codecs, I'd like to ask the following: "Which questions would you like answered by the listening test?"

I'm having a hard time coming up with any relevant questions which could be answered by a 96k or 80k multi-codec test, but that's just me. What about you guys? Are any of you using these codecs at these bitrates?

As far as I'm concerned, the most interesting questions revolve around Opus. All of the other codecs seem mature. libfdk may also be interesting, but I hardly know anything about it.
"Can Opus be considered (pretty much) transparent (most of the time) and at what bitrate?"
"How much has Opus 1.1 improved over older versions?" (e.g. the ones used in http://www.ietf.org/proceedings/80/slides/codec-4.pdf)
"As an online-radio station, should I replace my 64k AACP stream with Opus?" (If this is even currently possible)

Reply #35
Quote
If the FhG guys want to submit a codec... Or are the FhG encoders available elsewhere?)

Why would Fraunhofer want to submit a codec? The codec is already out there, you guys decide whether it should be in the test.
The codec has long been available in e.g. some Magix, Sonnox, or Sony software. They should contain the same version as Winamp, especially the Sonnox plug-in. Or are you asking for free-of-charge software?

We can buy the encoder, that's not the problem. But I'm wondering where to get the latest and greatest since that's not so obvious from my side. If the Winamp encoder in whatever the latest Winamp release was is current, that's great.

Are there relevant differences between the libfdk_aac that you sold to Google and this encoder?

Reply #36
Instead of asking for the desired codecs, I'd like to ask the following: "Which questions would you like answered by the listening test?"

I'm having a hard time coming up with any relevant questions which could be answered by a 96k or 80k multi-codec test, but that's just me. What about you guys? Are any of you using these codecs at these bitrates?


Spotify currently streams to mobile devices at 96kbps. There is supposed to be a re-launch of their mobile stuff with free streaming this week, BTW.

Realistically most applications are using even higher bitrates nowadays but they are likely pointless to test. YouTube used 96kbps for many videos, but switched to 128kbps a year or so ago. 80kbps stereo means about 256kbps for 5.1 audio which I also use. But such results aren't directly comparable.

You could say 96kbps is the highest bitrate where we still expect to be able to detect differences between codecs.

Quote
I totally agree. Is there no interest in lower bit-rates? 48 kbps perhaps?


Technically, yes. But practically, do you know many examples where people are still deploying 48kbps music?

Reply #37
I will participate in this test too so here is my wishlist.

1. MP3 128 kbps. LAME 3.99.5 -V5 (high anchor)
2. MP3 96 kbps. LAME ABR is better than VBR (?)
3. Apple AAC 96 kbps (QAAC highest quality TVBR or CVBR.)
4. Opus 1.1 vbr 96 kbps.
5. Vorbis AoTuv 6.0.3 vbr 96 kbps.

Low anchor: FAAC CBR 96 kbps, as Kamedo2 said. It has reasonably low quality.
We also discussed having two low anchors: a low anchor and a low-middle anchor. It's good to have two anchors to validate the results. The low-middle anchor should be better than the low anchor.
It could be FAAC 64 (low anchor) and FAAC 96 (low-middle anchor).

I recommend FFmpeg MP2 96 for the very-low anchor, if you want to split the low anchor into two. It has a low-pass filter of 5.6kHz, much lower than the FAAC 96 which is 10kHz.
The comparison of FAAC 64, FAAC 96, LAME 96, LAME 128 is below. I think MP3 96 kbps is too good to be a low anchor.
http://www.hydrogenaudio.org/forums/index....howtopic=102876

Reply #38
Thanks for answering, I hadn't considered streaming to mobile. I guess that would be an interesting question: "How does the audio quality of popular mobile streaming services compare?" Codecs/settings should be chosen accordingly.

I agree that higher bitrates are pointless, even though there are many threads looking for the "absolute best mp3" etc.

Quote
Technically, yes. But practically, do you know many examples where people are still deploying 48kbps music?

If shoutcast counts (hopefully they'll leave the shoutcast page online after taking down Winamp) low bitrates seem to be quite popular. Maybe also due to mobile use? Here in Germany, most data flatrates are throttled to 64kbps after using up the paid-for high-speed traffic.

Bitrates of the top 10 stations (sorted by listeners):
192 x 1 (mp3)
128 x 1 (mp3)
64 x 3 (2x mp3, aac+)
48 x 1 (mp3 <-- yikes!)
32 x 4 (aac+)
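
The distribution above can be summarised with a few lines of Python. This is only a sketch: the `stations` dictionary is transcribed from the list above, and the variable names are illustrative.

```python
from statistics import mean, median

# Bitrate (kbps) -> number of stations among the top 10, as listed above.
stations = {192: 1, 128: 1, 64: 3, 48: 1, 32: 4}

# Expand the counts into one entry per station.
bitrates = [kbps for kbps, count in stations.items() for _ in range(count)]

at_or_below_64 = sum(1 for kbps in bitrates if kbps <= 64)

print("stations:", len(bitrates))
print("mean kbps:", mean(bitrates))
print("median kbps:", median(bitrates))
print("stations at 64 kbps or lower:", at_or_below_64)
```

With these numbers, 8 of the 10 stations stream at 64 kbps or lower, and the median sits at 56 kbps, well below the 96 kbps being discussed for the test.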


Reply #39
OT
lvqcl, where did you get the bit rates for the WMA quality settings? Do you have all of them (Std and Pro)?

I simply encoded several albums and took the average bitrate. The bitrates are as follows (Quality 10/25/50/75/90/98):
std: 42 / 53 / 74 / 115 / 176 / 322
pro: 53 / 83 / 113 / 134 / 166 / 266
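
To see how far each VBR setting lands from a 96 kbps target, here is a short Python sketch. The table is copied from the figures above; the helper name `closest_setting` and the 96 kbps target constant are illustrative assumptions, not part of any encoder tool.

```python
TARGET_KBPS = 96

# Average bitrates (kbps) per WMA VBR quality setting, as posted above.
QUALITY = (10, 25, 50, 75, 90, 98)
BITRATES = {
    "std": (42, 53, 74, 115, 176, 322),
    "pro": (53, 83, 113, 134, 166, 266),
}

def closest_setting(encoder: str, target: int = TARGET_KBPS) -> tuple:
    """Return (quality, kbps) for the setting whose average bitrate is nearest the target."""
    pairs = zip(QUALITY, BITRATES[encoder])
    return min(pairs, key=lambda qb: abs(qb[1] - target))

for encoder in BITRATES:
    quality, kbps = closest_setting(encoder)
    deviation = 100 * (kbps - TARGET_KBPS) / TARGET_KBPS
    print(f"WMA {encoder}: Q{quality} at {kbps} kbps ({deviation:+.1f}% vs target)")
```

The nearest settings are Q75 for Standard (115 kbps) and Q25 for Pro (83 kbps), both more than 10% away from 96 kbps.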

By the way, You've posted here
[...]
Is it still so?

+ AAC (Apple or FhG or both)
+ Opus 1.1
+ Vorbis (aotuv)
(MP3 and WMA aren't very interesting to me now)

Reply #40
In the last AAC 96kbps test, there were 280 results, so if we expect the same number of contributions, the proposed number of samples is 33.

Some info about the last AAC@96 test:

Discarded listeners: 13

Accepted listeners: 25. Among them:

10 listeners submitted results for all 20 samples
3 listeners submitted results for only 1 sample
2 listeners: results for 4 samples
2 listeners: results for 7 samples
and the remaining 8 listeners: results for 2, 3, 5, 6, 9, 10, 11, 14 samples.

Reply #41
Some info about the last AAC@96 test:

Discarded listeners: 13

Accepted listeners: 25. Among them:

10 listeners submitted results for all 20 samples
3 listeners submitted results for only 1 sample
2 listeners: results for 4 samples
2 listeners: results for 7 samples
and the remaining 8 listeners: results for 2, 3, 5, 6, 9, 10, 11, 14 samples.

Thank you.

It seems the majority of results come from the "full" contributors. I think having more than 10 people test the same sample is a bit of overkill, but if we doubled the number of samples to 40, which is good from a statistical point of view, few listeners could be "full" contributors.
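
A rough tally of the listener breakdown quoted above, as a Python sketch. The dictionary is transcribed from the post; the variable names are illustrative.

```python
# Samples rated -> number of accepted listeners, from the breakdown above.
contributions = {20: 10, 1: 3, 4: 2, 7: 2}

# The remaining 8 listeners each rated a different number of samples.
for samples_rated in (2, 3, 5, 6, 9, 10, 11, 14):
    contributions[samples_rated] = contributions.get(samples_rated, 0) + 1

listeners = sum(contributions.values())
total_results = sum(n * count for n, count in contributions.items())
full_results = 20 * contributions[20]
full_share = full_results / total_results

print("accepted listeners:", listeners)
print("total results:", total_results)
print(f"share from full contributors: {full_share:.0%}")
```

This counts 25 listeners and 285 results, of which 200 (about 70%) come from the 10 listeners who rated all 20 samples. The total is slightly above the 280 quoted earlier in the thread; the reason for the difference is not clear from the posts.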

Reply #42
In the last Opus test in 2011, the contributors submitted 531 valid results, but there were only 30 samples. (17.7 results/sample)
http://listening-tests.hydrogenaudio.org/igorc/results.html
This is not the most efficient use of the effort. The number of samples, 30, is the statistical bottleneck that prevents drawing further conclusions.
I recommend choosing the number of samples by this formula: 4*sqrt(expected number of valid results/4)
With this, contributors put 50% of their effort into the overall conclusions and the remaining 50% into accurately measuring the quality of each individual sample, which helps developers.
The value used in the overall conclusion is about 2x more accurate than the average quality of one sample.

In the last AAC 96kbps test, there were 280 results, so if we expect the same number of contributions, the proposed number of samples is 33.


I would actually go one step further. Why not have only one listener for each sample, i.e. give everybody different samples? That would maximize both the statistical significance of the conclusion and the usefulness to the developers (at least for me).

Reply #43
Well, a lot of stuff is going on.

Kamedo2,
Until now, one of the conditions of HA tests has been "no less than 10 results per sample".
Please look through the results of these 10 "full" contributors (zip file): http://listening-tests.hydrogenaudio.org/i...-a/results.html

Sadly, some of them got tired after, let's say, 10 samples, and after that they rated only the low anchor.

Reply #44
I would like to change my choice to:

AAC (Apple/qaac) 80 kbps

AAC (Fraunhofer/fhgaacenc) 80 kbps

AAC (Fraunhofer/fdkaac) 80 kbps

Opus (1.1) 80 kbps

Vorbis (libvorbis 1.3.3)

Vorbis (aoTuV b6.03)

WMA Standard

WMA Pro

I still don't care about MP3 and MPC.

I simply encoded several albums and took the average bitrate. The bitrates are as follows (Quality 10/25/50/75/90/98):
std: 42 / 53 / 74 / 115 / 176 / 322
pro: 53 / 83 / 113 / 134 / 166 / 266
Thanks, I'll do some tests too.

Reply #45
Instead of asking for the desired codecs, I'd like to ask the following: "Which questions would you like answered by the listening test?"

Hi, Gecko.

Agreed. But there are many questions that can be answered by one public test.
Streaming, portable use, etc. People will say what they need, and we will test that.

I'm having a hard time coming up with any relevant questions which could be answered by a 96k or 80k multi-codec test, but that's just me.

Also, 96 kbps (VBR actually goes quite high, up to 110-120 kbps) can give a hint of what happens at ~128 kbps.

YouTube used 96kbps for many videos, but switched to 128kbps a year or so ago.

Just checked a few fresh videos on YouTube. YouTube's default 360p videos still use AAC at 96 kbps.

Reply #46
I would actually go one step further. Why not have only one listener for each sample, i.e. give everybody different samples? That would maximize both the statistical significance of the conclusion and the usefulness to the developers (at least for me).

Yes, I had thought of that, but in that case the standard error of each score will be unacceptably big; I mean, each individual score will be unreliable. Different people have different ideas of what a given score means, which makes the situation worse. We could not even say which sample resulted in the worst quality.
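
To illustrate the point about standard error, here is a small Python sketch using the textbook relation SE = sd / sqrt(n) for the mean of n independent ratings. The value of `ASSUMED_SD` is purely hypothetical and is not measured from any listening test.

```python
import math

# Hypothetical standard deviation of individual listener scores for one sample.
ASSUMED_SD = 1.0

def standard_error(sd: float, listeners: int) -> float:
    """Standard error of a per-sample mean score from `listeners` independent ratings."""
    return sd / math.sqrt(listeners)

for listeners in (1, 4, 10, 16):
    se = standard_error(ASSUMED_SD, listeners)
    print(f"{listeners:>2} listeners per sample -> standard error {se:.2f}")
```

With a single listener the per-sample standard error equals the full listener-to-listener spread, while 4 listeners halve it and 16 cut it to a quarter.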

Reply #47
I'd ask people to please concentrate mainly on the choice of codecs, bitrate, etc. Later we will have time to discuss the samples, how many of them to use, and other conditions.

First of all we should figure out what we want to test.
Though parallel discussions are ok.

Reply #48
I would like to change my choice to:

OK, eahm. Updating your choice.

It's worth making clear that everybody can change his/her choice.

Reply #49
I would like to change my choice to:

AAC (Apple/qaac) 80 kbps
AAC (Fraunhofer/fhgaacenc) 80 kbps
AAC (Fraunhofer/fdkaac) 80 kbps
Opus (1.1) 80 kbps

Do all these codecs have VBR mode at 80 kbps?