
Public tests. What comes next?

Hi, guys and girls.

I believe public tests are useful; that's the only reason they were performed. They let people choose the optimal encoders.

And it's time to ask a question, rather a new one: do we need new public tests of lossy encoders?
Lossless has become the choice of many people, but lossy is still relevant for portable use, streaming, video rips, etc. Users no longer need to know which encoders produce acceptable results at relatively low bitrates (64-96 kbps); today the most interesting question is a different one:

Which encoders/formats produce transparent or nearly transparent results at higher bitrates?

It's more than possible to test codecs at 128 kbps and even higher. As past tests have shown, even the most advanced formats are far from transparent at 96-100 kbps. For example, we could test AAC encoders at 128 kbps, or LAME MP3 at up to ~160-170 kbps (maybe higher, or several versions of it), as well as MPC, Vorbis, Opus and hopefully some new formats.
People are mostly interested in mid and high bitrates. http://www.hydrogenaud.io/forums/index.php...=105925&hl=

Also, the last public test was performed mainly by Kamedo2, while Steve Forte Rio, I and other people helped here and there. All the real credit goes to Kamedo2. Personally, I don't plan to run public tests (at least in the near future, for personal reasons), but I will be glad to help and also to submit a dozen or two results as a listener.

It's time for new people with fresh ideas.

Leave your comments. It doesn't need to be anything elaborate. Just something that comes to your mind... that will already be a good start.



Igor


Public tests. What comes next?

Reply #2
Which encoders/formats produce transparent or nearly transparent results at higher bitrates?

I agree that this is the primary question that people encoding (or buying) their music are interested in. But I think you should clarify this.

The question of "near transparency" has been answered by Kamedo's test, I would say. If you aim for "transparent results", do you mean transparency in the average codec score (i.e. averaged over all samples in the test)? Or transparency for every sample in the test?

The latter would require an extreme amount of preparation (for item selection) and listening effort, so I do not recommend trying that out.

A test for "average transparency" could be reasonable if the goal is to test for statistically significant differences between the mean codec scores and the mean hidden-reference score.
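Such a significance check could be sketched as follows. All scores below are hypothetical, and a paired t-test on per-sample score differences is just one reasonable choice of method:

```python
import math
from statistics import mean, stdev

def paired_t(codec_scores, reference_scores):
    """Paired t-statistic over per-sample score differences."""
    diffs = [r - c for c, r in zip(codec_scores, reference_scores)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical per-sample mean scores (one value per test item)
codec      = [4.8, 4.6, 4.9, 4.7, 4.5, 4.9, 4.8, 4.6]
hidden_ref = [4.95, 4.9, 5.0, 4.9, 4.85, 5.0, 4.9, 4.95]

t = paired_t(codec, hidden_ref)
# compare t against the critical value for len(codec) - 1 degrees of
# freedom (about 2.36 for 7 d.o.f. at the two-sided 5% level)
print(round(t, 2))
```

If t exceeds the critical value, the codec's mean score differs significantly from the hidden reference, i.e. the codec is not "transparent on average" for that listener panel.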

Chris
If I don't reply to your reply, it means I agree with you.

Public tests. What comes next?

Reply #3
Yes, it's about average transparency, with criteria similar to the "virtual transparency" used in the verification tests of AAC at 128 kbps during its standardization.

In my opinion there are two points to pay attention to:
1. The average score, which should be on the high side. I would say ~4.7-4.8 (MUSHRA 95 score) (?)
2. The distribution of scores for particular samples. The lowest scores shouldn't be less than ~4.2-4.3 (?) (similar to the R-factor levels) http://www.tamos.com/htmlhelp/voip-analysi...andr_factor.htm.


That's just a rough idea. I'm not fully sure whether that can be met at 128 kbps when testing codecs like AAC, Opus, Vorbis, MPC, etc... Well, that's why it will be useful to perform a test.
Neither of these conditions was met at 96 kbps.
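The two criteria could be checked mechanically along these lines (a minimal sketch; all sample names, scores and thresholds below are made up for illustration):

```python
# Hypothetical per-sample mean scores for one codec in a MUSHRA-style test
scores = {"castanets": 4.75, "velvet": 4.85, "berlin_drug": 4.40,
          "emese": 4.25, "linchpin": 4.55}

AVG_TARGET = 4.7    # point 1: the overall average should be on the high side
FLOOR_TARGET = 4.2  # point 2: no single sample should fall below this

avg_ok = sum(scores.values()) / len(scores) >= AVG_TARGET
floor_ok = min(scores.values()) >= FLOOR_TARGET
print(avg_ok, floor_ok)
```

With these invented numbers the codec would pass the per-sample floor (point 2) but miss the average target (point 1), which illustrates why both checks are needed.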

P.S. AAC codecs should be retested as well.

Public tests. What comes next?

Reply #4
Which encoders/formats produce transparent or nearly transparent results at higher bitrates?


Yes, it's an important question, but donators feel they are not qualified to test and give up testing. That's a problem.
Two ideas:
  • Testing only critical samples
  • Testing multiple bitrates, so that donators are trained at the lower bitrates and feel qualified to test.

To collect critical samples, one can pick from the 2014 public listening test, so that the average MOS will be lower, and lowered equally across encoders.
I think I should try to write a script to pick critical samples without affecting the conclusions.
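Such a selection script might work roughly like this: keep the lowest-scoring samples, but only accept the subset if the codec ranking over it matches the ranking over the full test. All sample names and scores below are invented:

```python
# Hypothetical listening-test results: sample -> per-codec mean score
scores = {
    "sample_a": {"aac": 4.9, "opus": 4.8, "mp3": 4.4},
    "sample_b": {"aac": 4.2, "opus": 4.0, "mp3": 3.6},
    "sample_c": {"aac": 4.8, "opus": 4.9, "mp3": 4.3},
    "sample_d": {"aac": 3.9, "opus": 3.7, "mp3": 3.2},
    "sample_e": {"aac": 4.7, "opus": 4.6, "mp3": 4.1},
}

def ranking(samples):
    """Codec order by mean score over the given subset of samples."""
    codecs = next(iter(scores.values()))
    means = {c: sum(scores[s][c] for s in samples) / len(samples)
             for c in codecs}
    return sorted(means, key=means.get, reverse=True)

# "Critical" samples = those with the lowest total score across codecs
by_difficulty = sorted(scores, key=lambda s: sum(scores[s].values()))
critical = by_difficulty[:3]

# Only accept the subset if it reproduces the full-test codec ranking
assert ranking(critical) == ranking(list(scores))
print(sorted(critical))
```

A real script would likely also check score distributions and confidence intervals, not just the ordering of means, before declaring the conclusions unaffected.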

P.S. I have just finished testing MP3, Opus and AAC at 96 kbps, using three different sets of samples: 40 samples from the 2014 public listening test, 25 samples from my corpus, and 9 samples from SoundExpert. These three corpora produced almost identical conclusions.

Public tests. What comes next?

Reply #5
I welcome a high bitrate test though I see the problems mentioned.

To me giving the test a meaning different from the usual one solves the problem.
I guess we all agree that in a public listening test with samples chosen the usual way, all the reasonable codecs tested will show up as transparent or near-transparent, with the psychological burden for the listeners already mentioned.
Such a test doesn't make sense to me.

So IMO a high bitrate test should focus on problematic samples.
The good thing is: problematic samples are often problematic to several codecs though not to the same degree.
Sure, the results of such a test have to be interpreted in a different way than most people are used to.
A good target would be: gaining experience about the musical contexts in which the codecs tested yield results lower than, say, 4.3 (or whatever is defined as not near-transparent in the test).
Not targeting a winner in the test brings relief to the critical-sample selection process. We could define a set of musical genres and choose samples so that, in particular, electronic stuff has only a moderate contribution to the test. Within each genre we should have roughly half of the samples presenting transient problems and half of them tonal ones.

Such a test would help everybody choose which codecs are appropriate for them, with respect to their genres and their ability to hear artifacts.

A winner isn't needed (though I guess some codecs will be less favorable than others for many users, according to the test results).
lame3995o -Q1.7
opus --bitrate 140

Public tests. What comes next?

Reply #6
I should add that an average score over all samples and all listeners doesn't make sense then.
Instead, for each sample, the average score over all listeners, together with its confidence interval, should be given as the main result, IMO.
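Per-sample reporting could look like this minimal sketch. The listener ratings are hypothetical, and 1.96 is the normal-approximation factor; a t-based interval would be more appropriate for small listener counts:

```python
import math
from statistics import mean, stdev

def mean_ci95(ratings):
    """Per-sample mean and 95% confidence half-width
    (normal approximation)."""
    half = 1.96 * stdev(ratings) / math.sqrt(len(ratings))
    return mean(ratings), half

# Hypothetical ratings from eight listeners for one sample
ratings = [4.5, 4.8, 4.2, 5.0, 4.6, 4.4, 4.9, 4.7]
m, half = mean_ci95(ratings)
print(f"{m:.2f} +/- {half:.2f}")
```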
lame3995o -Q1.7
opus --bitrate 140


Public tests. What comes next?

Reply #8
What I want to see is a test defined by the developer switch. I don't like any of the existing tests, because the organizer pushed the bitrates of the different codecs to all be the same, or almost the same. It doesn't work like that; people don't do that.

If the developer tells me VBR 96 for a specific codec, I would assume it gives me around 96 kbps, and that's what I want to test.

I want a test with Apple AAC, FDK AAC, FhG AAC, LAME MP3, Opus and Vorbis, all with the "developer switch" set to 96 or 128 or whatever people want; just don't mess around trying to get the same bitrate.

Public tests. What comes next?

Reply #9
And you don't mind if this "developer switch for 96 kbps" will give you ~110 kbps on average?

Public tests. What comes next?

Reply #10
And you don't mind if this "developer switch for 96 kbps" will give you ~110 kbps on average?

Not sure; I was actually thinking about lower-bitrate optimization.

Public tests. What comes next?

Reply #11
Testing modern codecs at 128 kbps / 48 kHz using only critical samples seems to be a good idea. The intended application is internet video/music streaming, but it should be useful for portable storage as well.
In the 2014 public listening test, the samples were so non-critical that many listeners had trouble maintaining morale. If we use critical samples extracted from the past test, we can probably attract enough volunteers at up to 128 kbps.

For the "developer switch" problem posted by eahm, we can use multiple commonly used switches resulting in different bitrates, and make a graph like this one: http://www.hydrogenaud.io/forums/index.php?showtopic=97913
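Collecting one (mean bitrate, mean score) point per developer switch could be sketched like this; the switches, bitrates and scores below are purely hypothetical:

```python
from collections import defaultdict

# Hypothetical runs: (encoder switch, measured kbps, mean listening score)
runs = [
    ("opus --bitrate 96", 101, 4.1), ("opus --bitrate 96", 99, 4.0),
    ("opus --bitrate 128", 134, 4.6), ("opus --bitrate 128", 131, 4.7),
    ("aac-vbr-high", 105, 4.2), ("aac-vbr-high", 108, 4.3),
]

by_switch = defaultdict(list)
for switch, kbps, score in runs:
    by_switch[switch].append((kbps, score))

# One (mean bitrate, mean score) point per switch,
# ready to plot on a rate/quality graph
points = {s: (sum(k for k, _ in v) / len(v), sum(q for _, q in v) / len(v))
          for s, v in by_switch.items()}
for s, (kbps, q) in sorted(points.items()):
    print(f"{s}: {kbps:.0f} kbps, score {q:.2f}")
```

Plotting these points per switch, rather than forcing all encoders to one bitrate, shows each codec at the operating point its developers actually recommend.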

Public tests. What comes next?

Reply #12
There are posts about testing mainly critical samples, and I share this view as well. A high amount of critical samples is the way to go.

Another question is what to test. Would it be an MP3, AAC or multiformat test?

I can think of three possible alternatives:

- MP3 at ~170 kbps (LAME V3, Helix, etc.)
- AAC at ~128 kbps (FDK, FhG, Apple...)
- Multiformat (MP3 at ~170 kbps, and AAC, Vorbis, Opus, MPC at ~128 kbps)

Or any other (?), like multi-bitrate LAME at 128, 160 and 192 kbps.


The question is open to public.

P.S. It might be useful to look into:
Most popular formats
Most popular encoders


Public tests. What comes next?

Reply #14
a) I also think that MP3 should get a higher average bitrate. I guess MP3 users are aware of having to use a higher bitrate than, say, AAC users to achieve the same quality level, and are willing to accept this because of MP3's universal usability or for other reasons.

b) I personally would prefer settings which result in near-perfect quality even with the hardest samples. So for MP3 I'd prefer an average bitrate of around 256 kbps (LAME 3.99.5 -V0, to be precise). For AAC, I think something around 180 kbps might yield near-perfect quality. As for the MP3 192 kbps / AAC 128 kbps pair, Kamedo's test already gives us a feeling for the quality achieved, though a one-person test is something different from what we want to do here. Sure, the psychological burden comes in with my suggestion, at least for a test set like Kamedo's, so we should look for harder samples.
lame3995o -Q1.7
opus --bitrate 140

Public tests. What comes next?

Reply #15
An interest in testing high bitrates (~256 kbps) is perfectly understandable, as today's music delivery services use such rates (Spotify, the iTunes Store, etc.).

But LAME V2 (~192 kbps) is already considered transparent by most people (I'm not sure whether that changes considerably, or at all, when testing critical samples exclusively). Testing V2 would be not impossible but challenging, considering that until now all public tests have had a maximum bitrate of 128 kbps.

The first tests were done at low rates (64 kbps); then we gradually moved to 96 kbps, preparing the conditions for future tests (128 kbps and more). It takes time and effort to collaborate with listeners and work with them this way.

I'd say V2 is already on the high side, at the limit of what's possible... as for now.

Public tests. What comes next?

Reply #16
I agree.

I guess what I wanted is a bit of wishful thinking that doesn't work in a public test.
lame3995o -Q1.7
opus --bitrate 140

Public tests. What comes next?

Reply #17
I agree as well, and I also support Igor's definition:

1. The average score, which should be on the high side. I would say ~4.7-4.8 (MUSHRA 95 score) (?)
2. The distribution of scores for particular samples. The lowest scores shouldn't be less than ~4.2-4.3 (?) (similar to the R-factor levels) http://www.tamos.com/htmlhelp/voip-analysi...andr_factor.htm.

That's just a rough idea. I'm not fully sure if that can be met at 128 kbps testing codecs like AAC, Opus, Vorbis, MPC etc...

I'm quite sure that point 2 cannot be met for every test sample using the given codecs. For example, Opus struggles on some very tonal material, while the others cannot handle items showing inter-channel phase differences too well.

Engineering-wise, at high bitrates both problems have only been addressed in the more recent coding standards, Extended HE-AAC (MPEG-D) and 3D Audio (MPEG-H). So personally, I see little benefit in a high-bitrate test right now.

A ~128-kbit/s test including at least Extended HE-AAC would be interesting and useful, since it could serve as a check of whether point 2 can be reached with that codec. But like all the others here, I have no idea when xHE-AAC will become available :/ (Edit: since the 2014 test showed MP3 to require about 40 kbit/s more bitrate for comparable quality, one could then include LAME -V3.5 to -V3 in this test).

Chris
If I don't reply to your reply, it means I agree with you.

Public tests. What comes next?

Reply #18
... while the others cannot handle items showing inter-channel phase differences too well. ...


Can you give samples for this kind of issue? (I'm always interested in issues which might be relevant to my encoding and listening practice.)
lame3995o -Q1.7
opus --bitrate 140

Public tests. What comes next?

Reply #19
Well, there's the beginning of Drug by Berlin (shadowking's sample BerlinDrug) and the two samples in my test set for which AAC and Vorbis performed worst in the 2014 test.

By the way, shadowking also once posted the worst AAC/Vorbis/... killer sample I am aware of, emese, here. I still have it somewhere... Edit: it is still available in this Zip file.

Chris
If I don't reply to your reply, it means I agree with you.

Public tests. What comes next?

Reply #20
Thanks a lot.
lame3995o -Q1.7
opus --bitrate 140

Public tests. What comes next?

Reply #21
I'm quite sure that point 2 cannot be met for every test sample using the given codecs. For example, Opus struggles on some very tonal material, while the others cannot handle items showing inter-channel phase differences too well.

Engineering-wise, at high bit-rates both problems have only been addressed in the more recent coding standards Extended HE-AAC (MPEG-D) and 3D Audio (MPEG-H)...


Also, pre-echo artifacts are still present even in advanced formats. While AAC (thanks to TNS and shorter blocks) hugely improves transient handling over MP3, it still requires significant rates to get rid of this kind of artifact.

I have tried the Fraunhofer (from the latest Winamp) and Apple AAC encoders. Both require high rates to be transparent on samples like Castanets and EIG (192 kbps almost transparent, 256 kbps transparent), and Linchpin wasn't transparent even at 256 kbps VBR (though it was practically there).

/mnt has also reported pre-echo artifacts at high bitrates several times.

Public tests. What comes next?

Reply #22
EIG (192 kbps almost transparent, 256 kbps - transparent) and Linchpin wasn't transparent even at 256 kbps VBR (though it was practically there).

Yes, those "zapping" transients are a problem for AAC and its successors, at least to some listeners (not everyone seems to be sensitive to these artifacts). The problem with Linchpin isn't so much due to the AAC format but due to the difficulty of detecting the necessity of TNS/short blocks on the transient frames in the encoder.

By the way, regarding inter-channel phase differences: I just saw Kamedo's bit-rate/quality statistics here, and apparently Opus has problems on the two items in my test set as well (below-average quality using above-average bit-rates).

Chris
If I don't reply to your reply, it means I agree with you.


 