HydrogenAudio

Hydrogenaudio Forum => Listening Tests => Topic started by: IgorC on 2009-12-26 03:06:34

Title: Public Listening Test [2010]
Post by: IgorC on 2009-12-26 03:06:34
Many people will agree that it's time for a public listening test.

Short Agenda:
1.   Organization
2.   Samples
3.   Codecs
4.   Bitrates

Detailed Agenda:
1.   Organization.
People with experience running personal or public tests are welcome here.
Personally, I can offer help conducting the test.
Unfortunately, if I understand correctly, Sebastian has said that he doesn't want to conduct public tests.

2. Samples.
Different styles of music, different levels of difficulty, samples highlighting known issues, etc.? To be discussed here or in a separate topic.

3. Codecs
3.a)  Multiformat test.
3.b)  AAC test. 
I think it's more appropriate to conduct an AAC test, because there are at least 3 AAC encoders to test: Nero, Apple and Coding Technologies/Dolby. All of these codecs were updated during this year. More information in the List of AAC encoders (http://www.hydrogenaudio.org/forums/index.php?showtopic=76751)

4. Bitrates.
96-100 kbit/s?
It's possible to perform a test at 128 kbit/s with very hard samples; Aemese-like samples will be very easy to distinguish from the original.


Sorry for my English.

All kinds of thoughts and suggestions are welcome here.
Title: Public Listening Test [2010]
Post by: Fandango on 2009-12-26 17:02:18
Codecs: I would love to see an AAC test including Nero 1.3.3.0 vs 1.5.1.0.

Samples: Has 24-bit / >44 kHz material ever been tested? With more and more web shops selling music in these resolutions and sample rates, this might get interesting.

Title: Public Listening Test [2010]
Post by: antman on 2009-12-26 18:06:21
Definitely AAC. Nero vs. new Nero alone will bring in a lot of interest. Add in Apple's latest, with its true VBR options. And it'd be interesting to see CT/Dolby in the mix, since it's been left out of so many AAC discussions. And I agree with you, it should be in the ~96k range.
Title: Public Listening Test [2010]
Post by: Sebastian Mares on 2009-12-26 18:36:00
I could imagine an AAC test at 80 or 96 kbps. At 80 kbps you could feature both LC and HE encoders and see which one performs best. The winner could be used in a multiformat listening test at the same bitrate.

Regarding the samples, muaddib once had an idea to create a samples DB that should be divided into problem samples and regular samples. When preparing a test, one could pick X samples from that DB based on lottery numbers so that people don't complain that sample Y was selected with the purpose of letting encoder A appear better than encoder B. Additionally, samples collected especially for the respective test could be used like it was done in the past.

Good luck with the test Igor. Are you planning to use existing software (ABC/HR for Java) for the test, or something new? ABC/HR's development is dead unfortunately, and there were some problems in my last test requiring the installation of JRE 1.5 (which some people with JRE 1.6 found annoying).
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2009-12-26 19:00:23
An AAC test would be interesting indeed. But at 96 kbps or so, you would have to test both AAC-LC and HE-AAC, since they offer about the same average quality. With about 4 encoders under test (Apple, 2x Nero, Dolby, ...), this would give 8 codecs under test. That's too many in my opinion (risk of overload and fatigue). How about 112 kbps or so? There you can be relatively sure that LC is better than HE on average, i.e. you would need to test only LC.

Chris
Title: Public Listening Test [2010]
Post by: IgorC on 2009-12-26 19:39:19
Good luck with the test Igor.

That sounded sarcastic. 


Are you planning to use existing software (ABC/HR for Java) for the test, or something new? ABC/HR's development is dead unfortunately, and there were some problems in my last test requiring the installation of JRE 1.5 (which some people with JRE 1.6 found annoying).

Yes, I remember that issue. Speaking for myself, I could get around it without problems, though with some workaround. We'll have to see what happens with other people.

Definitely AAC. Nero vs. new Nero alone will bring in a lot of interest. Add in Apple's latest, with its true VBR options. And it'd be interesting to see CT/Dolby in the mix, since it's been left out of so many AAC discussions. And I agree with you, it should be in the ~96k range.

True VBR mode isn't available in the Windows version. Many people couldn't use it.

An AAC test would be interesting indeed. But at 96 kbps or so, you would have to test both AAC LC and HE-AAC, since they offer about the same average quality. With about 4 encoders under test (Apple, 2x nero, Dolby, ...), this would give 8 codecs under test. That's too many in my opinion (risk of overload and fatigue).
   
iTunes offers HE-AAC only up to 80 kbps. Nero's default setting uses HE-AAC up to ~85 kbps.
From my personal test, Apple LC-AAC is already better than HE-AAC at 80 kbps:
http://www.hydrogenaudio.org/forums/index.php?showtopic=74781
But yes, it's a personal test, not a public one.
96 kbps will be enough to test only LC-AAC codecs.
Anyway, all bitrates should be shifted to ~100 kbps, as iTunes' constrained VBR mode at 96 kbps produces real bitrates of ~100 kbps. MediaCoder can shift bitrates for the CT encoder.


How about 112 kbps or so? There you can be relatively sure that LC is better than HE on average, i.e. you would need to test only LC.
Chris
   
iTunes doesn't offer 112 kbps, only 96 and 128.

As I can see, people are interested in an AAC test with the following codecs:
1. Nero 1.3.3
2. Nero 1.5.1
3. CT
4. Apple
5. Divx (???) or other?

I think 4-5 codecs are already enough. Not more. What do you think?

I found that the DivX AAC encoder has the potential to get into the test:
http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=76751&view=findpost&p=675456
Title: Public Listening Test [2010]
Post by: IgorC on 2009-12-26 19:55:01
Regarding the samples, muaddib once had an idea to create a samples DB that should be divided into problem samples and regular samples. When preparing a test, one could pick X samples from that DB based on lottery numbers so that people don't complain that sample Y was selected with the purpose of letting encoder A appear better than encoder B. Additionally, samples collected especially for the respective test could be used like it was done in the past.

I see. It's a very important part of getting fair results. It should be done this way.
Title: Public Listening Test [2010]
Post by: guruboolez on 2009-12-26 20:06:25
A ~100 kbps listening test should be fine. Quality at 128 kbps was usually too high to get interesting results in a public test.
I'd rather have a multiformat test than a pure AAC one. Two tests would be ideal: an AAC one, and then a multiformat one including the best AAC implementation. But that will give you much more annoyance. So go for either AAC or multiformat.

About Nero AAC: I don't see the point of testing the two latest releases. Otherwise, why wouldn't we test two different implementations of the other competitors? If the Nero developers have released this new encoder, it's because it was tested, ready to use, and therefore solid and trustworthy.
Don't put too many competitors in the arena: the more encoders you have, the harder it is to rate them accurately. In the end, many contenders will only bring statistical noise and the test will end with no clear winner. And don't forget the anchors: they're really essential to avoid or limit discrepancies.

Ideally, and for a public listening test, I would go for 2 competitors and 2 anchors. But it won't be very attractive to many people. So 3 competitors and 2 anchors is probably the most doable configuration.

Good luck.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2009-12-26 20:06:48
I see. I didn't know that most encoders don't offer HE-AAC at bitrates higher than ~85 kbps.

As I can see, people are interested in an AAC test with the following codecs:

...

I think 4-5 codecs are already enough. Not more. What do you think?


Agreed. But for completeness' sake: Fraunhofer is currently finalizing quality tunings on their encoder, which have been going on for about two years. Release is scheduled for the end of January. If there is any interest, I can ask whether it's possible to provide an evaluation encoder for this test.

Chris
Title: Public Listening Test [2010]
Post by: Sebastian Mares on 2009-12-26 22:12:38
Good luck with the test Igor.

That sounded sarcastic. 


No, seriously.
Title: Public Listening Test [2010]
Post by: hellokeith on 2009-12-27 03:28:17
4. Bitrates.
96-100 kbit/s?
It's possible to perform a test at 128 kbit/s with very hard samples; Aemese-like samples will be very easy to distinguish from the original.


If this is already clear enough, then forgive me, but I think the test needs a very well-defined and limited goal with respect to bitrates. When you start adding parameters and/or choosing optional settings, it can become quite unclear what is equivalent or fair from one encoder to another, especially if you are dealing with command-line vs. GUI.
Title: Public Listening Test [2010]
Post by: antman on 2009-12-27 04:36:03
True VBR mode isn't available in Windows version. Many people couldn't use it.


It's available in Quicktime Pro.

About Nero AAC: I don't see the point of testing the two last releases.


Now that you mention it, it doesn't really make sense.  Drop Nero 1.3.3 from the test.

So 3 competitors and 2 anchors is probably the most doable configuration.


Agreed.  Man of logic. 
Title: Public Listening Test [2010]
Post by: IgorC on 2009-12-27 20:04:14
About Nero AAC: I don't see the point of testing the two latest releases. Otherwise, why wouldn't we test two different implementations of the other competitors? If the Nero developers have released this new encoder, it's because it was tested, ready to use, and therefore solid and trustworthy.

Yes and no.
There is no reason to compare two Nero versions against the other encoders.
But many people here use the Nero encoder intensively and want to know if there is any improvement (specifically in quality).
You also partially compared two versions of Nero in your last test (http://www.hydrogenaudio.org/forums/index.php?showtopic=58724&st=0&start=0). I compared two versions of Nero in my previous test too (http://www.hydrogenaudio.org/forums/index.php?showtopic=66949&hl=).
But there are many encoders to test, so switching to the latest Nero encoder would be a reasonable option.
I will make a poll to see what people prefer.


Don't put too many competitors in the arena: the more encoders you have the harder is it to rate them accurately. At the end, many contenders will only bring statistic noise and the test will end with no clear winner. And don't forget the anchors: they're really essential to avoid or limit discrepancies.

Ideally, and for a public listening test, I would go for 2 competitors and 2 anchors. But it won't be very attractive to many people. So 3 competitors and 2 anchors is probably the most doable configuration.

I can't disagree here. Sebastian's public tests also indicate that 3 (maybe 4; to be discussed) competitors is a good balance. Even more so considering that today's AAC encoders provide good enough quality at 100 kbps to make ABXing difficult.



Possible high anchor:
1. LAME -V4 or -V3. I think -V5 is too risky to be a high anchor.
2. Nero or Apple at 128 kbps

Possible low anchor:
In my opinion, the low anchor shouldn't be that bad.
LAME ~-V7 (~100 kbps) or ABR 100.

Agreed. But for completeness sake: Fraunhofer is currently finalizing quality tunings on their encoder which have been going on for about two years. Release is scheduled for end of January. If there is any interest, I can ask if it's possible to provide an evaluation encoder for this test.

Chris

That would be really good.
As was already mentioned, we should limit ourselves to 3 competitors.

I propose to do two separate AAC tests during 2010.

The three most widespread AAC encoders will be tested in the 1st test:
1. Nero 1.5.1
2. Apple
3. CT
4?. Maybe Nero 1.3.3

And then the winner will be tested in the 2nd test:
1. Winner of 1st test
2. Divx (?)
3. Fraunhofer (?)
or any other encoder with reasonable quality.

True VBR mode isn't available in Windows version. Many people couldn't use it.


It's available in Quicktime Pro.

It's not practical. You have to open each source file in QuickTime Player and then remux it MOV->M4A; you'd be tired after encoding just one album (>10 tracks).
I think we should stick with the practical iTunes unless some other solution appears.

4.Bitrates.
96-100 kbits/s?
It's possible to perform test at 128 kbits/s with very hard samples. Aemese-like samples will be very easy to spot from original.


If this is already clear enough, then forgive me, but I think the test needs to have a very defined and limited goal with respect to bit rates.  Because when you start adding parameters and/or choosing optional items, things can get quite unclear what is equivalent or fair from one encoder to another encoder, especially if you are dealing with command-line vs gui.

Sorry, I don't understand where you're going with this. Please be more specific.

Title: Public Listening Test [2010]
Post by: IgorC on 2009-12-27 23:16:42
The poll on Nero 1.5.1 vs. 1.3.3 is open: http://www.hydrogenaudio.org/forums/index.php?showtopic=77303
Title: Public Listening Test [2010]
Post by: /mnt on 2009-12-27 23:25:15
I would like to see QuickTime's new true VBR encoder in the test, even if it's a Mac OS X exclusive. Also, having FAAC in the test as a low anchor would be interesting.
Title: Public Listening Test [2010]
Post by: funkyblue on 2009-12-28 00:16:24
Please tell me if I'm on the wrong track.

How about a test of codec transparency?
I'd love to see another test and find out what the current VBR sweet spot is for LAME.
I suspect -V3 could replace -V2.

I miss the days when there was consensus over -V2 (I remember when --r3mix first came out, too!)

Thanks
Title: Public Listening Test [2010]
Post by: Polar on 2009-12-28 02:18:02
Don't put too many competitors in the arena: the more encoders you have the harder is it to rate them accurately. At the end, many contenders will only bring statistic noise and the test will end with no clear winner. And don't forget the anchors: they're really essential to avoid or limit discrepancies.

Ideally, and for a public listening test, I would go for 2 competitors and 2 anchors. But it won't be very attractive to many people. So 3 competitors and 2 anchors is probably the most doable configuration.

I can't disagree here. Also Sebastian's public tests indicate that 3 (maybe 4. It should be discussed) competitors is a good balance. Even more taking into account that today AAC encoders provide enough good quality at 100 kbps and make it more difficult to ABX.

Possible high anchor:
1.LAME -V4 or -V3. I think -V5 is too risky to be a high anchor.
2.Nero or Apple 128 kbps
Your and Guruboolez' own private tests have shown that AAC at 96k is likely to tie or even slightly outperform MP3 at 128k, which in itself has proven to be pretty much transparent in any recent 128k test, including last year's 128k MP3 test conducted by Sebastian (http://www.listening-tests.info).  The latter didn't include a high anchor either, since it's supposed to function as a clearly distinguishable reference point.  If you have AAC at 96k tested against LAME -V5 or even -V3, you risk having all of them rated 4.5 or so, making it hard, even impossible, to draw any interesting conclusions.  In other words, I believe codecs have evolved to such a point that, competing against 96k AAC, imho, picking a high anchor is moot.

The only valid reason I see for still including one is to prove the almost self-fulfilling prophecy that MP3 is not the codec of choice in the 96k range, at least quality-wise.  Whether that makes it worth sacrificing participants' precious time and effort is debatable.

In fact, last year's 128k MP3 test (http://www.listening-tests.info) didn't even give LAME the highest grade of the bunch, so having Helix as high anchor makes just as much sense to me.

Edit: then again, if we'd raise the bar even higher than already near-to-transparent LAME -V4 (or so), by simply throwing in the lossless original samples as high anchor, perhaps lowpassed at some 17 kHz, maybe that could keep the participants from handing out 4.5 averages to the 96k lossy samples.  This could set us up for a bold teaser: does 96k sound as good as the lossless originals?

Possible low anchor:
In my opinion low anchor shouldn't be that bad.
LAME ~V7  (~100 kbps) or ABR 100.
Agreed, and another thing your own tests have indicated: any MP3 codec at this bitrate, even LAME, will most probably wind up statistically worse by quite a margin, which makes it a very valid low anchor, even more so because it'll be competing at the same bitrate as its contenders.

I believe that, in a larger-scale, public test, we shouldn't be too convinced that at 96k, AAC is the poised winner over any other encoder.  Mind you that this bitrate range has gone largely untested to a wide, public extent in recent years, as opposed to 128k, and 64k and below, both of which are potentially far off in terms of perceived audio quality.  Cf. http://wiki.hydrogenaudio.org/index.php?title=Hydrogenaudio_Listening_Tests and especially Sebastian's (http://www.listening-tests.info) and Roberto's (http://web.archive.org/web/*/http://www.rjamorim.com/test/index.html) accounts of the public tests they conducted.  Which is why I think we should put Aotuv Vorbis and WMA to the test too.  Chances are fair that it may end up as a statistical tie once again.

So I guess, in the end, my vote goes out to a multi-format test featuring 1 carefully selected AAC codec, Vorbis (Aotuv?) and WMA (Professional?), with LAME at the same bitrate as low anchor (edit) and lossless original as high anchor.
Title: Public Listening Test [2010]
Post by: IgorC on 2009-12-28 05:13:21
So I guess, in the end, my vote goes out to a multi-format test featuring 1 carefully selected AAC codec, Vorbis (Aotuv?) and WMA (Professional?), with LAME at the same bitrate as low anchor (edit) and lossless original as high anchor.

The decision for such a selection should be based on the results of a public test.
The problem is that it has been a very long time since the last LC-AAC test. That's not the case for Vorbis, where Aotuv is clearly the optimal encoder. An AAC test must be done before the multiformat one, as many people here will agree.


In others words, I believe codecs have evolved to such a point where, competing against 96k AAC, imho, picking a high anchor is moot.

Yep.
Apple 96 VBR was used as the high anchor in a previous public test and actually did very well: http://www.listening-tests.info/mf-48-1/results.htm
In my opinion (correct me if I'm wrong), unless there is a good reason to keep a high anchor, we shouldn't include one.

As I can see from the poll, including two versions of Nero makes little sense.

Updated proposal for the test list:
1. Nero 1.5.1
2. Apple iTunes or QT (true VBR). To be discussed.
3. CT or DivX or FhG. An option is to do an internal pre-test where the participants are some well-known HA members. The DivX encoder has a good enough VBR mode, while CT offers only CBR.

Anchors:
High anchor: do we really need it? I would propose Nero or Apple at 128 kbps. LAME -V4/-V3 risks getting low scores on some hard samples.
Low anchor: LAME -V7.

I would like to see QuickTime's new true VBR encoder in the test, even if it's a Mac OS X exclusive. Also, having FAAC in the test as a low anchor would be interesting.

The low anchor should be clearly inferior to all competitors. If FAAC has such quality, then it can be included as the low anchor.

I would follow Guru's suggestion, as we must also take into account that AAC is actually quite good at ~100 kbps, so ABXing will take a lot of work. Only 3 competitors, not more.
Title: Public Listening Test [2010]
Post by: guruboolez on 2009-12-28 08:06:09
There is no reason to compare two Nero versions against the other encoders.
But many people here use the Nero encoder intensively and want to know if there is any improvement (specifically in quality).
You also partially compared two versions of Nero in your last test (http://www.hydrogenaudio.org/forums/index.php?showtopic=58724&st=0&start=0). I compared two versions of Nero in my previous test too (http://www.hydrogenaudio.org/forums/index.php?showtopic=66949&hl=).
But there are many encoders to test, so switching to the latest Nero encoder would be a reasonable option.
I will make a poll to see what people prefer.

In the test you've linked, two different implementations of Nero AAC were tested in a qualification pool, not in the final one. And if I did this, it was for a good reason: at that moment, a very quick experiment made me very suspicious about the output quality of the latest release of this encoder. And the test's results showed that my feelings were right (at least for me).

Currently nobody complains about the last Nero AAC release, and nobody said the previous one was obviously better, so I really don't see the point of staging a competition between last year's encoder and the 2009 one. Some people are probably interested in this kind of comparison, but they can easily do it themselves; and if they find something weird, then we might reconsider our initial choice. Myself, I'm very interested in a comparison between 1.0.7.0 (which really impressed me 2 years ago) and the latest 1.5.1.0, but I don't see the point of making such a comparison in a public listening test with direct reference to iTunes AAC, FhG AAC and CT AAC. Such a comparison would immediately imply that we don't trust Nero's AAC development cycle that much; that we can safely use the latest iTunes AAC without risk, but not the latest Nero AAC.


Quote
Possible low anchor:
In my opinion low anchor shouldn't be that bad.
LAME ~V7  (~100 kbps) or ABR 100.


It looks too strong for a low anchor. I'm also against a dramatically poor low anchor, but using one of the most advanced MP3 encoders at the same bitrate at which we'll test the main contenders is a kind of risk. But the comparison would be interesting, I confess... For the sake of the test, I'd rather lower the bitrate or use a very old encoder (ISO AAC maybe?) at 100 kbps. Or maybe halve the bitrate with HE-AAC(v2)?
Title: Public Listening Test [2010]
Post by: Polar on 2009-12-28 08:20:03
High anchor. Do we really need it?
Imho, only to keep people from rating most (all?) competitors an unoriginal 4.5.  As said, that can almost only be achieved by contrasting them with the original, lossless samples.  But on the other hand, of course, that would make the test even more difficult to take.

Fwiw, as I promised last year, I am willing to host the test samples once more.  Plenty of bandwidth available.
Title: Public Listening Test [2010]
Post by: roozhou on 2009-12-28 10:32:17
Why not add ffmpeg aac encoder to this test?
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2009-12-28 11:29:13
Imho, only to avoid people from rating most (all?) competitors an unoriginal 4.5 score.  As said, that may almost only be achieved by contrasting them to the original, lossless samples.  But on the other hand, of course, that would make the test even more difficult to take.

Which is precisely why we need the hidden reference as a high anchor! If a listener assigns a grade of, say, 3.0 to the hidden lossless sample, we know that the listener is unable to identify the stimulus which must sound identical to the known original (and instead hears differences which are not there), so that listener must be post-screened, i.e. his/her results removed from statistical analysis.

At Fraunhofer, we use MUSHRA tests (http://en.wikipedia.org/wiki/MUSHRA) and post-screen listeners who assigned a grade lower than 90 (out of 100) to any hidden reference. If we are going to do an ABX test here, I propose the following:

- Low anchor: MP3 at 96 kbps CBR as I expect it to give slightly worse results than VBR.
- High anchor: Lossless original. Edit: no further high anchors to minimize listening time.
- Post-screening rules: Remove all listeners from analysis who
  a) graded the high anchor lower than 4.5,
  b) graded the low anchor higher than the high anchor.

What do you think?

Chris
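The two post-screening rules above can be sketched in a few lines. This is only an illustrative sketch; the data layout, listener names and grades below are made up, not taken from any actual test software:

```python
# Hypothetical post-screening sketch for the proposed rules.
# Each listener's results are a dict mapping stimulus -> grade (1.0-5.0),
# where "reference" is the hidden lossless high anchor and "low_anchor"
# is the low anchor.

def keep_listener(grades, ref_min=4.5):
    """Return True if the listener passes both post-screening rules:
    a) the hidden reference was graded at least ref_min,
    b) the low anchor was not graded above the hidden reference."""
    return (grades["reference"] >= ref_min
            and grades["low_anchor"] <= grades["reference"])

# Made-up example results for three listeners.
results = {
    "listener1": {"reference": 5.0, "low_anchor": 2.1, "codec_a": 4.2},
    "listener2": {"reference": 3.0, "low_anchor": 2.5, "codec_a": 4.0},  # fails a)
    "listener3": {"reference": 4.8, "low_anchor": 4.9, "codec_a": 4.1},  # fails b)
}

# Keep only listeners who pass both rules before statistical analysis.
screened = {name: g for name, g in results.items() if keep_listener(g)}
print(sorted(screened))  # ['listener1']
```

Rule b) in the stricter form discussed later (low anchor vs. every competitor) would just extend the second condition to loop over all competitor grades.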
Title: Public Listening Test [2010]
Post by: guruboolez on 2009-12-28 12:39:04
I like it (with some reserves for low anchor).
Title: Public Listening Test [2010]
Post by: kurtnoise on 2009-12-28 12:47:59
Why not add ffmpeg aac encoder to this test?

because it is not finished/tuned/tweaked...
Title: Public Listening Test [2010]
Post by: IgorC on 2009-12-29 02:25:50
It looks too strong for a low anchor. I'm also against a dramatically poor low anchor, but using one of the most advanced MP3 encoders at the same bitrate at which we'll test the main contenders is a kind of risk. But the comparison would be interesting, I confess... For the sake of the test, I'd rather lower the bitrate or use a very old encoder (ISO AAC maybe?) at 100 kbps. Or maybe halve the bitrate with HE-AAC(v2)?

Last time I tried, HE-AAC at 48k was more or less comparable to MP3 at 96k.
iTunes or Nero LC-AAC at 48-56k CBR could be an optimal low anchor; nearly half the bitrate.

- High anchor: Lossless original. Edit: no further high anchors to minimize listening time.

Speaking of the lossless original as high anchor, did you mean:
1. There won't be a high anchor at all,
or
2. There will be a supposedly lossy file that is actually the lossless reference?

- Post-screening rules: Remove all listeners from analysis who
  a) graded the high anchor lower than 4.5,
  b) graded the low anchor higher than the high anchor.

What do you think?
Chris

How about stricter rules?

Remove all listeners from analysis who
  a) graded the high anchor lower than 4.8-4.9, or even 5 in case the high anchor is lossless,
  b) graded the low anchor higher than any competitor. The low anchor, by definition, will be clearly inferior to any of the competitors.


Should we test the true or the constrained VBR mode of the Apple encoder?

Why not add ffmpeg aac encoder to this test?

I haven't had time to test it yet. What stage of development is it at? All the encoders to be tested are actually stable releases. A partial exception could be the DivX AAC encoder (beta stage), but I relate that more to the MainConcept->DivX rebranding, as it's based on stable MainConcept code.
Title: Public Listening Test [2010]
Post by: IgorC on 2009-12-29 02:39:54
Fwiw, as I promised last year, I am willing to host the test samples once more.  Plenty of bandwidth available.

Thank you very much. We will need it.
Title: Public Listening Test [2010]
Post by: kornchild2002 on 2009-12-29 04:25:18
I would like to see Quicktime's new true VBR encoder on the test, even if it's a Mac OS X exlusive. Also having FAAC on the test as a low anchor would be interesting.


I am on the fence about this.  Part of me would like to see a test using QuickTime's true VBR encoder, while another part would rather see just iTunes AAC (i.e. QuickTime VBR_constrained), since it can be accessed by everyone.  The procedure for QuickTime true VBR AAC encoding on Windows is ridiculous, and I am surprised nobody has come up with a solution, as there are plenty of other programs that offer true VBR encoding under Mac OS X.  I am a Windows user, so I would want to just see iTunes AAC thrown into the mix with the latest release from Nero.  Part of me is still curious about QuickTime's true VBR performance, so I wouldn't mind seeing it in the test either.  Ideally, I would like to see both VBR_constrained and true VBR while dumping a few other encoders that can't be easily downloaded (i.e. FhG).
Title: Public Listening Test [2010]
Post by: IgorC on 2009-12-29 21:12:28
I propose to include only iTunes' constrained VBR mode, as ~90% of the OSes out there are Windows.
Let's keep a practical approach.
Title: Public Listening Test [2010]
Post by: antman on 2009-12-29 22:10:05
I still say quicktime true vbr.  I think if the results are tempting enough, it'll light a fire under somebody to create a more reasonable way to encode on windows.  And if we don't, I think there will always be room for, "well, it wasn't quicktime's best setting in that test..."  And people will talk about what might have been.
Title: Public Listening Test [2010]
Post by: IgorC on 2009-12-30 00:08:10
All right, there is a poll (http://www.hydrogenaudio.org/forums/index.php?showtopic=77367) (true vs. constrained VBR).

I found that true VBR is enabled only at 128 kbps in the QT Windows version.
Has anybody noticed the same?

Ideally I would like to see VBR_constrained and true VBR while dumping a few other encoders that can't be easily downloaded (ie fhg).

How do the constrained VBR (CVBR) and true VBR (TVBR) modes compare in bitrate?
For previous versions of iTunes:
CVBR ~100 kbps
TVBR ~95 kbps
Those 5 kbps can influence the final results.

It would be good if we organized a group of members, to avoid concentrating decisions in just one person (especially the choice of competitors and samples).
Well-known members like Guru, Sebastian, /mnt and others are welcome.
Title: Public Listening Test [2010]
Post by: kornchild2002 on 2009-12-30 01:02:33
I am not sure how the bitrates of true VBR and VBR constrained compare, as I run Windows only.  I used to have an iMac but ended up selling it to someone who really needed it for college.  That was alright, as I would usually just Boot Camp into Windows XP anyway.  I guess someone running Mac OS X could run some bitrate tests to see how they compare.  I believe that iTunes, when using VBR constrained, goes with QuickTime's normal quality setting.  The iTunes Plus setting uses the high quality setting in QuickTime.  That is something else to take into consideration: will the high quality mode be used for true VBR encoding, or normal, low, or something else?

I saw the poll and voted for VBR constrained, simply because that is what I have easy access to.  I would hope that someone will release a Windows solution for QuickTime true VBR AAC encoding after a public listening test.  That being said, people have been asking for this ever since true VBR AAC encoding was introduced in QuickTime, and nothing has come of it on the Windows front.
Title: Public Listening Test [2010]
Post by: birdy25 on 2009-12-30 17:21:37
All right, there is a poll (http://www.hydrogenaudio.org/forums/index.php?showtopic=77367) (true vs. constrained VBR).

I found that true VBR is enabled only at 128 kbps in the QT Windows version.
Has anybody noticed the same?

Ideally I would like to see VBR_constrained and true VBR while dumping a few other encoders that can't be easily downloaded (ie fhg).

How do the constrained VBR (CVBR) and true VBR (TVBR) modes compare in bitrate?
For previous versions of iTunes:
CVBR ~100 kbps
TVBR ~95 kbps
Those 5 kbps can influence the final results.

It would be good if we organized a group of members, to avoid concentrating decisions in just one person (especially the choice of competitors and samples).
Well-known members like Guru, Sebastian, /mnt and others are welcome.


Well, those 5 kbps would definitely make it better; it means more bits for the actual audio. However, those extra bits might not make a difference if the compression model changes between CVBR and TVBR, which may not be the case.
    I think the first thing would be to encode the same song in the two modes and then compare the output file sizes.
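The file-size comparison boils down to simple arithmetic: the average bitrate is the file size in bits divided by the track duration. A quick sketch; the sizes and duration below are made-up illustrative numbers, not measurements of any real encoder output:

```python
def avg_bitrate_kbps(file_size_bytes, duration_seconds):
    """Average bitrate in kbit/s from container size and track length.
    Note: container overhead (MP4 atoms, tags) inflates this slightly
    relative to the pure audio bitrate."""
    return file_size_bytes * 8 / duration_seconds / 1000

# Hypothetical outputs for one 4-minute track encoded in both modes.
cvbr_size = 3_000_000   # bytes, hypothetical CVBR file
tvbr_size = 2_850_000   # bytes, hypothetical TVBR file
duration = 240.0        # seconds

print(round(avg_bitrate_kbps(cvbr_size, duration)))  # 100
print(round(avg_bitrate_kbps(tvbr_size, duration)))  # 95
```

Running this over the same set of albums in both modes would show whether the ~5 kbps gap mentioned above actually holds for the current encoder version.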
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2009-12-30 19:37:09
- High anchor: Lossless original. Edit: no further high anchors to minimize listening time.

Speaking of the lossless original as high anchor, did you mean:
1. There won't be a high anchor at all,
or
2. There will be a supposedly lossy file that is actually the lossless reference?

Basically both. I meant that there should be a single hidden lossless reference, and no anchor a la MP3@200kbps because the codecs under test are close enough to transparency.
Quote
How about more strict rules?:

Remove all listeners from analysis who
  a) graded the high anchor lower than 4.8-4.9, or even 5 in case the high anchor is lossless.
  b) graded the low anchor higher than any competitor. The low anchor, by definition, will be clearly inferior to any of the competitors.

Both are possible but dangerous.

a) Certainly possible if you have only highly experienced listeners in the test, but if not, you might end up having to post-screen a lot of people, even those with consistent and useful grading (but showing a lack of concentration on one single test item out of, say, 15).
b) Yes, on average the low anchor will be inferior to all other codecs, but this does not guarantee that it will be inferior for every possible test item. The decision whether to apply this rule could be made after the test is finished.
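For context, the effect of such a post-screening decision is usually judged by how it moves each codec's mean grade and confidence interval. A minimal sketch using a normal approximation (the function name is mine):

```python
import math
from statistics import mean, stdev

def mean_ci95(scores):
    # Mean grade with an approximate 95% confidence interval
    # (normal approximation; adequate for the dozens of listeners
    # a public test like this typically attracts).
    m = mean(scores)
    half = 1.96 * stdev(scores) / math.sqrt(len(scores))
    return m, m - half, m + half
```

Re-running this per codec with and without the post-screened listeners shows whether a rule actually narrows the intervals or just discards data.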

Chris
Title: Public Listening Test [2010]
Post by: IgorC on 2009-12-30 21:37:08
After reading Roberto's (fast) manual about listening tests (section "Dealing with ranked references")  http://www.rarewares.org/rja/ListeningTest.pdf (http://www.rarewares.org/rja/ListeningTest.pdf)
I see that your point a) is fair.
- Post-screening rules: Remove all listeners from analysis who
  a) graded the high anchor lower than 4.5


The low anchor (LC-AAC at 48 kbps) was sufficiently inferior to all competitors on all samples in previous tests, which is why I think it is logical to remove from analysis all listeners who rated the low anchor higher than any of the competitors. I will ask Sebastian, as he has dealt personally with the results.
Title: Public Listening Test [2010]
Post by: IgorC on 2009-12-30 22:17:04
All right, there is a poll (http://www.hydrogenaudio.org/forums/index.php?showtopic=77367) (true vs. constrained VBR)

I found that true VBR is enabled only at 128 kbps in the QT Windows version.
Has anybody noticed the same?

Ideally I would like to see constrained VBR and true VBR, while dropping a few other encoders that can't be easily downloaded (i.e. FhG).

How do the constrained VBR (CVBR) and true VBR (TVBR) modes compare in terms of bitrate?
For previous versions of iTunes:
CVBR ~100 kbps
TVBR ~95 kbps
Those 5 kbps can influence the final results.

It would be good if we organized a group of members, to avoid concentrating the decision-making in just one person (especially for choosing competitors and samples).
Well-known members like Guru, Sebastian, /mnt and others are welcome.


Well, these 5 kbps would definitely make it better, as it means more bits for the actual audio. These extra bits might not make a difference if the compression model changes between CVBR and TVBR, which may or may not be the case.
    I think the first thing would be to encode the same song in both modes and then compare the output file sizes.

Those numbers are based on something like hundred files.
Title: Public Listening Test [2010]
Post by: Porcus on 2010-01-03 13:35:25
- Post-screening rules: Remove all listeners from analysis who
  a) graded the high anchor lower than 4.5,
  b) graded the low anchor higher than the high anchor.


Wouldn't this introduce biases? I mean, this listener is "wrong", but among listeners who are not able to tell the differences at all, this post-screening would catch only those who by chance grade the low anchor too high, not those who by chance grade the high anchor better than the low one.

I am not a statistician -- and certainly not an applied one -- so it might very well be that this is a fair compromise to eliminate at least some random guessers, and also those who are able to tell differences but actually prefer the compression artifacts.


By the way, has anyone considered BitTorrent for distribution?
Title: Public Listening Test [2010]
Post by: jido on 2010-01-03 15:25:42
This is Hydrogenaudio, why not use LossyWAV for the high anchor? Let's get the word out!

With regards to QuickTime true VBR vs. constrained VBR, the discussion should wait until the bitrate discussion is at an advanced enough stage. The poll may suggest a setting that cannot match the desired bitrate.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-01-03 16:34:23
I am not a statistician -- and certainly not an applied one -- so it might very well be that this is a fair compromise to do to eliminate at least some randomguessers and also those who are able to tell differences but actually prefer the compression artifacts.

Yes, that's the basic idea. If a listener grades consistently with the majority of non-post-screened listeners, there is no way to tell from the results whether (s)he guessed or actually heard a difference. So you have to include that listener.

Quote from: jido
This is Hydrogenaudio, why not using LossyWAV for the high anchor? Let's get the word out!

Since people have been almost flooding this thread with proposals for encoders to be tested in a single test, it's time for me to give my 2 cents.

The question is what we want. Of course you could put all codecs of interest (LossyWav, iTunes CVBR and TVBR, nero 1.3.3 and 1.5.3, CT/Winamp, LAME, WMA, etc.) into one test. But trust me, if you do that, you will get mostly inconclusive results, i.e. waste a lot of listening effort, due to listener overload, as I already explained.

LossyWav's objective is to be transparent. AAC at 96 kbps usually is not transparent, so its objective is to be near-transparent, i.e. "as good as possible". If you want to check whether LossyWav is transparent, do a separate ABX test against the unprocessed original (or maybe an ABX-HR test including AAC at 256 kbps or so, if you want.) Then people can focus on whether the codecs under test really are transparent, without being distracted by at the same time having to evaluate the quality of lower-bit-rate codecs.

If you want to check whether there's a statistically significant improvement of iTunes TVBR over CVBR, propose or conduct a separate public ABX-HR or MUSHRA test for those two encoders. Then people can focus on sonic differences (if any) between those two encoders.

The same applies to nero 1.5.3 vs. 1.3.3.

Then, once you have finished those last two tests, you can take the "winners" of those tests and conduct the test which we are promoting (under the title "AAC test") in this thread. Yes, it's a lot of work, but it's the only way to get meaningful results.

If you don't want to do those last two tests, just choose one encoder from each test based on certain non-quality considerations (e.g. nero 1.5.3 because it's a newer release, iTunes CVBR because it's also available for Windows). This should be fine since most likely, there are only minor sonic differences at 96 kbps between nero 1.3.3 and 1.5.3 and between iTunes CVBR and TVBR. You can always do said tests later, of course also with the exact same test material.

Chris
Title: Public Listening Test [2010]
Post by: Sebastian Mares on 2010-01-03 21:05:59
Regarding the results, I would personally discard all results where the low-anchor is not rated.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-03 21:09:53
Regarding the results, I would personally discard all results where the low-anchor is not rated.

And what if the low anchor is rated higher than one of the competitors, or than the high anchor?
Title: Public Listening Test [2010]
Post by: Sebastian Mares on 2010-01-03 21:14:57
Regarding the results, I would personally discard all results where the low-anchor is not rated.

And what if the low anchor is rated higher than one of the competitors, or than the high anchor?


Post-processing actually comes down to how wisely the low anchor was selected. It could be possible that for certain problem samples that target a specific encoder, the low anchor sounds better than a contender.

Regarding TVBR vs. CVBR for Apple, this also comes down to the goal of your test. If you want to test encoders in general, a poll is fine; if you want to test only encoders that are easily accessible to users, CVBR is probably best because it's available in iTunes directly and Windows has a higher market share than OS X.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-03 21:43:10
Post-processing actually comes down to how wisely the low anchor was selected. It could be possible that for certain problem samples that target a specific encoder, the low anchor sounds better than a contender.

After some testing and reading of previous results, I think it would be better to adopt less restrictive rules:
Quote
Remove all listeners from analysis who
a) graded the high anchor lower than 4.5,
b) graded the low anchor higher than the high anchor.
c) didn't grade the low anchor.


As many people agree that there won't be a high anchor, then:
Quote
Remove all listeners from analysis who
a) graded the high anchor lower than 4.5,
b) graded the low anchor higher than all competitors.
c) didn't grade the low anchor.
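These rules are mechanical enough to script. A sketch of the post-screening filter, assuming each listener's grades arrive as a dict of condition names to scores (all names here are hypothetical):

```python
def keep_listener(grades, competitors,
                  low="low_anchor", high="high_anchor"):
    # grades: {condition_name: score on the 1.0-5.0 scale, or None if ungraded}
    if grades.get(low) is None:
        return False                       # rule c: low anchor not graded
    h = grades.get(high)
    if h is not None and h < 4.5:
        return False                       # rule a: high anchor graded below 4.5
    rated = [grades[c] for c in competitors if grades.get(c) is not None]
    if rated and grades[low] > max(rated):
        return False                       # rule b: low anchor beats all competitors
    return True
```

If the high anchor is dropped from the test, rule a simply never fires, so the same filter covers both rule sets quoted above.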


Regarding TVBR vs. CVBR for Apple, this also comes down to the goal of your test. If you want to test encoders in general, a poll is fine, if you want to test only encoders that are easily accessible to users, CVBR is probably best because it's available in iTunes directly and Windows has a higher market share than OS X.

As the poll (http://www.hydrogenaudio.org/forums/index.php?showtopic=77367) indicates, more people are interested in TVBR.
However, I find that TVBR isn't enabled at 96 kbps even in the QT Windows version. If somebody can confirm that, the inclusion of TVBR is very questionable.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-04 04:56:37
As the high anchor will be dropped, we can add one more competitor.

4 competitors + 1 low anchor

Proposal list:
1. Nero 1.5.1
2. Apple iTunes CVBR. (TVBR isn't enabled at 96 kbps in the QT Windows version)
3/4. Two winners from internal pre-test between Divx, Fraunhofer (FH) and CT.

Low anchor:
iTunes LC-AAC CBR 48 kbps.

VBR vs CBR:
Nero, Apple and Divx have (good) VBR modes, while CT is CBR-only.
I have nothing against CT, but it's not convenient to include CBR encoders, based on previous experience (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=57322&view=findpost&p=514958). I don't want to reinvent the wheel: until now we have started to discuss already-discussed items and realized that the final decision is the same as from previous experience. So my proposal is VBR encoders only.

C.R.Helmrich, will the new Fraunhofer encoder have a VBR mode?

Settings
Best settings for final quality; no concessions for speed.
Title: Public Listening Test [2010]
Post by: Sebastian Mares on 2010-01-04 10:42:21
Are you sure 48 kbps LC is not exaggeratedly low? Do you expect any contender to be as bad as, or worse than, 64 kbps?
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-01-04 13:06:30
IgorC: Yes, Fraunhofer's AAC encoder supports VBR coding. We don't distinguish between TVBR and CVBR, though, so that VBR mode is most likely a CVBR mode.

Sebastian: Maybe we should think about using a generic low anchor, as done in MUSHRA tests. How about a 7-kHz lowpass filtered version of the reference? Sonically, that should be quite similar to 48-kbps AAC LC. And we wouldn't have to worry about which encoder to choose for generating the anchor.

As mentioned in the AES Journal paper referenced here (http://www.hydrogenaudio.org/forums/index.php?showtopic=59542), the goal is to span the entire range of the grading scale with the codecs and anchors, in order to minimize bias, i.e. reduce the width of the confidence intervals. A 7-kHz anchor (and 48-kbps LC) should be about mid-way between "very bad quality" and the quality of the 96-kbps LC encoders. To define the lower end, we could throw in 8-kHz 8-bit a-Law PCM of the reference as "64-kbps phone quality low anchor". That will take very little listening effort to identify, so it doesn't make the test more difficult, but gives us the advantage of stabilizing the test results.
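Both proposed anchors are simple signal-processing operations. A numpy-only sketch (a windowed-sinc filter stands in for whatever production tool would actually be used; function names and parameters are illustrative):

```python
import numpy as np

def lowpass_7khz(x, sr, cutoff_hz=7000.0, numtaps=255):
    # Windowed-sinc FIR lowpass (Hamming window), applied by convolution.
    n = np.arange(numtaps) - (numtaps - 1) / 2.0
    h = np.sinc(2.0 * cutoff_hz / sr * n) * np.hamming(numtaps)
    h /= h.sum()                     # unity gain at DC
    return np.convolve(x, h, mode="same")

def alaw(x, A=87.6):
    # ITU-T G.711 A-law companding of samples in [-1, 1].
    # The "8-kHz 8-bit a-Law PCM" anchor would additionally resample
    # to 8 kHz and quantize the companded signal to 8 bits.
    ax = np.clip(np.abs(x), 0.0, 1.0)
    y = np.where(ax < 1.0 / A,
                 A * ax / (1.0 + np.log(A)),
                 (1.0 + np.log(np.maximum(A * ax, 1.0))) / (1.0 + np.log(A)))
    return np.sign(x) * y
```

Generating both anchors from the reference signal itself, as suggested, avoids tying the anchors to any particular encoder.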

Chris
Title: Public Listening Test [2010]
Post by: devnul on 2010-01-04 14:40:19
Wasn't the primary goal to test the latest Nero (many peeps have been waiting long for its release) vs. Apple's TrueVBR (which was hyped at its release; not meant negatively)? I wouldn't even know where to get those competing AAC encoders...

The scientific method demands an analysis of the single most separable part.

Maybe the public listening test needs more than one instance...

Well, I have to admit, though: I probably won't be a tester anyway, and I don't practice blind tests, but I follow Hydrogenaudio's recommendations.
Title: Public Listening Test [2010]
Post by: kornchild2002 on 2010-01-04 17:10:55
Though many people want to see how QuickTime's true VBR AAC encoder holds up, it isn't readily available.  I have always thought that the underlying purpose behind public listening tests is to test encoders that are readily available.  Apple has made TVBR AAC encoding on Windows so awkward that most people don't do it.  The tags are lost, sometimes it doesn't work out, and you have to go through the process twice (once for encoding to a MOV file, and then taking the AAC audio out of the MOV container and putting it in an MPEG-4 container).  It takes about 5 minutes for the process to encode one file.  Things are a lot easier on Mac OS X.  So I feel that any QuickTime TVBR AAC results will benefit Mac OS X users only, since most Windows users aren't going to go out of their way just to encode a few TVBR tracks.

Sure, we can start including obscure encoders and settings, but what would be the point if no one would use them, regardless of how well they perform?  Additionally, at this bitrate, it looks as if QuickTime can't encode a TVBR AAC file.  I can't confirm the results, as I have access to QuickTime Pro only at work.  Lastly, given the described performance of the FhG AAC encoder, it sounds like CVBR AAC might be best for testing QuickTime/iTunes.
Title: Public Listening Test [2010]
Post by: nao on 2010-01-05 16:14:20
For Windows users, I made a tiny tool to access the QuickTime AAC encoder from the command-line.
qtaacenc (http://tmkk.hp.infoseek.co.jp/qtaacenc/)
Title: Public Listening Test [2010]
Post by: kornchild2002 on 2010-01-05 16:53:37
Performance isn't all that great with foobar2000 v0.9.6.9 and Windows 7.  It took about 4 minutes to encode a 5 minute song using foobar2000 at -q100 (fb2k was reporting an encoding speed of 15x, but that would drop to 0x after about 25 seconds when the progress bar was filled).  Your exe file would consume about 35-45% of my processor (a simple 1.66GHz Atom, but Lame.exe and Nero.exe consume about 5% or less) and foobar2000 jumped to about 115MB of RAM.  The output file also throws dBpowerAMP for a loop.  I encoded the test file at the 100 level and dBpowerAMP reports the bitrate as being 0kbps.  iTunes sees it as being 224kbps, though.  Are things smoother on Windows XP, or maybe when using the latest beta of fb2k?

This seems like a viable option except for the near-1X encoding speed.  It would take me over 20 straight days of encoding my lossless archive using this tool.  So I think you are onto something especially since Windows users generally get the shaft when it comes to true VBR encoding with QuickTime AAC.
Title: Public Listening Test [2010]
Post by: nao on 2010-01-05 17:38:12
As far as I've tested with fb2k 0.9.6.9 and WinXP, the conversion of a roughly 10-minute FLAC file finishes in 30 sec. Maybe there is a problem with Vista or 7. (I can't test right now, though.)

The behavior of the progress bar is expected. qtaacenc begins conversion after the progress bar is filled (due to the lack of the pipe encoding feature).
Title: Public Listening Test [2010]
Post by: Larson on 2010-01-05 17:42:20
I've tested your tool, nao, and thank you so much; I have been looking for a Windows solution for such a long time. Conversion time was normal/fast and it did its job at Q127. Also having a MacBook and using XLD, I've compared a few files and they are identical. I can confirm the "bug" where these AACs show a 0 kbps bitrate in the dBpowerAMP tab; in Windows 7 Explorer it's fine.
Title: Public Listening Test [2010]
Post by: lvqcl on 2010-01-05 17:45:31
Quote
Lame.exe and Nero.exe consume about 5% or less

Maybe your HDD is too slow or fragmented?

Added: try to set "Tools > Converter > Thread count" to 1.
Title: Public Listening Test [2010]
Post by: kornchild2002 on 2010-01-05 18:40:32
I don't see a thread count option under the Converter area of the fb2k preferences.  My hard drive isn't too slow or fragmented either.  In fact, it is an SSD, so fragmentation has little to no effect and the speeds are above those of a 7200 RPM drive.  I do appreciate the work of nao; I don't know if that was clear in my first post regarding their tool.  In fact, I would use it as a viable encoding option if things weren't so slow.

Edit: my source files are ALAC.  I thought that might be an issue, so I converted the 5 minute file to FLAC (q 5) and tried converting with foobar2000 again.  Still the same results: nearly 4 minutes to convert that 5 minute file.  I don't have access to my XP machine, so I won't be able to test until tonight.  As previously said, Windows 7 Explorer is fine displaying the correct bitrate on my end (just not dBpowerAMP).

Edit 2: I went ahead and tried dBpowerAMP's CLI encoder (version R13.2) under Windows 7.  Same exact results whether I encode from an ALAC, FLAC, or PCM WAV file.  However, dBpowerAMP will display the encoding speed (unlike foobar2000) which hovers at around 1.3x.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-05 19:39:15
For Windows users, I made a tiny tool to access the QuickTime AAC encoder from the command-line.
qtaacenc (http://tmkk.hp.infoseek.co.jp/qtaacenc/)

Can you also make the sample rate specifiable? All encoded files are 32 kHz.

Great tool.
Title: Public Listening Test [2010]
Post by: jido on 2010-01-05 23:09:33
Quote from: jido
This is Hydrogenaudio, why not using LossyWAV for the high anchor? Let's get the word out!

Since people have been almost flooding this thread with proposals for encoders to be tested in a single test, it's time for me to give my 2 cents.

The question is what we want. Of course you could put all codecs of interest (LossyWav, iTunes CVBR and TVBR, nero 1.3.3 and 1.5.3, CT/Winamp, LAME, WMA, etc.) into one test. But trust me, if you do that, you will get mostly inconclusive results, i.e. waste a lot of listening effort, due to listener overload, as I already explained.

LossyWav's objective is to be transparent. AAC at 96 kbps usually is not transparent, so its objective is to be near-transparent, i.e. "as good as possible". If you want to check whether LossyWav is transparent, do a separate ABX test against the unprocessed original (or maybe an ABX-HR test including AAC at 256 kbps or so, if you want.) Then people can focus on whether the codecs under test really are transparent, without being distracted by at the same time having to evaluate the quality of lower-bit-rate codecs.

The idea was not to test LossyWAV transparency, but to download smaller files. BTW that could also apply to the proposed 7 kHz-filtered low anchor.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-05 23:52:13
A 7-kHz anchor (and 48-kbps LC) should be about mid-way between "very bad quality" and the quality of the 96-kbps LC encoders.

Are you sure 48 kbps LC is not exaggeratedly low? Do you expect any contender to be as bad as, or worse than, 64 kbps?
LC-AAC at 56 kbps or an 8-kHz anchor could be a good compromise. Both look good to me.


As the poll (http://www.hydrogenaudio.org/forums/index.php?showtopic=77534) indicates,
there are two widely used AAC encoders on HA.
Coding Technologies is to be dropped, as it's CBR-only and not so popular here.

Speaking of the proposal list, it's clear now, at least for me:
1. Nero
2. Apple
3. FH
4. Divx

The idea was not to test LossyWAV transparency, but to download smaller files. BTW that could also apply to the proposed 7 kHz-filtered low anchor.

I think it's not a good idea from the point of view of the credibility of the final test. There will be rumors like "the references weren't 100% lossless".
I'm completely against LossyWAV here.
Title: Public Listening Test [2010]
Post by: lvqcl on 2010-01-06 00:21:07
For Windows users, I made a tiny tool to access the QuickTime AAC encoder from the command-line.
qtaacenc (http://tmkk.hp.infoseek.co.jp/qtaacenc/)

Many thanks.
But it adds samples at the beginning, and the decoded .m4a file has more samples than the original file. Is it possible to make gapless files?
Title: Public Listening Test [2010]
Post by: Enig123 on 2010-01-06 01:18:19
I might have missed something, but where can one get the Fraunhofer encoder?
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-01-06 12:23:02
...
Agreed. But for completeness' sake: Fraunhofer is currently finalizing quality tunings on its encoder, which have been going on for about two years. Release is scheduled for the end of January. If there is any interest, I can ask whether it's possible to provide an evaluation encoder for this test.

Chris
Title: Public Listening Test [2010]
Post by: nao on 2010-01-06 17:37:45
Updated qtaacenc (http://tmkk.hp.infoseek.co.jp/qtaacenc/).

dBpowerAMP reports the bitrate as being 0kbps.

Fixed.

Can you specify also samplerate because all encoded files are 32 khz?

Now possible with the --samplerate option. "--samplerate keep" or "--samplerate 44100" meets your requirement. With the default setting, the optimum samplerate is automatically chosen according to the bitrate and quality.
Title: Public Listening Test [2010]
Post by: tedgo on 2010-01-06 18:01:21
@nao
Thanks for your qtaacenc .
I love it!
Maybe you should have offered it in its own thread instead of hiding it in this "public listening test" thread.

Unfortunately the encoded files aren't gapless...
So I have to ask the same as lvqcl: is it possible to make gapless files?
Title: Public Listening Test [2010]
Post by: Sylph on 2010-01-06 21:27:22
I might have missed something, but where to get the Fraunhofer encoder?


Nowhere.

Only in MAGIX MP3 Maker and I'd have to check whether it has a VBR mode...
Title: Public Listening Test [2010]
Post by: kornchild2002 on 2010-01-06 21:33:30
Maybe you should have offered it in its own thread instead of hiding it in this "public listening test" thread.


Maybe a mod would be nice enough to take the posts dealing with nao's command line tool out of the listening test thread and put them in a new one.  That way we can continue to help test qtaacenc for nao without filling up this thread with posts that aren't really about the listening test.
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-01-08 15:57:52
I don't think that a <128kbit/s listening test makes any sense nowadays. Nobody is using those bitrates, and results cannot be extrapolated into regions of general transparency.

One encoder might be a worse low-bitrate performer, but still have much better resilience against killer samples than another once the bitrate passes a certain barrier. And since we have reached a point where most popular encoders are totally transparent for most music at 128-192kbit/s, we should now focus on which encoder has fewer killer samples. A listening test in its traditional form is not suited to evaluate that. The limited, initial choice of test samples will determine the outcome.

A lengthier (index.php?act=findpost&pid=678439) version of this...
Title: Public Listening Test [2010]
Post by: halb27 on 2010-01-08 17:23:33
... we should now focus on which encoder has fewer killer samples. A listening test in its traditional form is not suited to evaluate that. The limited, initial choice of test samples will determine the outcome. ...

Yes, the value of the outcome of such listening tests is often rated too high IMO, especially at such a pretty low bitrate, and especially as many people just take the average score of each encoder to derive a quality ranking of the encoders.
But despite this, listening tests do have a value IMO, though a restricted one.

As for your approach: this would give valuable information, though restricted information as well, because the universe of killer samples is not known.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-01-08 20:19:48
... we should now focus on which encoder has fewer killer samples. A listening test in its traditional form is not suited to evaluate that. The limited, initial choice of test samples will determine the outcome. ...

Yes, the value of the outcome of such listening tests is often rated too high IMO, especially at such a pretty low bitrate, and especially as many people just take the average score of each encoder to derive a quality ranking of the encoders.
But despite this, listening tests do have a value IMO, though a restricted one.

As for your approach: this would give valuable information, though restricted information as well, because the universe of killer samples is not known.

Well, an educated audio codec developer knows a lot about the universe of killer samples for his/her codec. I work as an AAC developer, and I've seen about two dozen high-bitrate killer samples so far. They all fall into very few specific categories of sounds, so you could look for similar sounds (with high chances that they'll be killer samples as well), or even design your own. Today, one decade after version 1.0 of the encoder I work on, many initial killer samples are not killer samples any more. Of course, some still are, simply because AAC is not perfect, or because appropriate input analysis would make the encoder unacceptably slow. For example, you will never find an AAC encoder which, at 128-160 kbps stereo, will give you a transparent encoding of the emese sample.

So, rpp3po, I don't understand what you mean by "A listening test in its traditional form is not suited to evaluate that". I think it is. You just need to know the abovementioned categories.

Chris
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-01-08 21:14:35
So, rpp3po, I don't understand what you mean by "A listening test in its traditional form is not suited to evaluate that". I think it is. You just need to know the abovementioned categories.


With "listening test in its traditional form" in this context I mean public listening tests like the one under discussion. A public listening test like this usually has a limited set of samples, and the more 'emese-like' samples the set includes, the worse AAC will look in comparison to MP3, and vice versa. So the outcome is severely influenced by the choice of samples, and is thus of limited use as a tool to evaluate which is best overall. The encoder with the fewest problematic samples in the set wins. You can intentionally exclude problematic samples from the test, but what are you going to get then? Good results for all encoders for most music. But we already know that. My point is, isn't it time to move on with testing to compare which encoders have the lowest probability of failing over broad collections of music, even if that would be harder to accomplish than with traditional 8-16 sample shootouts?

For the individual developer, who knows their encoder inside out, a "traditional" listening test with select samples can be of great value, no question.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-01-08 22:03:19
And the more 'emese-like' samples the set includes, the worse AAC will look in comparison to MP3, and vice versa.

Are you sure? Just because a certain encoder has problems with 'emese-like' samples doesn't mean that every AAC encoder has problems with them. Btw, to me, MP3 (LAME) doesn't do too well with such samples, either.
Quote
You can intentionally exclude problematic samples from the test, but what are you going to get then? Good results for all encoders for most music. But we already know that. My point is, isn't it time to move on with testing to compare which encoders have the lowest probability of failing over broad collections of music, even if that would be harder to accomplish than with traditional 8-16 sample shootouts?

Being a developer, all I care about is problematic samples.  Sure, I agree, "lowest probability of failing over broad collections of music" should be an encoder's ultimate goal. But how else would you test for that than via 8-16 sample shootouts? How about asking our expert listeners (guruboolez, /mnt, IgorC, sauvage78, etc.) for the most critical items they could find in their high-bitrate tests, and putting all those items in one public test (bitrate to be defined)? Wouldn't that be a good starting point?

Chris
Title: Public Listening Test [2010]
Post by: MichaelW on 2010-01-08 22:43:28
A question from the peanut gallery.

Do different codecs have specifiably different types of killer samples?

My thought is that we know that a number of good codecs give excellent results for most music, but it might be possible, perhaps, to give specific recommendations matching codec to genre (codec A is good for harpsichord, for example, but falls over on distorted guitars, on which codec B does better).

IF this were possible, it might be more practically useful for listeners than another very tight general comparison of good codecs over "average" music.

Edit: punctuation
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-12 02:17:56
A question from the peanut gallery.

Do different codecs have specifiably different types of killer samples?

Yes, there are many samples of this kind.

My thought is that we know that a number of good codecs give excellent results for most music, but it might be possible, perhaps, to give specific recommendations matching codec to genre (codec A is good for harpsichord, for example, but falls over on distorted guitars, on which codec B does better).

IF this were possible, it might be more practically useful for listeners than another very tight general comparison of good codecs over "average" music.

Edit: punctuation

The problem of light or killer samples can be mitigated if a randomizer is used.
http://www.hydrogenaudio.org/forums/index....st&p=678087 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77584&view=findpost&p=678087)
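A sketch of what such a randomizer could look like, so that each listener gets a random but reproducible subset of the sample pool and light or killer samples average out across the listener population (all names and parameters here are hypothetical):

```python
import random

def assign_samples(listener_id, sample_pool, per_listener=12, seed="pub2010"):
    # Seed per listener so the assignment is reproducible for result
    # verification, while remaining random across the population.
    rng = random.Random(f"{seed}:{listener_id}")
    return sorted(rng.sample(sample_pool, per_listener))
```

With enough listeners, no single problem sample can dominate any one encoder's average, because only a fraction of listeners grade it.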


I'm fine with nao's Apple TVBR encoder.

Availability of encoders
I think any AAC encoder can be tested as long as it's available in at least one commercial product.
No problem for Apple, Nero and CT (Winamp).
The Divx encoder is still beta, but anyone can download it after registering on the Divx home page.

Chris, what about the FH encoder?
At least a time-limited demo version would be fine. This way we can check that the bitrate and quality settings are the same for all samples.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-12 02:29:28
The encoder with the least problematic samples in the set wins. You can intentionally exclude problematic samples from the test, but what are you going to get then? Good results for all encoders for most music. But we already know that. My point is, isn't it time to move on with testing to compare which encoders have the lowest probability of failing over broad collections of music, even if that would be harder to accomplish than with traditional 8-16 sample shootouts?

For the individual developer, who knows its encoder inside out, a "traditional" listening test with select samples can be of great value, no question.

It's just a different approach. A good one, actually.

Somebody might say that testing killer samples isn't representative, but in the same way, light samples aren't representative either.

I'm in favor of raising the bitrate to 128 kbps and including more difficult samples randomly (and not including samples which are only difficult for one specific competitor). Emese-like samples are horrible at this bitrate.
Title: Public Listening Test [2010]
Post by: halb27 on 2010-01-12 07:44:02
[...I'm in favor of raising the bitrate to 128 kbps and including more difficult samples at random (and not including samples which are only difficult for one specific competitor). Emese-like samples are horrible at this bitrate.

I second that. It will give a result which is more relevant for practical purposes. And in case it turns out that there are encoders (maybe all of them) which are perfect or near-perfect except for, say, artificial music, that would be a fine result.
As for samples which are difficult for a specific encoder, however, it would be helpful to have a list of such known samples.
Title: Public Listening Test [2010]
Post by: /mnt on 2010-01-12 20:56:14
The encoder with the least problematic samples in the set wins. You can intentionally exclude problematic samples from the test, but what are you going to get then? Good results for all encoders for most music. But we already know that. My point is, isn't it time to move on with testing to compare which encoders have the lowest probability of failing over broad collections of music, even if that would be harder to accomplish than with traditional 8-16 sample shootouts?

For the individual developer, who knows its encoder inside out, a "traditional" listening test with select samples can be of great value, no question.

It's just a different approach. A good one, actually.

Somebody might say that testing killer samples isn't representative, but by the same token light samples aren't representative either.

I'm in favor of raising the bitrate to 128 kbps and including more difficult samples at random (and not including samples which are only difficult for one specific competitor). Emese-like samples are horrible at this bitrate.


That was the problem with the recent MP3 test: some of the killer samples in the test, such as the Final Fantasy sample, seemed to be targeted against LAME 3.97.

I would recommend using a couple of Kraftwerk tracks, such as The Robots (0:20 - 0:50 or 4:10 - 4:40) and Musique Non Stop (0:10 - 0:40), since those tracks produce very obvious artifacts at 128 kbps with every AAC encoder.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-14 17:35:33
/mnt
Thank you for the data.

As the poll (http://www.hydrogenaudio.org/forums/index.php?showtopic=77809) indicates, there is more interest in a 128 kbps test.

Bad news: most probably the FH codec won't be available to the public, and because of this it won't get into the test.
As we have moved to 128 kbps, the question of the high anchor is open again.
Also, the CT encoder can be included in the test despite being CBR-only, because it's very popular (Winamp, etc.).

Proposed list (3 competitors + 1 high anchor + 1 low anchor):
1. Apple TVBR
2. Nero
3. Winner of pre-test (CT vs Divx)

High anchor (?): LAME 3.98.2 -V2
Low anchor(?): LC-AAC 64 kbps.
Title: Public Listening Test [2010]
Post by: halb27 on 2010-01-14 20:10:33
I'd welcome having Winamp's CT AAC encoder included.
As a high anchor, I guess LAME -V2 can be worse on some problem samples than a good AAC encoder @128 kbps. From the previous discussion I thought lossless was to be the high anchor.
As a low anchor, LC-AAC @64 kbps is fine IMO.
Title: Public Listening Test [2010]
Post by: /mnt on 2010-01-14 20:22:31
I would like to see DivX's AAC encoder in the test.

320 kbps or V0 MP3 would make a good high anchor if samples such as eig, Kraftwerk, Show Me Your Spine and castanets are used.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-14 20:35:51
I'd welcome having Winamp's CT AAC encoder included.
As a high anchor, I guess LAME -V2 can be worse on some problem samples than a good AAC encoder @128 kbps. From the previous discussion I thought lossless was to be the high anchor.

Right. Taking into account that the test will have a large share of difficult samples, even AAC at 128 kbps won't be rated high enough to be compared with a high anchor.

The following rules will be enough to avoid the use of a high anchor:
Code: [Select]
Remove all listeners from analysis who
a) graded the reference lower than 4.5,
b) graded the low anchor higher than all competitors,
c) didn't grade the low anchor.


Then the proposal list (without high anchor) is:
Nero, Apple, CT, Divx. Low anchor: LC-AAC 64 kbps.

Title: Public Listening Test [2010]
Post by: greynol on 2010-01-14 20:49:43
I thought I'd just pipe in and say I'm opposed to using Apple TVBR for the reasons already cited by others.  No disrespect to nao, but I have a hard time understanding how easily his encoder slipped right into the mix.

EDIT: Does TVBR require QuickTime Pro?  This would be another reason why I'd be opposed to TVBR.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-14 21:04:31
I thought I'd just pipe in and say I'm opposed to using Apple TVBR for the reasons already cited by others.  No disrespect to nao, but I have a hard time understanding how easily his encoder slipped right into the mix.

Poll results (http://www.hydrogenaudio.org/forums/index.php?showtopic=77367)
The poll was made before nao's post.
Many people want to see TVBR even though they aren't Mac users.
Title: Public Listening Test [2010]
Post by: greynol on 2010-01-14 21:13:27
Alright then; the masses have spoken.  Good luck with your test!
Title: Public Listening Test [2010]
Post by: kornchild2002 on 2010-01-14 22:27:03
Yeah, I don't like it either.  It takes 5 minutes (by the time track tag information is added) to encode one TVBR QuickTime AAC file under Windows using QuickTime Pro.  Aside from nao's utility, QuickTime Pro is required for TVBR AAC encoding on Windows.  I would have preferred CVBR simply because it is integrated into iTunes and is (in my opinion) easy to use.  The masses want something else though so I can live with that.  I still wish that it were included alongside QuickTime TVBR.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-01-14 22:47:16
I agree. Since I have yet to see a blind test comparing CVBR and TVBR for multiple items, done by multiple people (if there is one, please point me to it!), why not include both in this test? Moreover, right now the poll is only slightly in favor of TVBR.

Chris
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-14 23:33:45
One thing I'm afraid of is that the differences between TVBR and CVBR are far smaller than those between different encoders. People will have a hard time trying to find any difference between TVBR and CVBR instead of actually comparing different encoders.

We can do a poll and see, but so far I haven't found any sample where the difference between TVBR and CVBR was at all pronounced. If you find one, let us know.



Title: Public Listening Test [2010]
Post by: rpp3po on 2010-01-15 00:42:20
The effect of TVBR is mainly space saving at higher bitrates. You can use overkill quality settings comparable to iTunes Plus without wasting as much disk space. I don't think that there would be much difference at 128kbit/s. Still it would be interesting to see this verified. As C.R. Helmrich noted, there isn't much good data, yet.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-16 16:16:05
This poll (http://www.hydrogenaudio.org/forums/index.php?showtopic=77932) will decide the TVBR/CVBR comparison.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-23 02:28:58
Now that Apple TVBR is available to Windows users, shall we drop CVBR?

Encoder settings for 128 kbps.
Nero -q 0.xx
Divx -v 4
CT (MediaCoder): LC, maybe shifted to ~130 kbps

There isn't much to choose from.


Apple (qtaacenc): --tvbr 65 / --cvbr 128
--high (default) or --highest?
Title: Public Listening Test [2010]
Post by: halb27 on 2010-01-23 20:10:25
IMO we should use --highest.

Though not many people will be interested:
Now that we've got qtaacenc, I just did a small listening test with harp40_1 and trumpet_myPrince, which were the most problematic samples of 'mine' when used with Nero.
--tvbr 65 and --cvbr 128 were both not transparent, and they also sounded pretty much identical to me.
--abr 128 wasn't transparent either, of course, but the result sounded better to me than the VBR results.
I know many people think that only VBR is the way to go, true VBR if possible, but the variable-bitrate method ABR can have merits of its own when it comes to quality, not efficiency. It also uses high bitrates if necessary, but follows a different quality decision-making process than VBR.
More interesting: AFAIK there has never been a listening test of ABR vs. VBR, at least not in recent years with up-to-date encoders. There's always just the widespread belief that VBR is the best way to go quality-wise. I don't want to say that's wrong, but it would be fine to have this belief verified in a test.

My suggestion: let's do a pre-test of QT ABR vs. CVBR and use the best.
(Of course that would bring the problem of also giving the ABR mode a chance for the other encoders that have this option in case ABR is the winner, but a priori the situation that ABR should win against CVBR isn't what most people expect to happen. And if it should occur, I think it's worthwhile to think then about how to deal with the situation in the real test.)
Title: Public Listening Test [2010]
Post by: greynol on 2010-01-23 20:41:03
Now that Apple TVBR is available to Windows users, shall we drop CVBR?

I say no because the vast majority of Windows users (and likely HA users as well) are going to still use iTunes instead of QuickTime.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-23 22:06:31
halb,
I think we've come back to specific options or versions of a single competitor.
Then we should give an opportunity to Nero's ABR or another version as well.
We've reached the conclusion that only one version of Nero will be tested, and until now only VBR (however, I have also found samples where Nero ABR performs better than VBR).

Then we should pre-test: DivX vs CT, Nero ABR (maybe even ABR+2pass), Nero VBR, Apple ABR, Apple CVBR, and I don't even want to mention the Apple quality settings --normal vs --high. It's too much unless we can agree to discard some of those pre-tests.

I tend to propose relying on developer suggestions, as they know their codecs better than anyone else:

Nero VBR (the same as in the previous public test)
I've sent an e-mail to an Apple developer and he said that VBR should be the best option.

However, after your comments I'm personally interested in seeing ABR vs. VBR, at least in my own tests (and maybe in a pre-test).
Title: Public Listening Test [2010]
Post by: halb27 on 2010-01-23 22:36:31
I understand your considerations, and some of them came to my mind as well when considering ABR.
Anyway, I'm glad to hear that your experience matches mine and that you want to do a personal test, or even the pre-test, with ABR as well.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-23 23:06:28
iTunes AAC encodes at a fixed quality option which is identical to qtaacenc's --normal.
If --highest is enabled in qtaacenc (QTAE), then it won't be a pure CVBR/ABR vs TVBR comparison but TVBR+highest vs CVBR/ABR+normal. But it doesn't necessarily have to be a pure CVBR vs TVBR comparison. I'm fine with a TVBR+highest vs CVBR/ABR+normal comparison.
Title: Public Listening Test [2010]
Post by: halb27 on 2010-01-23 23:22:46
I'm not quite sure I understand what you're suggesting.

Is it this?
iTunes uses CVBR and ABR, and for this reason you want to use iTunes' fixed quality setting, which corresponds to qtaacenc's --normal.
As iTunes doesn't support TVBR, you'd like to use --highest for TVBR.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-24 00:22:42
Yes, that's correct.

Maybe there are people who are interested in a pure TVBR/CVBR comparison with --normal for both QTAE and iTunes, as there is no quality setting in iTunes other than the default/fixed --normal.

But I'm more interested in TVBR+highest vs CVBR/ABR+(fixed)normal.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-25 04:30:30
This open discussion will have enough time, but it can't be endless. All items (codecs, settings, bitrate, samples, etc.) will have a deadline. The plan is to begin the test in early March.

The list of AAC encoders is almost established by now, and its deadline is in 1 week (until the 1st of February).
Please, speak now.

The list of AAC encoders to test is:
1. Nero 1.5.1
2. Apple TVBR
3. Apple CVBR or ABR
4. The winner of the pre-test (CT vs DivX)

Title: Public Listening Test [2010]
Post by: halb27 on 2010-01-25 07:59:29
As for 3., I'd like to see ABR in the test because the CVBR results are expected to be more or less identical to those of TVBR. I guess the optimum strategy to keep everybody happy, however, is to have a pre-test of ABR vs. CVBR.
I'm not so happy comparing TVBR --highest against CVBR/ABR --normal, but I hope this doesn't change things a lot.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-27 02:13:32
I've tried ABX tests on a few samples of ABR vs TVBR vs CVBR and have found that ABR fared badly on Fatboy-like samples. The required bitrate is extremely high for such killer samples, and ABR is bitrate-restricted on them.

I think we're devoting too much attention to the Apple encoder alone. It will be very hard to test all settings and modes for all encoders. We can't test settings for the Apple encoder without giving the same opportunity to Nero.

For good reasons, the Nero and Apple devs both suggest VBR. I don't think that's a coincidence.


Opinions?

Deadlines:
1. List of codecs (February 1)
2. Settings (February 8)
3. Samples (February 15)
...
Title: Public Listening Test [2010]
Post by: halb27 on 2010-01-27 08:46:02
OK, I see the arguments (and your findings), so let's forget about ABR.
Thanks anyway for having given it a chance.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-27 15:01:48
The rules:
Code: [Select]
Remove all listeners from analysis who
1. graded the reference lower than 4.5
2. graded the low anchor higher than all competitors.
3. didn't grade the low anchor.
4. didn't grade any of competitors.

Chris has suggested changing the 1st rule to "graded the reference lower than 5.0" here (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77809&view=findpost&p=681079)

I tend to disagree.
Imagine a situation where codec A has very noticeable artifacts on a sample and is ranked with a pretty low score, while codec B did a good job of lowpassing an old noisy record. Codec B could be ranked higher than the lossless reference -> the reference can be ranked lower than 5.0, as described in http://www.rarewares.org/rja/ListeningTest.pdf (http://www.rarewares.org/rja/ListeningTest.pdf)

The result could be:
Code: [Select]
Codec A - 3.0
Reference A - 5.0

Codec B - 5.0
Reference B - 4.9


That's a valid result to me.
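To make these rules concrete, here is a rough Python sketch of the screening filter. The data layout (one dict per listener per sample) is hypothetical, purely for illustration:

```python
def is_valid_result(result):
    """Apply the proposed post-screening rules to one listener's
    submission for one sample. Grades use the 1.0-5.0 ABC/HR scale."""
    graded = {name: g for name, g in result["competitors"].items()
              if g is not None}
    if result["reference"] < 4.5:        # rule 1: reference graded too low
        return False
    low = result.get("low_anchor")
    if low is None:                      # rule 3: low anchor not graded
        return False
    if not graded:                       # rule 4: no competitor graded
        return False
    if all(low > g for g in graded.values()):
        return False                     # rule 2: low anchor beat everyone
    return True

# A submission where the low anchor outscores every competitor is rejected:
bad = {"reference": 5.0, "low_anchor": 4.0,
       "competitors": {"Nero": 3.5, "Apple TVBR": 3.0}}
good = {"reference": 5.0, "low_anchor": 2.0,
        "competitors": {"Nero": 3.5, "Apple TVBR": 4.5}}
print(is_valid_result(bad), is_valid_result(good))  # False True
```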
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-27 15:12:59
Nero bitrate settings:

-q 0.42 is comparable with Apple TVBR/CVBR 128 kbps on classical music (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77932&view=findpost&p=682937)
while -q 0.41 is on rock/alternative music.

-q 0.415 can be a reasonable setting for this test.

It would be good if someone else could confirm that too.
Title: Public Listening Test [2010]
Post by: Alex B on 2010-01-27 15:41:59
Regarding the bitrates, I have run some tests using my standard "Various" and "Classical" file sets.

So far I have encoded the sets with:

- iTunes VBR @ 96 kbps and @ 128 kbps
- QT tvbr @ -64 (i.e. about 128 kbps), "best" quality  -- or whatever the actual string is, I don't have the settings in front of me now.

Am I understanding correctly, that the constrained VBR setting in QT is more likely to be tested than the "constrained" iTunes VBR? I'd prefer iTunes because then the test would tell if using QT is worthwhile. (edit: ... i.e. if using QT tvbr instead of iTunes is worthwhile)

I could add the latest Nero and try to find a matching setting. Apparently the test is going to be a "128" test, or is "96" still a possibility?

In general, does anyone have an opinion of which method should be used for checking the AAC bitrates? Should we blindly trust the applications that just read the header data? For instance, foobar and Mr QuestionMan don't always agree. Actually, apparently Mr Q. can't read QT tvbr files at all, but that can be fixed by retagging the files.
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-01-27 15:50:51
Am I understanding correctly, that the constrained VBR setting in QT is more likely to be tested than the "constrained" iTunes VBR? I'd prefer iTunes because then the test would tell if using QT is worthwhile.


QT, constrained VBR, at medium quality is identical to iTunes' VBR. The only exception is the iTunes plus preset, which is identical to QT, 256 kbit/s constrained VBR, max quality.

In general, does anyone have an opinion of which method should be used for checking the AAC bitrates? Should we blindly trust the applications that just read the header data?


If you want to verify, which application reports with the highest precision, extract a raw AAC stream from an MP4/M4A (e.g. with mp4box) and divide the number of bytes by number of seconds.

Even if it is more work, I'd vote for hand selected quality settings. Run each encoder in a FOR LOOP with a few increments from q 0.38 to q 0.45 and then choose the version closest to 128 kbit/s, instead of using the same preset for the whole test. Primary goal is to see how encoders compare at 128kbit/s. Knowing how good their VBR algorithms scale up for problematic content cannot be tested in a fixed bitrate comparison, anyway.
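Once the average bitrate of each -q increment has been measured over the whole calibration library, the selection step is a simple nearest-match search. A sketch with made-up bitrate figures:

```python
def closest_setting(measured, target_kbps=128.0):
    """Return the quality setting whose measured library-wide average
    bitrate is closest to the target. `measured` maps setting -> kbps."""
    return min(measured, key=lambda q: abs(measured[q] - target_kbps))

# Hypothetical measurements for one encoder over the calibration set:
nero_avg = {0.38: 118.2, 0.40: 124.7, 0.42: 129.9, 0.44: 136.5}
print(closest_setting(nero_avg))  # 0.42
```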
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-27 16:14:19
Alex B,

I've counted the votes and opinions of HA's experienced members in the 96/128 kbps poll topic. The results are 50/50.
So both are still fine.

CVBR vs TVBR settings to test:
CVBR (identical to iTunes) = qtaacenc --cvbr --normal
TVBR = qtaacenc --tvbr --highest

I compare file sizes; this way the results are independent of the application (foobar, MrQ, etc.).
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-27 16:17:34
Even if it is more work, I'd vote for hand selected quality settings. Run each encoder in a FOR LOOP with a few increments from q 0.38 to q 0.45 and then choose the version closest to 128 kbit/s, instead of using the same preset for the whole test. Primary goal is to see how encoders compare at 128kbit/s. Knowing how good their VBR algorithms scale up for problematic content cannot be tested in a fixed bitrate comparison, anyway.

The test should reflect real-life conditions. Usually a user won't use a different -q value per song. It's not realistic.
Title: Public Listening Test [2010]
Post by: Alex B on 2010-01-27 16:41:17
QT, constrained VBR, at medium quality is identical to iTunes' VBR. The only exception is the iTunes plus preset, which is identical to QT, 256 kbit/s constrained VBR, max quality.

IMHO, the constrained VBR samples should then be encoded with iTunes and labeled as such. It would be a bit of a stupid move not to include a well-known brand like iTunes. Of course a test is not a commercial product that needs marketing, but surely it would be good if the test gained more publicity.

Quote
If you want to verify, which application reports with the highest precision, extract a raw AAC stream from an MP4/M4A (e.g. with mp4box) and divide the number of bytes by number of seconds.

I could do that. My sets contain only 25 + 25 carefully selected files so that would not be too much work.

Quote
Even if it is more work, I'd vote for hand selected quality settings. Run each encoder in a FOR LOOP with a few increments from q 0.38 to q 0.45 and then choose the version closest to 128 kbit/s, instead of using the same preset for the whole test.

I hope you don't mean that each sample & each encoder should be adjusted individually to produce 128 kbps or as close as possible.

Quote
Primary goal is to see how encoders compare at 128kbit/s. Knowing how good their VBR algorithms scale up for problematic content cannot be tested in a fixed bitrate comparison, anyway.

The primary goal is probably going to be
"to see how encoders compare at a setting that produces an average bitrate of 131 kbps, or as close as possible, when a big varied audio library is encoded."

The exact target bitrate depends on those encoders that cannot be adjusted precisely. Their average bitrate should be calculated first, and then the encoders that can be freely set should be tested in order to find the matching setting.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-27 17:00:23
Bitrate limit, How much VBR is allowed? (http://www.hydrogenaudio.org/forums/index.php?showtopic=78033&hl=)

The discussion has indicated that everybody agrees to test VBR without any limits.
Title: Public Listening Test [2010]
Post by: Polar on 2010-01-27 19:32:27
I've counted the votes and opinions of HA's experienced members in the 96/128 kbps poll topic. The results are 50/50.
So both are still fine.
Well, if you don't want to end up frustrated because the results of the test's long and hard work are a 4.5 statistical tie, especially if you apply strict post-processing, go for 96k. The latter hasn't been publicly tested recently, unlike 128k.

Moreover, bear in mind that with rigorous post-processing, however noble the idea is, you'll have even fewer results, widening the statistical error margins even further and making the all-tied end result a self-fulfilling prophecy. Sebastian's most recent public listening test, 128k MP3, conducted a little over a year ago, averaged just 27 test results per sample (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=66564&view=findpost&p=601028) ... Would you really want to put in so much energy in testing and post-processing for only 10 to 20 results?

Another reminder quote from Sebastian:
Well, this was definitely the last test at 128 kbps, that is for sure.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-01-27 20:18:28
... , go for 96k.  The latter hasn't been publicly tested recently, unlike 128k.

When was the last time an AAC-only test with only critical samples was done at 128 kbps?

Quote
Moreover, bear in mind that with rigorous post-processing, however noble the idea is, you'll have even fewer results, widening the statistical error margins even further, ...

Incorrect. By rigorous post-screening, you reduce the statistical error margins because the results become more consistent. Of course, you shouldn't end up with only a handful of accepted listeners.
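The arithmetic behind this: the 95% confidence interval around a mean grade has a half-width of roughly 1.96·s/√n, so halving the score spread s outweighs a moderate loss of listeners n. A sketch with made-up numbers:

```python
import math

def ci_half_width(std_dev, n):
    """Approximate 95% confidence interval half-width for a mean grade."""
    return 1.96 * std_dev / math.sqrt(n)

# Hypothetical: 40 unscreened listeners with spread 1.0 versus
# 25 consistent ones with spread 0.5 after post-screening:
print(round(ci_half_width(1.0, 40), 3))  # 0.31
print(round(ci_half_width(0.5, 25), 3))  # 0.196
```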

Quote
Would you really want to put in so much energy in testing and post-processing for only 10 to 20 results?

Yes, anything higher than 10 after post-screening is a very useful number.

Quote
Another reminder quote from Sebastian:
Well, this was definitely the last test at 128 kbps, that is for sure.


I would say the same if I had to test with non-critical samples.

Quote from: IgorC link=msg=0 date=
The discussion has indicated that everybody agrees to test VBR without any limits.

Careful! That discussion had nothing to do with the choice of encoders for this test. I was just asking if people mind excessive bitrates in VBR encoders.

Quote
Imagine situation that there is codec A which has very noticeble artifacts on sample and ranked at pretty low score while codec B did a good job on lowpassing an old noisy record. Codec B could be ranked higher than reference lossless -> reference can be ranked lower than 5.0 as described in http://www.rarewares.org/rja/ListeningTest.pdf (http://www.rarewares.org/rja/ListeningTest.pdf)

This would not be a result we are interested in! For an encoder, sounding better than the original is not a goal. If it sounds better than the original, it's not transparent and hence must be graded lower than 5.0.

Quote from: Alex B link=msg=0 date=
It would be a bit of a stupid move not to include a well-known brand like iTunes. Of course a test is not a commercial product that needs marketing, but surely it would be good if the test gained more publicity.

Agreed. Of QT and iTunes, the latter is arguably the more widely used software for encoding (after all, AAC is the default codec for CD ripping in iTunes). Why not just use iTunes CVBR in the test and leave the ABR-CVBR-TVBR discussion for a separate test? The corresponding poll shows a tie.

Quote
The primary goal is probably going to be
"to see how encoders compare at a setting that produces an average bitrate of 131 kbps, or as close as possible, when a big varied audio library is encoded."

The exact target bitrate depends on those encoders that cannot be adjusted precisely. Their average bitrate should be calculated first, and then the encoders that can be freely set should be tested in order to find the matching setting.

Agreed. I think it's time to decide on the "varied audio library" for calibrating the VBR coders. That would also allow me to tune the average bitrate of Fraunhofer's AAC VBR encoder to match those of iTunes and DivX, for example (in case Fraunhofer's encoder is one day compared against those encoders).

A spontaneous proposal from my side is Pink Floyd's 2-CD best-of "Echoes" (http://www.amazon.com/Echoes-Best-Pink-Floyd/dp/B00005QDW5/) because musically, it's very diverse (loud and quiet, tonal and noisy stuff), and it has no silence between tracks (it's all one mix).

Chris
Title: Public Listening Test [2010]
Post by: muaddib on 2010-01-28 12:30:18
The rules:
Code: [Select]
Remove all listeners from analysis who
1. graded the reference lower than 4.5
2. graded the low anchor higher than all competitors.
3. didn't grade the low anchor.
4. didn't grade any of competitors.

Would you remove the results for all samples from that user?

Chris has suggested changing the 1st rule to "graded the reference lower than 5.0" here (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77809&view=findpost&p=681079)
I tend to disagree.
Imagine a situation where codec A has very noticeable artifacts on a sample and is ranked with a pretty low score, while codec B did a good job of lowpassing an old noisy record. Codec B could be ranked higher than the lossless reference -> the reference can be ranked lower than 5.0, as described in http://www.rarewares.org/rja/ListeningTest.pdf (http://www.rarewares.org/rja/ListeningTest.pdf)

The result could be:
Code: [Select]
Codec A - 3.0
Reference A - 5.0

Codec B - 5.0
Reference B - 4.9


That's a valid result to me.

Also imagine people guessing on an encoder that is transparent, and you discard only those guesses where they were unlucky. All the lucky guesses stay, and the encoder gets a lower grade than it deserves. I still hold the opinion that an ABX test for each codec on each sample should be mandatory when one wants to give a grade.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-29 03:05:25
muaddib,

It's an important observation indeed.
I think we are all human and have the right to make a mistake sometimes, but not always. I would accept valid results from a listener with previous invalid ones.
I propose the following solution:
a) If a listener submits an invalid result, then he will be informed and will have one more, and only one more, possibility to submit a result for that particular sample (but now with an ABX log).
b) If the listener submits 3 or more invalid results, then only ABX results will be accepted from him/her.


About lucky guessing.
Those lucky guesses will cancel each other out because all encoders have the same probability of being lucky-guessed. Obviously the average scores will be a bit lower, but it can't affect one particular codec without affecting the rest of the competitors... because that's exactly how lucky guessing works -> no privileges for any particular competitor.
Now, why do I say no to ABX but yes to ABC/HR in this particular test?
1. From my previous blind test experience (and I want to hear the opinions of other listeners here) I can say ABX is actually an exhausting activity. The listener (at least me) will likely lose concentration after ABXing all competitors against lossless and won't be able to grade the competitors against each other.
2. We will test difficult samples, so it should be easier to spot the artifacts.
I would rather accept the possibility of lucky guesses in exchange for a higher chance of getting useful results.
Title: Public Listening Test [2010]
Post by: hellokeith on 2010-01-29 04:02:13
Bitrate limit, How much VBR is allowed? (http://www.hydrogenaudio.org/forums/index.php?showtopic=78033&hl=)

The discussion has indicated that everybody agrees to test VBR without any limits.


I was under the impression that it is common for portable audio players to have a vbr limit? Is this not a concern? Is unrestrained VBR truly unrestrained or just very high?
Title: Public Listening Test [2010]
Post by: IgorC on 2010-01-29 05:29:03
Chris,

I think the TVBR vs CVBR comparison should take place. People want to see it (see the poll). The poll is closed and I think we shouldn't discuss it anymore. It doesn't matter if it's tied. There are simply more people who want to see it. There has been a very long and popular discussion about the efficiency of TVBR/CVBR here on HA.
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-01-29 11:12:00
I was under the impression that it is common for portable audio players to have a vbr limit?


That's all cleanly defined in the MPEG specification, nothing to worry about.

Is this not a concern? Is unrestrained VBR truly unrestrained or just very high?


Unconstrained VBR in the context of this discussion means unconstrained variability, not an unconstrained max bitrate.
Title: Public Listening Test [2010]
Post by: muaddib on 2010-01-29 17:15:12
1. From my previous blind test experience (and I want to hear the opinions of other listeners here) I can say ABX is actually an exhausting activity. The listener (at least me) will likely lose concentration after ABXing all competitors against lossless and won't be able to grade the competitors against each other.
2. We will test difficult samples, so it should be easier to spot the artifacts.
I would rather accept the possibility of lucky guesses in exchange for a higher chance of getting useful results.

ABX is an exhausting activity if it is hard for someone to spot the artifact, that is, if a sample is difficult for him. Difficulty is determined by the observer.
ABX can be used to help users decide what grade to give.
For example, a user should give a grade below 4 only if one listen to the original is enough to do a 5/5 ABX.
A grade between 4 and 4.5 should be given only if the user doesn't make a mistake and it is not hard for him to get 5/5, but he is allowed to listen to the original before each trial.
A grade below 3 should be given only if there is no need to listen to the original to do a 5/5 ABX.
And so on...
This way the SDG becomes less vague.
I am not proposing a rule to refuse all results without ABX, but rather a description of a listening-test procedure that would help people give valid results.
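As a side note on why 5/5 is a sensible bar: with pure guessing, each ABX trial is a coin flip, so the chance of passing by luck is easy to compute. A sketch:

```python
from math import comb

def guess_probability(correct_needed, trials):
    """Probability of getting at least `correct_needed` of `trials` ABX
    trials right by pure guessing (each trial is a fair coin flip)."""
    hits = sum(comb(trials, k) for k in range(correct_needed, trials + 1))
    return hits / 2 ** trials

print(guess_probability(5, 5))  # 0.03125, i.e. 1 in 32
print(guess_probability(7, 8))  # 0.03515625, the common 7-of-8 criterion
```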
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-01-30 10:47:48
The rules:
Code: [Select]
Remove all listeners from analysis who
1. graded the reference lower than 4.5
2. graded the low anchor higher than all competitors.
3. didn't grade the low anchor.
4. didn't grade any of competitors.

Would you remove the results for all samples from that user?

Initially, I would have said yes, but since the rules seem to have become more strict, I'd say we only remove the results for that particular sample.

Quote
Also imagine people guessing on an encoder that is transparent, and you discard only those guesses where they were unlucky. All the lucky guesses stay, and the encoder gets a lower grade than it deserves. I still hold the opinion that an ABX test for each codec on each sample should be mandatory when one wants to give a grade.

Good point. It's a listening time - reliability tradeoff. Certainly, an extra ABX test for each sample will get rid of such "false results", but it will make the test much longer. Due to the latter, I'm not sure yet whether I want extra ABX tests.

Igor, I wasn't questioning the necessity of comparing CVBR and TVBR. Of course people want to see it (myself included). But I think we should make a separate test out of it. I don't mind at all if we would make that one public as well. We could even do that Apple-only test before the multi-company test. Then we can take the winner of that test (if there is one) and put it on the multi-company test. Btw, this should have been an option in the poll (http://www.hydrogenaudio.org/forums/index.php?showtopic=77932), I think.

Chris
Title: Public Listening Test [2010]
Post by: IgorC on 2010-02-01 05:23:54
Chris, we have been talking about encoders for about a month, and maybe it is a little late for such drastic changes.
I don't think there is a realistic chance for more than one AAC test. Take into account that this is not a multiformat test, so we will get fewer results, and even fewer if it is only an Apple AAC multi-setting test. After just one codec-specific test, interest in public tests could drop drastically.

Conducting even a single public AAC test will already be a hard task, in my opinion.

We can run a poll to ask people, or something similar.

I think it would be better if we determined the list of AAC encoders together, with a new deadline of maybe February 7.

I don't see a reason to conduct separate tests. The number of good AAC encoders is actually small:
Nero, TVBR, CVBR (and CT vs. DivX in a pre-test).
Title: Public Listening Test [2010]
Post by: .alexander. on 2010-02-01 10:53:34
Three cheers for rpp3po!

Even if it is more work, I'd vote for hand-selected quality settings. Run each encoder in a FOR loop with a few increments from q 0.38 to q 0.45 and then choose the version closest to 128 kbit/s, instead of using the same preset for the whole test. The primary goal is to see how encoders compare at 128 kbit/s. How well their VBR algorithms scale up for problematic content cannot be tested in a fixed-bitrate comparison anyway.


Why do you guys actually need the 96/128 kbps poll?
128 is only about 33% larger than 96, and most of you don't mind an encoder using 150% of the nominal bitrate: [a href='index.php?showtopic=78033']Bitrate limit, How much VBR is allowed?[/a]

It's fine that consumers want to evaluate encoders in the modes they regularly use. But I am surprised that people like C.R.Helmrich and muaddib believe in the magic of encoder front-end settings. There are very few principal encoding parameters in audio (bitrate, bit-reservoir mode, fs), and it would be easy to conduct a fair comparison. It is ridiculous, but you don't want to fix any of them.

Wouldn't it be fair to exclude the CT encoder? It only has a CBR mode and outputs ADTS, so you are going to steal another 2 kbps.
Title: Public Listening Test [2010]
Post by: Alex B on 2010-02-01 12:33:23
Regarding correct and fair VBR settings for each individual encoder, here is one of my related replies, posted while the previous public listening test was being prepared:

http://www.hydrogenaudio.org/forums/index....st&p=593735 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=57322&view=findpost&p=593735)

The complete thread (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=57322) * would be good reading for anyone interested in the preparatory work that is needed before a public listening test can be launched. For instance, while the test was being prepared we discovered a serious problem with the iTunes MP3 encoder. The problem had existed for several years, and only our discovery made Apple finally fix it (I hope it is fixed now).

* There was also a preceding 14-page discussion about a year earlier: http://www.hydrogenaudio.org/forums/index....showtopic=47313 (http://www.hydrogenaudio.org/forums/index.php?showtopic=47313)

I have posted some other links to older threads here (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=57322&view=findpost&p=592494):
Quote
While we wait for the test to begin, it might be useful to revisit the comments that were posted in the 64 kbps multiformat test's announcement thread: http://www.hydrogenaudio.org/forums/index....showtopic=56397 (http://www.hydrogenaudio.org/forums/index.php?showtopic=56397)

In that thread I made some suggestions about how the test presentation and instructions could be developed further: http://www.hydrogenaudio.org/forums/index....st&p=509971 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=56397&view=findpost&p=509971)

In addition, the comments in the post-test thread make a good read: http://www.hydrogenaudio.org/forums/index....showtopic=56851 (http://www.hydrogenaudio.org/forums/index.php?showtopic=56851)
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-02 21:11:47
QT doesn't let its psymodel sort out >16 kHz content for ~128 kbit/s material; instead it chooses to low-pass completely (and gains the benefit of only needing 32 kHz). Is this development choice going to be honored, or are you planning to force a sample rate?
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-02-02 22:40:57
.alexander., rpp3po,

sorry, I don't understand what you're talking about. Already at 96 kbps VBR, iTunes gives me 44.1-kHz MP4 files. Using a 32 kHz sampling rate at 128 kbps or more is a bad idea anyway. Pre-echo-sensitive people like /mnt will tell you why.

C.R.Helmrich and muaddib believe in the magic of encoder front-end settings because they know that, in principle, there are about 100 encoding parameters in AAC. There's a reason why consumers are allowed to access only very few of them. If a codec developer decides (after hundreds of hours of testing) to use a certain default sampling rate for a given bitrate, why should we disallow that?

Wouldn't it be fair to exclude the CT encoder? Fair to whom? Stealing another 2 kbps? From where? It's included in the 128 kbps.

Chris
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-02 23:06:34
I was referring to this (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=78072&view=findpost&p=684807) post. Is it false information? I didn't verify it before posting.

I just noticed that for Q values <59, the output is resampled to 32 kHz. For me, Q59 results in 139 kbps on average. Isn't resampling done at much lower bitrates with MP3?
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-03 00:20:19
I have just verified singaiya's report. QuickTime, left at its default settings for TVBR and CVBR, downsamples automatically below Q59. Apple's bare-bones afconvert front-end produces the following output:

Code: [Select]
rpp3po:Desktop rpp3po$ afconvert test.wav -f m4af -s 3 -d aac -u vbrq 58 -o test.m4a -v
Input file: test.wav, 11393676 frames
strategy = 3
user property 'qrbv' = 58
Formats:
  Input file     2 ch,  44100 Hz, 'lpcm' (0x0000000C) 16-bit little-endian signed integer
  Output file    2 ch,      0 Hz, 'aac ' (0x00000000) 0 bits/channel, 0 bytes/packet, 0 frames/packet, 0 bytes/frame
  Output client  2 ch,  44100 Hz, 'lpcm' (0x0000000C) 16-bit little-endian signed integer
AudioConverter 0x595004 [0x10012d490]:
  CodecConverter 0x0x10013e1e0
    Input:   2 ch,  44100 Hz, 'lpcm' (0x0000000C) 16-bit little-endian signed integer
    Output:  2 ch,  32000 Hz, 'aac ' (0x00000000) 0 bits/channel, 0 bytes/packet, 1024 frames/packet, 0 bytes/frame
    codec: 'aenc'/'aac '/'appl'
    Input layout tag: 0x650002
    Output layout tag: 0x650002
Optimizing test.m4a... done
Output file: test.m4a, 8267520 frames


Code: [Select]
rpp3po:Desktop rpp3po$ afconvert test.wav -f m4af -s 3 -d aac -u vbrq 59 -o test.m4a -v
Input file: test.wav, 11393676 frames
strategy = 3
user property 'qrbv' = 59
Formats:
  Input file     2 ch,  44100 Hz, 'lpcm' (0x0000000C) 16-bit little-endian signed integer
  Output file    2 ch,      0 Hz, 'aac ' (0x00000000) 0 bits/channel, 0 bytes/packet, 0 frames/packet, 0 bytes/frame
  Output client  2 ch,  44100 Hz, 'lpcm' (0x0000000C) 16-bit little-endian signed integer
AudioConverter 0x59a004 [0x10012cc00]:
  CodecConverter 0x0x10013d910
    Input:   2 ch,  44100 Hz, 'lpcm' (0x0000000C) 16-bit little-endian signed integer
    Output:  2 ch,  44100 Hz, 'aac ' (0x00000000) 0 bits/channel, 0 bytes/packet, 1024 frames/packet, 0 bytes/frame
    codec: 'aenc'/'aac '/'appl'
    Input layout tag: 0x650002
    Output layout tag: 0x650002
Optimizing test.m4a... done
Output file: test.m4a, 11393676 frames


.alexander., rpp3po,

sorry, I don't understand what you're talking about. Already at 96 kbps VBR, iTunes gives me 44.1-kHz MP4 files. Using a 32 kHz sampling rate at 128 kbps or more is a bad idea anyway. Pre-echo-sensitive people like /mnt will tell you why.


I don't understand why you have addressed both posts together. My point wasn't related.

If a codec developer decides (after hundreds of hours of testing) to use a certain default sampling rate for a given bitrate, why should we disallow that?


Does that mean that you would prefer to use Apple's default or not?

Could you please point me to a reference explaining why having only 16 kHz of bandwidth makes it harder to avoid pre-echo?
Title: Public Listening Test [2010]
Post by: menno on 2010-02-03 01:35:00
Could you please point me to a reference explaining why having only 16 kHz of bandwidth makes it harder to avoid pre-echo?


The bandwidth doesn't matter, but the temporal resolution of the codec will be different due to the constant blocksize regardless of samplerate.
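menno's point can be made concrete: an AAC long block is always 1024 samples (a short block 128), so at a lower sampling rate each block simply spans more time. A small sketch of the arithmetic:

```python
def block_duration_ms(sample_rate_hz, block_size=1024):
    """Duration of one AAC block in milliseconds. The block size is fixed
    (1024 samples for a long block, 128 for a short one), so halving the
    sample rate lengthens each block and smears pre-echo over more time."""
    return 1000.0 * block_size / sample_rate_hz

print(block_duration_ms(44100))       # ~23.2 ms long block at 44.1 kHz
print(block_duration_ms(32000))       # 32.0 ms long block at 32 kHz
print(block_duration_ms(32000, 128))  # 4.0 ms short block at 32 kHz
```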
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-03 02:50:42
Ah, thanks, I didn't think about that. The blocks last much longer.
Title: Public Listening Test [2010]
Post by: .alexander. on 2010-02-03 16:05:31
For some reason Olympic sprinters don't run in casual wear, even though they can regularly walk barefoot or wear suits. In my opinion it would be a good idea to compare streams having equal bitrates (amounts of bits), not streams produced using similarly spelled settings.

C.R.Helmrich and muaddib believe in the magic of encoder front-end settings because they know that, in principle, there are about 100 encoding parameters in AAC. There's a reason why consumers are allowed to access only very few of them.


100 parameters per encoder means somewhere around 400 for this test. That's why I propose to focus on the parameters of the bitstreams. Actually there are fewer than 100 data_elements in the AAC-LC syntax (see 4.148).

If a codec developer decides (after hundreds of hours of testing) to use a certain default sampling rate for a given bitrate, why should we disallow that?


You are right, and I appreciate the hundreds of hours of testing invested. Still, resampling isn't a primary AAC compression technique, and some applications require a fixed sample rate.

Wouldn't it be fair to exclude the CT encoder? Fair to whom? Stealing another 2 kbps? From where? It's included in the 128 kbps.


Note that in previous tests the CT streams were in ADTS, while the others targeted the bitrate of the raw data. The CT encoder could have used the 56 bits of each ADTS header for Huffman codes. So I kindly ask you to increase the CT bitrate up to 130 kbps.
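The ~2 kbps figure checks out as a back-of-the-envelope calculation: an ADTS header without CRC is 56 bits (7 bytes), and each AAC frame carries 1024 samples, so at 44.1 kHz:

```python
# Rough check of the ADTS header overhead cited above.
SAMPLE_RATE = 44100
FRAME_SAMPLES = 1024        # samples per AAC frame
ADTS_HEADER_BITS = 56       # 7-byte ADTS header, no CRC

frames_per_second = SAMPLE_RATE / FRAME_SAMPLES           # ~43.07 frames/s
overhead_kbps = ADTS_HEADER_BITS * frames_per_second / 1000.0
print(round(overhead_kbps, 2))                            # ~2.41 kbps
```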
Title: Public Listening Test [2010]
Post by: IgorC on 2010-02-03 16:41:40
In my opinion it would be a good idea to compare streams having equal bitrates (amounts of bits), not streams produced using similarly spelled settings.

That won't happen. I promise you.
Why?
99% of the HA community agree that a codec should be tested without any bitrate restriction, as long as it produces the target bitrate on a large enough set of files.

Quote
And I kindly ask you to increase CT bitrate upto 130 kbps.

Yes, CT's bitrate will be shifted to ~130 kbps.
Title: Public Listening Test [2010]
Post by: .alexander. on 2010-02-03 18:37:04
That won't happen. I promise you.
Why?


And why do they use straight tracks for 100m sprint?

99% of the HA community agree that a codec should be tested without any bitrate restriction, as long as it produces the target bitrate on a large enough set of files.


In your personal opinion, how many files are necessary to make the encoders produce the same average bitrate over 30-second samples? Test samples are likely to be short, and this in particular can be a reason to get biased bitrates.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-02-03 19:30:28
In your personal opinion, how many files are necessary to make the encoders produce the same average bitrate over 30-second samples? Test samples are likely to be short, and this in particular can be a reason to get biased bitrates.

http://www.hydrogenaudio.org/forums/index....showtopic=77932 (http://www.hydrogenaudio.org/forums/index.php?showtopic=77932) 55 CDs should suffice

Now, IIRC, at ~128 kbps, all encoders should run at 44.1 kHz by default, am I right? So if we take that bitrate, the sampling-rate issue should disappear.

Chris

Oh, I forgot: Who says the number of encoding parameters needs to equal the number of data elements? Even something such as the decision whether to use short blocks or a long block already requires a handful of parameters.
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-03 22:51:53
http://www.hydrogenaudio.org/forums/index....showtopic=77932 (http://www.hydrogenaudio.org/forums/index.php?showtopic=77932) 55 CDs should suffice

Now, IIRC, at ~128 kbps, all encoders should run at 44.1 kHz by default, am I right? So if we take that bitrate, the sampling-rate issue should disappear.


That was 55 CDs of classical music. You know what that means. With Q59 and 55 CDs of 'emesesk' music you end up in the 140-145 kbit/s range. And beginning with the next smaller setting, Q58, downsampling is the default for QuickTime. So no, depending on the encoded content, you are not right. But since it seems we won't individually alter Q values to get each track as close to ~128 kbit/s as possible anyway, it might not really matter. Depending on the selection of samples, the global Q value might be higher than 59, and then downsampling is really not an issue.

I have 87 lossless cross-genre albums on my notebook right now. I'll check in a couple of minutes how close their average gets to 128 kbit/s at Q59.

PS: I have finished the 87 albums. I get a collection average of 129 kbit/s (median at 134 kbit/s) for Q59. The biggest surprise is the MFSL edition of "A Night in Tunisia" by Art Blakey and the Jazz Messengers from 1960, with an album average of 164 kbit/s. On the low end are Miles Davis recordings from the '50s with a ~72 kbit/s average.
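For reference, collection figures like these (duration-weighted average vs. per-album median) can be reproduced from per-album encoded sizes and playing times. A minimal sketch; the input format is my own invention:

```python
import statistics

def collection_bitrates(albums):
    """albums: list of (encoded_bytes, duration_seconds), one entry per album.
    Returns (collection average in kbit/s, median of the per-album averages).
    The collection average is duration-weighted (total bits over total time),
    which is why it can differ noticeably from the per-album median."""
    per_album_kbps = [size * 8 / secs / 1000.0 for size, secs in albums]
    total_bits = sum(size * 8 for size, _ in albums)
    total_secs = sum(secs for _, secs in albums)
    return total_bits / total_secs / 1000.0, statistics.median(per_album_kbps)
```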
Title: Public Listening Test [2010]
Post by: IgorC on 2010-02-04 14:55:28
Guruboolez (classical, 55 CDs) (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77932&view=findpost&p=682937) - 127 kbps
IgorC (various, 31 CDs) (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77932&view=findpost&p=683265) - 129 kbps
rpp3po (various, 87 CDs) (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77272&view=findpost&p=685443) - 129 kbps

The interval Q59-Q68 produces exactly the same bitrate (Q59, Q60... Q65...).
My suggestion is that --tvbr 60 --highest should be tested.

It's time to find the same bitrate for other encoders.
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-04 18:28:24
It's time to find the same bitrate for other encoders.


I'm testing different "target" bitrate settings for the CVBR mode right now on the same set. It's not exactly a target bitrate, because the constraint seems to be enforced asynchronously. It behaves more like "don't fall below this bitrate on average, and scale up moderately when required". So I expect the matching target for CVBR to be in the 117-120 kbit/s range. I'm verifying this right now, but it needs some time, since about 30 GB have to be processed in each run.

If someone else could experiment with Nero AAC q values, that would be great. I'm willing to test that result against the 87-CD set. But since Nero is rather slow under VMware, I can't do the experimenting myself.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-02-04 21:19:56
iTunes 128 CVBR produces 131 kbps on my collection of 31 CDs.  http://www.hydrogenaudio.org/forums/index....st&p=683265 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77932&view=findpost&p=683265)
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-04 22:26:44
OK, tests are finished. To get results equal to 'TVBR --highest Q(59-68)' = 129 kbit/s, I had to use a target bitrate of 121 kbit/s for 'CVBR --normal'.

This is the list of the 87 albums' genres:

Rhythm & Blues: 2
Electronic: 11
Folk: 1
Jazz: 39
Classical: 4
Metal: 2
Pop: 9
Rock: 16
Soul: 3

Jazz is a little overrepresented, but it spans music from six decades with great variety, so I think it's ok.

I found it very interesting that the bitrate distribution is quite different for CVBR; I did not expect it to this degree. Art Blakey no longer leads (now 133 kbit/s); instead it is Boulez: Répons at 136 kbit/s. The lowest is now Radiohead's "Motion Picture Soundtrack" from Kid A at 83 kbit/s (TVBR: 86 kbit/s), but there are only 4 tracks below 100 kbit/s at all! The median is at 130 kbit/s.

Summary:

TVBR Q(59-68) --highest: average 129 kbit/s, median 134 kbit/s, min 72 kbit/s, max 168 kbit/s
CVBR 121 --normal: average 129 kbit/s, median 130 kbit/s, min 83 kbit/s, max 136 kbit/s

If I plotted a graph of the results, the edges would be very steep for CVBR: only very few tracks sit at the extremes, and most form a flat top in the middle.

For TVBR, even though the bitrate goes as high as 168 kbit/s, there are many tracks at both the upper and lower ends, with a fairly even distribution over the whole (though much larger) bitrate range.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-02-05 02:55:28
rpp3po
Thank you for the results.
CVBR 121 produces the same bitrate as CVBR 128.

At least it's confirmation for my previous finding (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77932&view=findpost&p=683265) that CVBR 128 and TVBR 60 have comparable bitrate ~130 kbps.
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-05 03:16:24
CVBR 121 produces the same bitrate as CVBR 128.


That's not correct. Each CVBR target bitrate results in a different file size. 121 is different from 128, and also from 122, 123, etc. Only TVBR changes at every 8th increment.

We also have slightly different results.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-02-05 03:21:50
Impossible.
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-05 03:23:07
Impossible.


Just try it.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-02-05 03:27:11
Already did it.

From nao's site:
Code: [Select]
Cannot encode with odd bitrates, like 125kbps


It must be a bug or something wrong ... on your side.
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-05 03:36:01
It must be a bug or something wrong ... on your side.


QuickTime on the Mac doesn't have any problem with odd bitrates; it's rather a bug in the Windows port. Neither nao's XLD nor Apple's afconvert front-end has this problem.

If your system cannot encode 121 kbit/s, then try even numbers such as 120 and 122. You will see that they produce different file sizes.

PS: QuickTime's CVBR target bitrate is a minimum average, not a real target, as described above. If we compare TVBR 60 against CVBR 128, we give CVBR an unfair advantage of up to 5% without necessity. Much of TVBR's effort to save bitrate on non-complex passages would be superseded.

PPS: What do you think I have been doing all day? Encoding 30 GB several times at different CVBR settings, just to get the same result each time?
Title: Public Listening Test [2010]
Post by: Alex B on 2010-02-05 07:59:36
So what means are you guys using for measuring and calculating the average bitrates? Can the bitrates that programs report be blindly trusted?

I tried demuxing MP4/M4A to raw AAC with Yamb/MP4Box, but surprisingly that actually increased the file size a bit. Perhaps MP4 somehow packs the stream more effectively, and despite the additional MP4 container structure the overall file size can be smaller than the size of the raw stream.

Could the AAC developers explain how AAC is stored in the container and how the bitrate values are saved in the "atoms"? Does the MP4 container structure have a defined size overhead that can be detected or calculated?

EDIT

BTW, is the test now settled to be a ~128 kbps test?
Title: Public Listening Test [2010]
Post by: Sebastian Mares on 2010-02-05 08:06:16
Could someone summarize what has been decided so far? Is the discussion about codecs and their settings over? I am asking because there was already a samples thread somewhere, and usually the samples are chosen after the codecs (at least that was the case in the past).
Title: Public Listening Test [2010]
Post by: muaddib on 2010-02-05 09:32:05
I tried demuxing MP4/M4A to raw AAC with Yamb/MP4Box, but surprisingly that actually increased the file size a bit. Perhaps MP4 somehow packs the stream more effectively, and despite the additional MP4 container structure the overall file size can be smaller than the size of the raw stream.
Could the AAC developers explain how AAC is stored in the container and how the bitrate values are saved in the "atoms"? Does the MP4 container structure have a defined size overhead that can be detected or calculated?

Raw AAC streams have a frame structure just like MP3, so there is the overhead of a header for each frame.
Title: Public Listening Test [2010]
Post by: Alex B on 2010-02-05 09:47:03
Oh, I see. That explains the slightly bigger size of the demuxed AAC.

Do you have an opinion about the proper way of measuring MP4-AAC audio file bitrates? For instance, is it fine to use foobar?

I have also tried Mr QuestionMan (http://www.burrrn.net/?page_id=5) and MediaInfo (http://mediainfo.sourceforge.net/en), but the results are not always identical.


EDIT

In my opinion replies should be editable for longer - 24 h or so.
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-05 10:36:32
Regarding the recent discussion: determining bitrates is not an issue. IgorC just misinterpreted the line on nao's site. To end the speculation, I have attached a set of QuickTime CVBR files at 126, 127, and 128 kbit/s targets (effectively in the 141-143 kbit/s range), which should be "impossible" to produce according to Igor:

Update: I had accidentally uploaded CBR files. Now it is CVBR.
Title: Public Listening Test [2010]
Post by: Alex B on 2010-02-05 10:54:02
I am not speaking about QT CVBR. I am speaking about determining bitrates in general. Obviously those programs that I mentioned read info that is stored in the MP4 headers. I am not an MP4 expert, and I would like to know whether that info is always correct and whether I can trust some of the available tools.

You have not explained what software you use for displaying the bitrate values.


BTW, it seems you don't agree with me and C.R.Helmrich that iTunes should be used for encoding CVBR. Correct?
Title: Public Listening Test [2010]
Post by: .alexander. on 2010-02-05 10:54:34
Oh, I forgot: Who says the number of encoding parameters needs to equal the number of data elements? Even something such as the decision whether to use short blocks or a long block already requires a handful of parameters.


Not me. What I said is that the number of principal parameters of the stream (and even of data elements) is fewer than 100, and these are common to all competitors.

While the samples are short, it will be a test of rate-control volatility, and that isn't worth the effort. IgorC rejected rpp3po's idea since it doesn't look realistic. But under "real life conditions" the bit reservoir at the beginning of a fragment wouldn't be reset. And, what is more important, the bit reservoir state at the end of a fragment wouldn't be irrelevant.

For "real life conditions", use looped samples. Then pick the fragment with the lowest bit-length for each encoder. This way you could check whether CBR is really that bad.

After all, rate control is not the most intriguing part of AAC. The intra-frame machinery is of more interest.
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-05 11:05:39
BTW, it seems you don't agree with me and C.R.Helmrich that iTunes should be used for encoding CVBR. Correct?


Yes. The test's goal should be AAC at 128 kbit/s, or rather an encoder setting that produces 128 kbit/s on average. This is true for QuickTime TVBR at Q60 and CVBR 121. In contrast, iTunes' "128 kbit/s" CVBR preset produces considerably larger files on average. What is missing now is a matching Q value for Nero.

Edit: And don't forget, we can exactly clone iTunes' encoder by using QuickTime with CVBR --normal.

Alexander, at first I also supported the idea of using hand-picked settings to get as close to a 128 kbit/s average as possible, but I no longer do. Let's just use one setting per encoder that results in a collection average of ~128 kbit/s. Some contenders, like QuickTime TVBR, which is only adjustable in increments of 8, don't allow fine tuning, or even want to downsample at some settings. It would be a mess after all. Using one preset per encoder really seems the most practicable approach.
Title: Public Listening Test [2010]
Post by: muaddib on 2010-02-05 11:21:38
Do you have any opinion about the proper way of measuring the MP4-AAC audio file bitrates? For instance, is it fine to use foobar?
I have tried also Mr QuestionMan (http://www.burrrn.net/?page_id=5) and Mediainfo (http://mediainfo.sourceforge.net/en), but the results are not always identical.

In MediaInfo the bitrate of the Audio stream must be used, not the overall bitrate from General.
The bitrate in foobar is also correct.
Title: Public Listening Test [2010]
Post by: Alex B on 2010-02-05 11:33:28
rpp3po,

You are still not saying what you use for measuring the bitrates...


IMHO, in a VBR listening test the encoders that can't be adjusted precisely (i.e. the encoders that have only one suitable bitrate-related setting) should be measured, and the measured values should be averaged. After that, the encoders that can be freely adjusted should be set to produce that average value (naturally, always using the same test files).

For instance:

Encoders that cannot be freely adjusted:

iTunes CVBR => x kbps
QT TVBR => y kbps
CT CBR (does it have a VBR mode?) or DivX VBR (I downloaded the demo version, but I have not tried it yet and have no idea how it works) => z kbps

(x + y + z) / 3 = a kbps

Encoders that can be adjusted precisely:

Nero VBR  => adjusted as close to "a" kbps as possible
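The scheme is simple enough to state as code. The encoder names and rates below are placeholders for illustration, not actual measurements:

```python
def vbr_target_kbps(measured_fixed):
    """Alex B's proposal: average the measured bitrates of the encoders
    that cannot be tuned precisely (the x, y, z above) to get the target
    'a' that the freely adjustable encoders are then matched to."""
    return sum(measured_fixed.values()) / len(measured_fixed)

# Hypothetical measurements, for illustration only:
a = vbr_target_kbps({'iTunes CVBR': 131, 'QT TVBR': 129, 'CT CBR': 130})
print(round(a, 1))   # 130.0
```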
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-05 11:37:44
rpp3po,

You are still not saying what you use for measuring the bitrates...


Normally just the OS X file info. The OS supports this out of the box. For the 87-album test I loaded the set into foobar within VMware. For the three CVBR 126, 127, and 128 files above, OS X reports 140, 141, and 143 kbit/s. Foobar: 141, 141, 143. Extracted to raw AAC and calculated by hand: 144.6, 145.1, and 147.0.

So I think CVBR can be counted as freely adjustable.

As long as the same tool is always used to determine the bitrates in this test, everything should be OK. I would be fine with agreeing on foobar.
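"Calculated by hand" presumably means total bits over playing time; the small disagreements between tools then come down to what overhead they count and how they round. A sketch (the example size and duration are hypothetical):

```python
def bitrate_kbps(file_size_bytes, duration_seconds):
    """Average bitrate of a stream: total bits divided by playing time.
    Run on a raw AAC stream this includes the per-frame header overhead,
    which is why hand-calculated figures come out slightly above the
    values foobar or OS X report for the MP4 files."""
    return file_size_bytes * 8 / duration_seconds / 1000.0

# e.g. a hypothetical 30.255 s raw stream of 547,000 bytes:
print(round(bitrate_kbps(547_000, 30.255), 1))   # ~144.6 kbit/s
```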
Title: Public Listening Test [2010]
Post by: Alex B on 2010-02-05 11:55:51
I measured the CVBR samples you provided:

Foobar:

141, 141, and 143 kbps

(http://i224.photobucket.com/albums/dd212/AB2K/ha/foobar.png)

Mr QuestionMan:

142, 143, and 145 kbps

(http://i224.photobucket.com/albums/dd212/AB2K/ha/mr_qm.png)

MediaInfo:

The AAC bitrates: 141, 144, and 144 kbps

(Oddly only the "126" file shows a nominal bitrate. Something is not very reliable.)

Code: [Select]
General
Complete name                    : F:\Test\cvbr\cvbr126.mp4
Format                          : MPEG-4
Format profile                  : Apple AAC audio with iTunes info
Codec ID                        : M4A
File size                        : 579 KiB
Duration                        : 30s 255ms
Overall bit rate                : 157 Kbps
Writing application              : X Lossless Decoder 20100123, QuickTime 7.6.3, Constrained VBR 126 kbps
Encoding Params                  : (Binary)

Audio
ID                              : 1
Format                          : AAC
Format/Info                      : Advanced Audio Codec
Format version                  : Version 4
Format profile                  : LC
Format settings, SBR            : No
Codec ID                        : 40
Duration                        : 30s 255ms
Bit rate mode                    : Variable
Bit rate                        : 141 Kbps
Nominal bit rate                : 144 Kbps
Maximum bit rate                : 167 Kbps
Channel(s)                      : 2 channels
Channel positions                : L R
Sampling rate                    : 44.1 KHz
Stream size                      : 521 KiB (90%)
Language                        : English

General
Complete name                    : F:\Test\cvbr\cvbr127.mp4
Format                          : MPEG-4
Format profile                  : Apple AAC audio with iTunes info
Codec ID                        : M4A
File size                        : 580 KiB
Duration                        : 30s 255ms
Overall bit rate                : 157 Kbps
Writing application              : X Lossless Decoder 20100123, QuickTime 7.6.3, Constrained VBR 127 kbps
Encoding Params                  : (Binary)

Audio
ID                              : 1
Format                          : AAC
Format/Info                      : Advanced Audio Codec
Format version                  : Version 4
Format profile                  : LC
Format settings, SBR            : No
Codec ID                        : 40
Duration                        : 30s 255ms
Bit rate mode                    : Variable
Bit rate                        : 144 Kbps
Maximum bit rate                : 167 Kbps
Channel(s)                      : 2 channels
Channel positions                : L R
Sampling rate                    : 44.1 KHz
Stream size                      : 522 KiB (90%)
Language                        : English

General
Complete name                    : F:\Test\cvbr\cvbr128.mp4
Format                          : MPEG-4
Format profile                  : Apple AAC audio with iTunes info
Codec ID                        : M4A
File size                        : 588 KiB
Duration                        : 30s 255ms
Overall bit rate                : 159 Kbps
Encoded date                    : UTC 1975-01-22 15:50:31
Tagged date                      : UTC 1975-01-22 15:50:31
Writing application              : X Lossless Decoder 20100123, QuickTime 7.6.3, Constrained VBR 128 kbps
Encoding Params                  : (Binary)

Audio
ID                              : 1
Format                          : AAC
Format/Info                      : Advanced Audio Codec
Format version                  : Version 4
Format profile                  : LC
Format settings, SBR            : No
Codec ID                        : 40
Duration                        : 30s 255ms
Bit rate mode                    : Variable
Bit rate                        : 144 Kbps
Maximum bit rate                : 171 Kbps
Channel(s)                      : 2 channels
Channel positions                : L R
Sampling rate                    : 44.1 KHz
Stream size                      : 529 KiB (90%)
Language                        : English
Encoded date                    : UTC 1975-01-22 15:50:31
Tagged date                      : UTC 1975-01-22 15:50:31



Which one would you pick?

Does the OS X file info box agree with one of these?

EDIT

I saw your edit. Apparently OS X does not exactly agree with foobar, but it is close.
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-05 12:04:17
While you were writing this, I updated my last post. I think all tools except MediaInfo show consistent relative rates compared to the undiluted raw AAC (including its constant frame overhead).
Title: Public Listening Test [2010]
Post by: .alexander. on 2010-02-05 12:09:53
Alexander, at first I also supported the idea of using hand-picked files to get as close as possible to a 128 kbit/s average. But I don't anymore. Let's just use a setting for each encoder that will result in a collection average of ~128 kbit/s. Some contenders, like QuickTime TVBR, which is only adjustable in increments of 8, don't allow fine tuning or even want to downsample at some settings. This would be a mess after all. Using one preset for each encoder really seems the most practicable approach.


I don't usually quote myself, but why not:

Use looped samples, then pick the fragment with the lowest bit count for each encoder. This way you could check whether CBR is that bad.


This can be useless in the case of unconstrained VBR, but it is very easy to implement. The CT encoder is also restrictive in some way; why not adapt to its traits, e.g. adjust its bitrate to match the average bitrate of the others?
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-05 12:14:46
Blind testing is impossible if each non-adjustable encoder gets different fragments. It's also only a cosmetic change.
Title: Public Listening Test [2010]
Post by: .alexander. on 2010-02-05 12:34:43
The input would be the same for each encoder (a 30-second sample looped 1000 times). The output stream is then split into 1000 fragments of the same duration (30 sec), and the fragment with the lowest amount of bits is chosen for listening. The only difference is in the bit reservoir state at the beginning of each fragment.
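The selection scheme above can be sketched in a few lines. This is a hypothetical illustration (the function name and the bit counts are made up), assuming the per-frame bit counts have already been extracted from the encoded stream:

```python
# Sketch of the lowest-bit-fragment selection: split per-frame bit counts
# into equal-length fragments and find the cheapest one.

def pick_lowest_bit_fragment(frame_bits, frames_per_fragment):
    """Return (index, total_bits) of the fragment with the fewest bits."""
    fragments = [
        frame_bits[i:i + frames_per_fragment]
        for i in range(0, len(frame_bits), frames_per_fragment)
    ]
    totals = [sum(f) for f in fragments]
    best = min(range(len(totals)), key=totals.__getitem__)
    return best, totals[best]

# Example: 4 repetitions of a 3-frame sample; the totals differ only
# because of the bit reservoir state at each repetition's start.
bits = [900, 850, 870, 800, 820, 810, 950, 940, 930, 780, 790, 800]
idx, total = pick_lowest_bit_fragment(bits, 3)  # → fragment 3, 2370 bits
```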
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-05 12:45:01
Ok, I understand. But IgorC has already decided that this won't happen. We should tolerate that; he's doing most of the work. Without that, discussions could go on well into 2011...
Title: Public Listening Test [2010]
Post by: nao on 2010-02-05 14:07:03
Quicktime on the Mac doesn't have any problem with odd bitrates, it's rather a bug of the Windows port. Both nao's XLD and Apple's afconvert front-end don't have this problem.

QuickTime on OSX has the same limitation. Afconvert and XLD are not affected by the limitation because they use CoreAudio (AudioCodec) API instead of QuickTime to access the AAC codec.
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-05 14:26:54
Thanks for the info, nao!

It is the same codec, nevertheless, only different APIs. You basically said that, just wanted to clarify.

CVBR 120 results in 128 kbit/s (instead of 129 as Q60) with the above collection. A 1 kbit/s difference shouldn't be a problem.
Title: Public Listening Test [2010]
Post by: frozenspeed on 2010-02-05 15:35:03
So when does this listening test start? How can I participate?
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-02-06 11:11:22
rpp3po,

I have 87 lossless cross-genre albums on my notebook right now. I'll check in a couple of minutes how close its average is going to get to 128kbit/s at Q 59.

PS I have finished the 87 albums. I get a collection average of 129 kbit/s (median at 134 kbit/s) for Q 59.

For completeness sake, would you mind listing the individual albums? And when you encoded them, was each album one single FLAC file or did you encode track by track?

Thanks,

Chris
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-02-06 11:37:32
Could someone give a summary to what has been decided so far? Is the discussion about codecs and their settings over? I am asking because there was already a samples thread somewhere and usually you choose samples after choosing codecs (at least it was so in the past).

Let's see.

http://www.hydrogenaudio.org/forums/index....showtopic=77932 (http://www.hydrogenaudio.org/forums/index.php?showtopic=77932) We will test Apple true VBR vs. Apple constrained VBR vs. Nero 1.5.3 vs. winner of internal pre-test (DivX vs. Coding Technologies). The poll is also in-line with the newest discussions.

http://www.hydrogenaudio.org/forums/index....showtopic=77809 (http://www.hydrogenaudio.org/forums/index.php?showtopic=77809) 128 kbps. Also supported by the discussion here which has been exclusively about 128-ish VBR bitrates. Note that the CT encoder (which is CBR only) will be tested at 130 kbps since it supposedly uses ADTS.

Important short-term decisions missing:

Nero quality value: rpp3po, could you please run the nero encoder over your 87-album test set with -q 0.41 and report the bitrate here? Thanks!

Now, help me out. What settings are we using for iTunes CVBR and DivX?

Once that's fixed, I think we can finish the item selection (http://www.hydrogenaudio.org/forums/index.php?showtopic=77584).

Chris

(Edit: changed the quality value slightly)
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-06 14:59:47
For completeness sake, would you mind listing the individual albums? And when you encoded them, was each album one single FLAC file or did you encode track by track?


I don't feel comfortable doing that. A set of 87 individual albums (including several rarities) is sufficient for a one-in-billions fingerprint, and I don't want that linked to my internet alias. I know millions share info like that on last.fm, Spotify, and whatnot, let alone Facebook & Co., but I do not. 87 albums might not be a problem (only a possibility) today, but what about in five or ten years? The internet doesn't forget very fast. If anyone can't live with these terms, feel free to upload your own results.

The encoding was done track by track. The reported average is the one shown by foobar's properties dialog for all selected tracks.

Nero quality value: rpp3po, could you please run the nero encoder over your 87-album test set with -q 0.41 and report the bitrate here? Thanks!


I can do that later today.

PS It is running now. It is going to take 6:30 hours in a virtual machine. I have sorted the queue by track numbers so that I can do a half-way check in 3 hours and evaluate whether going for 0.4 would be better.

PPS Watching Nero groan in VMware was a pain. So I gave Dibrom's new Foobar package for OS X (http://www.hydrogenaudio.org/forums/index.php?showtopic=77261) a try and integrated NeroAACEnc and the ALAC component. It rocks! Restarted the whole thing; new estimated time 2:50h, speed 32x. It looks like 0.41 is correct, btw.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-02-06 16:45:18
rpp3po and Chris,

After nao's clarification I think --CVBR 128 --normal is fine, as it produces the same output as iTunes on Windows. --CVBR 128 --normal has a very close bitrate to --TVBR 60 --highest. (See previous posts.)

Divx -v 4

Alex B's method for calculation bitrate should be applied. http://www.hydrogenaudio.org/forums/index....st&p=685755 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77272&view=findpost&p=685755)
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-06 16:51:21
What's the benefit of attributing up to 5% more bitrate to CVBR over TVBR than needed, if we don't have to? qtaacenc is perfectly able to accept the settings 120, 122, 124, 126, and 128 on the Windows platform and uses the same encoder as iTunes. The contestant could still be truthfully called "iTunes VBR encoder with 128 kbit/s average target setting".
Title: Public Listening Test [2010]
Post by: IgorC on 2010-02-06 17:04:58
What's the benefit of attributing up to 5% more bitrate to CVBR over TVBR than needed, if we don't have to?


The difference is much smaller according to my previous findings.
http://www.hydrogenaudio.org/forums/index....st&p=683265 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77932&view=findpost&p=683265)

Quote
qtaacenc is perfectly able to accept the settings 120, 122, 124, 126, and 128 ...

Actually it doesn't.
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-06 17:26:05
Actually it doesn't.


Ok, we didn't have a true clarification of the meaning of "odd" bitrates yet, whether it means odd numbers or odd in the sense of non-standard values (96, 122, 128). There was just a clarification of why it worked on the Mac but didn't on Windows. But since you insist, I guess you double-checked, so I believe it.

If there really is no way on Windows to reproduce the files in the test, that's a real drawback and I'll accept it. There's still the possibility to ABX my own set of files if CVBR and TVBR come out too close in the test.
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-06 21:37:55
Nero has finished. 0.41 was a good call by C. R. Helmrich!

QT CVBR 121 --normal: average 129 kbit/s, median 130 kbit/s, min 83 kbit/s, max 136 kbit/s
Nero AAC 1.5.1.0 q 0.41: average 129 kbit/s, median 133 kbit/s, min 81 kbit/s, max 143 kbit/s
QT TVBR Q(59-68) --highest: average 129 kbit/s, median 134 kbit/s, min 72 kbit/s, max 168 kbit/s

The lowest bitrate is for the same track as QT CVBR: Radiohead, "Motion Picture Soundtrack". The highest is Björk, "Cocoon" from "Vespertine Live". The top 30 bitrates are quite different from QT; I'm looking forward to the comparison.

I had only done a ~35% run of QT CVBR 128 and then switched to lower rates, because the result was too high. Tonight I'll let it run to 100% for completeness.
Title: Public Listening Test [2010]
Post by: Alex B on 2010-02-06 21:55:57
Divx -v 4
Before anyone encodes a huge amount of files and goes through the needed muxing step here are my first results of DivX VBR AAC encoding.

I encoded my usual reference sets (25 various and 25 classical tracks) @ -v 4 and -v 5.

I used Speek's Batchenc program as a GUI for DivXAACEnc.exe. http://members.home.nl/w.speek/batchenc.htm (http://members.home.nl/w.speek/batchenc.htm)
For -v 4 a command line that works is:
[path to the encoder]\DivXAACEnc.exe -i <infile> -o <outfile.aac> -v 4

After encoding I muxed the raw AAC tracks with Mp4Box v.0.4.6 (dev.20091013). http://kurtnoise.free.fr/mp4tools/ (http://kurtnoise.free.fr/mp4tools/)
Once again I used Batchenc as a front end. A command line that works is:
[path to the muxer]\MP4Box.exe -add <infile> <outfile.m4a>
According to the documentation, the m4a extension makes MP4Box automatically create an "iTunes compatible" file (whatever that is supposed to mean).
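For batching many tracks through these two steps, something like the following sketch could be used. It only builds the command lines shown above and doesn't invoke the tools; the file names are hypothetical:

```python
# Build (encode, mux) command-line pairs mirroring the DivXAACEnc and
# MP4Box invocations described in the post. Running them (e.g. via
# subprocess) is left out; this only assembles the argument lists.
import os

def build_commands(wav_files, quality=4):
    """Return a list of (encode_cmd, mux_cmd) pairs, one per input file."""
    pairs = []
    for wav in wav_files:
        base, _ = os.path.splitext(wav)
        aac = base + ".aac"
        m4a = base + ".m4a"  # .m4a extension triggers MP4Box's iTunes mode
        encode = ["DivXAACEnc.exe", "-i", wav, "-o", aac, "-v", str(quality)]
        mux = ["MP4Box.exe", "-add", aac, m4a]
        pairs.append((encode, mux))
    return pairs

cmds = build_commands(["track01.wav"], quality=4)
```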

Then I measured the individual track bitrates with foobar and calculated the results with a spreadsheet. (I need to do that because my reference sets have tracks that were carefully selected only because of their individual audio qualities; the varied track durations would add incorrect weighting in this case.)
EDIT: For instance, foobar calculates the overall bitrate of the complete selection, and thus longer tracks have more weight than shorter tracks in the displayed average value.
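As an illustration of that weighting difference, here is a small sketch (with made-up numbers) contrasting a duration-weighted overall bitrate, as foobar shows it for a selection, with the unweighted per-track mean:

```python
# Duration-weighted vs. unweighted average bitrate. Tracks are given as
# (kbps, duration_in_seconds) pairs; the numbers are illustrative only.

def overall_bitrate(tracks):
    """Duration-weighted average: total bits divided by total duration."""
    total_bits = sum(kbps * secs for kbps, secs in tracks)
    total_secs = sum(secs for _, secs in tracks)
    return total_bits / total_secs

def per_track_average(tracks):
    """Unweighted mean of the individual track bitrates."""
    return sum(kbps for kbps, _ in tracks) / len(tracks)

tracks = [(100, 600), (140, 120)]       # one long track, one short track
weighted = overall_bitrate(tracks)      # the long track dominates
unweighted = per_track_average(tracks)  # both tracks count equally
```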

The results:

-v 4
Code: [Select]
 DivX -v 4	Various	

AC_DC - Highway to Hell 127
Adiemus - Boogie Woogie Llanoogie 122
Barry White - Sho' You Right 128
Björk - Possibly Maybe - LFO (Lucy Mix) 120
Davis Bowie - Starman 127
Dido - Life For Rent 123
Duran Duran - Astronaut 129
ELO - Livin' Thing 123
Erich Kunzel & The Cincinnati Pops - Theme from The Pink Panther 109
Evanescence - Going Under 126
Faithless - Mass Destruction 121
Garbage - Bleed Like Me 124
Jamiroquai - World That He Wants 114
Kraftwerk - Tour de France Etape 1 131
Morrissey - Irish Blood, English Heart 129
Paco De Lucia - Rumba Improvisada 118
Santana - Oye Como Va 131
Simply Red - You Make Me Believe 125
Sting - Desert Rose 127
The Beatles - Let It Be 119
Tina Arena - Symphony of Life 118
U2 - Vertigo 129
Whitney Houston - Queen Of The Night 124
Yello - Planet Dada 135
Yo-Yo Ma - Libertango 114

Average 123.72


 DivX -v 4 Classical

Aldo Ciccolini - Satie, Sports Et Divertissements, Le Flirt 95
Alfred Brendel - Beethoven, Piano Sonata No 15, Op 28 Scherzo 102
Baroque Festival Orchestra - Vivaldi, Sinfonia in C major, 3. Presto 114
Berlin - Mahler, Symphony No  8, 2 Chailly, Ewiger Wonnebrand 110
Berlin Philharmonic Orchestra - Mozart, Requem in D moll K 626, Sanctus 119
Berlin Philharmonic Orchestra - Strauss, Also Sprach Zarathustra (Thus Spoke Zarathustra) 108
Christophe Rousset - Farinelli, Il Castrato (OST), J.A. Hasse, Generoso risuegliati 108
Concentus Musicus Wien - Bach, Matthäus Passion BWV 244, Da ging hin der Zwölfen Einer 99
Daniel Barenboim - Mozart, Piano Concerto No 3 in D major, K40-3 103
Gérard Lesne - Vivaldi, Sonate Op 2 No 3 pour violon & bc, III. Adagio 107
Giuseppe Sinopoli - Elgar, Cello Concerto, Serenade for Strings, Enigma-Andante 98
Itzhak Perlman - Paganini, 24 Caprises, No 1 In E 112
Jascha Heifetz - Sarasate, Zigeunerweisen, Op 20 No 1 116
Jessye Norman - Angels we have heard on high (Trad.) 113
Kirov Orchestra & Chorus - Khatchaturian, Gayaneh, Säbeltanz 117
Leslie Howard - Liszt, Douze Grandes Études, S 137 No 1 in C 91
London Sinfonietta - Saint-Saëns, Le carnaval des animaux, Hémiones 95
London Symphony Orchestra - Ravel, Daphnis et Chloé, 10. Tres modere 95
Marie-Claire Alain - Bach, Wo Soll Ich Fliehen Hin BWV 646 107
Michael Nyman - The Piano OST, A Bed of Ferns 100
Orchestre Symphonique De Montreal - Elgar, Enigma Variations No 1 104
Philharmonia Slavonica - Bach, BWV 1067 Rondeau 113
The Cleveland Orcestra - Ravel Valses, nobles et sentimentales, 1. Modere 115
The Philadelphia Orchestra - Tchaikovsky, The Nutcracker, Op 71a (Ballet Suite) No 2 109
Zbigniew Preisner - Trois Couleurs Bleu (OST), First flute 88

Average 105.52


 DivX -v 4 Overall average 114.62
-v 5
Code: [Select]
 DivX -v 5	Various	

AC_DC - Highway to Hell 152
Adiemus - Boogie Woogie Llanoogie 145
Barry White - Sho' You Right 154
Björk - Possibly Maybe - LFO (Lucy Mix) 142
Davis Bowie - Starman 152
Dido - Life For Rent 148
Duran Duran - Astronaut 154
ELO - Livin' Thing 147
Erich Kunzel & The Cincinnati Pops - Theme from The Pink Panther 130
Evanescence - Going Under 151
Faithless - Mass Destruction 143
Garbage - Bleed Like Me 148
Jamiroquai - World That He Wants 135
Kraftwerk - Tour de France Etape 1 156
Morrissey - Irish Blood, English Heart 155
Paco De Lucia - Rumba Improvisada 140
Santana - Oye Como Va 156
Simply Red - You Make Me Believe 148
Sting - Desert Rose 153
The Beatles - Let It Be 141
Tina Arena - Symphony of Life 140
U2 - Vertigo 155
Whitney Houston - Queen Of The Night 148
Yello - Planet Dada 161
Yo-Yo Ma - Libertango 135

Average 147.56


 DivX -v 5 Classical

Aldo Ciccolini - Satie, Sports Et Divertissements, Le Flirt 113
Alfred Brendel - Beethoven, Piano Sonata No 15, Op 28 Scherzo 120
Baroque Festival Orchestra - Vivaldi, Sinfonia in C major, 3. Presto 136
Berlin - Mahler, Symphony No  8, 2 Chailly, Ewiger Wonnebrand 129
Berlin Philharmonic Orchestra - Mozart, Requem in D moll K 626, Sanctus 143
Berlin Philharmonic Orchestra - Strauss, Also Sprach Zarathustra (Thus Spoke Zarathustra) 129
Christophe Rousset - Farinelli, Il Castrato (OST), J.A. Hasse, Generoso risuegliati 129
Concentus Musicus Wien - Bach, Matthäus Passion BWV 244, Da ging hin der Zwölfen Einer 116
Daniel Barenboim - Mozart, Piano Concerto No 3 in D major, K40-3 121
Gérard Lesne - Vivaldi, Sonate Op 2 No 3 pour violon & bc, III. Adagio 128
Giuseppe Sinopoli - Elgar, Cello Concerto, Serenade for Strings, Enigma-Andante 116
Itzhak Perlman - Paganini, 24 Caprises, No 1 In E 132
Jascha Heifetz - Sarasate, Zigeunerweisen, Op 20 No 1 137
Jessye Norman - Angels we have heard on high (Trad.) 134
Kirov Orchestra & Chorus - Khatchaturian, Gayaneh, Säbeltanz 140
Leslie Howard - Liszt, Douze Grandes Études, S 137 No 1 in C 107
London Sinfonietta - Saint-Saëns, Le carnaval des animaux, Hémiones 112
London Symphony Orchestra - Ravel, Daphnis et Chloé, 10. Tres modere 113
Marie-Claire Alain - Bach, Wo Soll Ich Fliehen Hin BWV 646 126
Michael Nyman - The Piano OST, A Bed of Ferns 119
Orchestre Symphonique De Montreal - Elgar, Enigma Variations No 1 123
Philharmonia Slavonica - Bach, BWV 1067 Rondeau 134
The Cleveland Orcestra - Ravel Valses, nobles et sentimentales, 1. Modere 136
The Philadelphia Orchestra - Tchaikovsky, The Nutcracker, Op 71a (Ballet Suite) No 2 129
Zbigniew Preisner - Trois Couleurs Bleu (OST), First flute 103

Average 125


DivX -v 5 Overall average 136.28

Summary:

DivX -v 4
Various   123.72
Classical   105.52
Average   114.62

DivX -v 5
Various   147.56
Classical   125.00
Average   136.28


Apparently -v 4 produces quite low bitrates. -v 5 is closer to the test target, though a bit on the high side.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-02-06 23:28:18
Thanks a lot for the results, rpp3po! I estimated the -q value from the discussion here (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77932&view=findpost&p=682937).

Thanks also to Alex! Sigh, now how on earth are we going to integrate the DivX encoder into the test in a fair way? Increase the nero and QT TVBR -q values so that they match the iTunes 128 CVBR and DivX -v 5 average bitrates? Or just kick out DivX?

Chris
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-07 00:20:52
CVBR 128 --normal is through.

All results so far:

QT CVBR 128 --normal: average 134 kbit/s, median 134 kbit/s, min 88 kbit/s, max 144 kbit/s
Nero AAC 1.5.1.0 q 0.41: average 129 kbit/s, median 133 kbit/s, min 81 kbit/s, max 143 kbit/s
QT TVBR Q(59-68) --highest: average 129 kbit/s, median 134 kbit/s, min 72 kbit/s, max 168 kbit/s

Sigh, now how on earth are we going to integrate the DivX encoder into the test in a fair way? Increase the nero and QT TVBR -q values so that they match the iTunes 128 CVBR and DivX -v 5 average bitrates? Or just kick out DivX?


The next higher Q value for QT TVBR would land us in the 140-150 kbit/s range. Initially it was said that we'd take CT or DivX. Maybe we can get closer results for the CT encoder?

PS I thought a little about why my relative average for QT CVBR is much higher than Igor's. I think it is because my collection includes more old recordings. The two less constrained encoders don't allocate more than about 80-90 kbit/s to those, while the CVBR mode barely produces anything below 100 kbit/s. Without those old jazz albums, Nero's and QT's TVBR averages would be higher, closer to CVBR.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-02-07 00:40:50
rpp3po

Thank you.

Ok, I see. CVBR 121 is closer to TVBR 60. As Windows users don't have access to odd CVBR bitrates, I will ask you to encode the files for the test.


Alex B,
I sent an email to the DivX developer to see if it's possible to get ~128-130 kbps.
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-07 00:47:34
No problem, Igor. I can do that.

While you were writing this, I added some thoughts to my last post about why our averages were different. I think without the old part of my jazz collection (and jazz was quite overrepresented), the three averages could have been closer together (>130 kbit/s). So I could live with both choices: providing custom, Apple-encoded CVBR samples, or accepting iTunes as the encoder for CVBR. Whatever finds the most consensus...
Title: Public Listening Test [2010]
Post by: IgorC on 2010-02-07 00:55:55
Your old jazz music is part of a real-world scenario. 
With my collection, CVBR 128 also has a slightly higher bitrate, 131 kbps, while TVBR 60 is at 129 kbps for both my and your collections.

CVBR 128 is a little bit higher (2-4%) than TVBR 60.
CVBR 121 is on par, but only with some old recorded music included.

So CVBR 124-125 would be the best balance.
Title: Public Listening Test [2010]
Post by: .alexander. on 2010-02-11 09:45:28
(http://i49.tinypic.com/2n8mzkg.jpg)

Above is a graph of the average bitrate for the looped emese sample, encoded with iTunes at 128 kbps.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-02-11 10:39:13
Thanks, Alexander. Is that CBR? If so, could you do the same for iTunes CVBR 128?

Thanks,

Chris
Title: Public Listening Test [2010]
Post by: .alexander. on 2010-02-11 13:06:26
Thank you for the interest Chris,
The sample was encoded with iTunes VBR so probably it is CVBR.

Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-11 14:59:08
Very interesting! Have you made this by hand or is there a tool that could do that?
Title: Public Listening Test [2010]
Post by: .alexander. on 2010-02-11 15:58:47
Very interesting! Have you made this by hand or is there a tool that could do that?


mp4box can split streams, but I just extracted each frame into a separate file:
"mp4box -raws 1 emese_looped.m4a"

Also, I made a simple m-script to:
* remove silence
* trim emese to contain a multiple of 1024 samples
* compensate for encoder delay (add silence and skip the first 2 AAC frames later)
* write loop using writewav (it can append)
* collect bitrate data
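Since the m-script itself isn't posted, here is a rough Python sketch of what the last step (collecting bitrate data per repetition) might look like, assuming the per-frame byte sizes have already been gathered from the extracted raw frames:

```python
# Per-repetition bitrate from raw AAC frame sizes. Each AAC-LC frame
# carries 1024 samples, so a repetition of N frames spans
# N * 1024 / sample_rate seconds.

def per_repetition_bitrates(frame_sizes, frames_per_rep,
                            sample_rate=44100, frame_len=1024):
    """frame_sizes: byte size of each raw AAC frame, in stream order.
    Returns one kbit/s value per complete repetition of the loop."""
    rep_seconds = frames_per_rep * frame_len / sample_rate
    rates = []
    for i in range(0, len(frame_sizes) - frames_per_rep + 1, frames_per_rep):
        rep_bytes = sum(frame_sizes[i:i + frames_per_rep])
        rates.append(rep_bytes * 8 / rep_seconds / 1000)  # bits -> kbit/s
    return rates

# Toy example: two repetitions of a 2-frame "sample".
rates = per_repetition_bitrates([400, 420, 380, 390], frames_per_rep=2)
```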

Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-02-11 19:04:20
I see. Then I wonder what the graph for QuickTime TVBR would look like...

Chris
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-12 02:28:03
I am also very interested to see a comparison to the TVBR version, if you find the time. Sadly I'm not practiced in M (MATLAB's scripting language) myself.

[attachment=5721:emese_qt..._highest.m4a]
Title: Public Listening Test [2010]
Post by: guruboolez on 2010-02-12 10:50:19
Using a modified version of faad released by Ivan Dimkovic some years ago, this is the bitrate distribution I get with your sample, rpp3po:

(http://img168.imageshack.us/img168/7537/clipboard01cr.png)
Title: Public Listening Test [2010]
Post by: .alexander. on 2010-02-12 11:30:21
The CVBR looks different since I had to update QuickTime to use qtaacenc.

(http://i49.tinypic.com/2rftvrp.jpg)

Note this is bitrate distribution for looped emese sample. And each point corresponds to bitrate of one repetition of emese sample.

EDIT: legend
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-13 15:38:00
I would never have thought that the average bitrate variation for CVBR would be that large depending on the iterations. Could that effect be mitigated by adding pre-silence to the test samples?
Title: Public Listening Test [2010]
Post by: IgorC on 2010-02-13 16:11:53
As Alex B has found, DivX has no suitable VBR option for 128 kbps: http://www.hydrogenaudio.org/forums/index....st&p=686087 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77272&view=findpost&p=686087)

I still haven't received an answer from the DivX developer. One week has passed. He answered very late before, too.

Let's make the decision ourselves.

Possible DivX settings:
a) an alternate v4/v5 setting which produces a bitrate closer to the other competitors on a specific sample. I know it's not the best idea, but it's still an option.
b) CBR 128.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-02-13 16:19:10
I would never have thought, that the average bitrate variation for CVBR would be that large depending on iterations. Could that effect be mitigated by adding pre-silence to the test samples?

Probably not; pre-silence doesn't empty the bit reservoir. I think the best thing would be to concatenate all test samples into a single file (it seems this is commonly done while standardizing MPEG coders), and then prepend a few seconds of noise to the beginning of that file so that the first sample is encoded with a half-empty bit reservoir.

Chris
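A minimal sketch of that proposal (the helper name and the noise generation are illustrative only): concatenate the samples behind a noise lead-in and record offsets, so the samples can be cut back out of the decoded stream afterwards:

```python
# Build one long test stream: noise lead-in followed by all test samples,
# so every sample is encoded with a comparable bit reservoir state.
# PCM data is represented as plain lists of integers for simplicity.
import random

def build_test_stream(samples, noise_seconds=3, sample_rate=44100):
    """samples: list of PCM sample lists. Returns (stream, offsets),
    where offsets holds (start, length) per sample for later cutting."""
    random.seed(0)  # reproducible "noise" for illustration
    noise = [random.randint(-20000, 20000)
             for _ in range(noise_seconds * sample_rate)]
    stream = list(noise)
    offsets = []
    for s in samples:
        offsets.append((len(stream), len(s)))
        stream.extend(s)
    return stream, offsets

# Tiny example: 1 "second" of noise at a 10 Hz toy sample rate.
stream, offsets = build_test_stream([[1, 2, 3], [4, 5]],
                                    noise_seconds=1, sample_rate=10)
```

After encoding and decoding the concatenated stream, each (start, length) pair marks where to cut a sample back out.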
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-13 16:46:53
That sounds like a good approach and not too much overhead. A little script employing mp4box could be used to produce readily cut file sets after encoding.

I think Divx should be CBR 128, with the developers' option to provide a better matching preset if they want to improve their chances in the competition.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-02-13 17:19:08
Agreed. Agreed.
Title: Public Listening Test [2010]
Post by: guruboolez on 2010-02-13 20:36:47
During the last listening tests, the first two seconds of each encoding were discarded (by ABC/HR). Isn't that enough to avoid some technical issues?
Title: Public Listening Test [2010]
Post by: hellokeith on 2010-02-14 07:29:39
During the last listening tests the two first seconds of each encoding were discarded (by ABC/HR). Isn't it enough to avoid some technical issues?


Why on earth would you throw out the first two seconds???    There should be one and only one lossless source file.  If trimming needs to be done to the beginning or ending, it should be done on the lossless source file, which will propagate to all the lossy files.  The ABC/HR application should never be altering the audio data.
Title: Public Listening Test [2010]
Post by: guruboolez on 2010-02-14 09:36:10
It was explained some years ago. Some elements of answers:
http://www.hydrogenaudio.org/forums/index....c=29555&hl= (http://www.hydrogenaudio.org/forums/index.php?showtopic=29555&hl=)
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-02-14 12:04:31
Yes, but some test samples are only 5-6 seconds long. I wouldn't want to throw away a precious 25 or more percent of that.

Chris
Title: Public Listening Test [2010]
Post by: Alex B on 2010-02-14 13:10:35
In addition to the link Guruboolez provided, here are some other relevant links:

http://www.hydrogenaudio.org/forums/index....mp;#entry343318 (http://www.hydrogenaudio.org/forums/index.php?showtopic=38936&st=0&p=343318&#entry343318)
http://www.hydrogenaudio.org/forums/index....mp;#entry447231 (http://www.hydrogenaudio.org/forums/index.php?showtopic=49803&st=125&p=447231&#entry447231)
http://www.hydrogenaudio.org/forums/index....mp;#entry382267 (http://www.hydrogenaudio.org/forums/index.php?showtopic=42696&st=125&p=382267&#entry382267)

Edit: these are the initial related posts in the linked threads. Related replies may follow after a few unrelated other replies.

Yes, but some test samples are only 5-6 seconds long. I wouldn't want to throw away precious 25 or more percent of that.

So, as an AAC developer, can you tell us if your AAC encoder needs some time to adapt to the content before it provides the best possible quality?

Regardless of the encoder specific behavior, I think that if you want to test a certain very short passage you should simply encode a sample that starts 2 seconds earlier. If the original source track actually starts with the critical passage that is intended to be tested you can always configure that sample to start from the beginning. It would be fair for all encoders. You would then be testing how the encoders can handle a track that starts with such content.


iTunes CVBR may be a real problem if its behavior is inconsistent. We discussed a somewhat similar problem when WMA 2-pass VBR was one of the possible test contenders.

In general, the only really correct and fair way to simulate a real life usage situation would be to encode the complete original source tracks, decode the encoded tracks, cut the test samples from the decoded tracks, and store the samples in a lossless format. This has been discussed in the past, but it has never been a viable option for various practical reasons.
Title: Public Listening Test [2010]
Post by: guruboolez on 2010-02-14 13:43:28
Yes, but some test samples are only 5-6 seconds long. I wouldn't want to throw away precious 25 or more percent of that.

Chris

If needed I can easily upload a longer sample for emese (though I'm not convinced that this kind of extreme sample really belongs to this listening test).
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-14 14:15:11
For AAC at ~128 kbit/s rates, extreme samples like emese are the bread & butter of this test.
Title: Public Listening Test [2010]
Post by: guruboolez on 2010-02-14 15:25:27
That's right for a public listening test. That's also why such a test at 128 kbps is already doomed.
If you don't want an « all-tied » conclusion like the previous tests at this bitrate, you necessarily have to feed the listeners extremely difficult-to-encode samples, which are also unrepresentative ones (by unrepresentative, I mean that the tested material won't correspond to what people listen to on a daily basis). Such a methodology is, be sure of that, a very good argument against the validity of the test. Of course we may include one or two special samples for pedagogic purposes, in order to show that lossy encoders can fail on extreme material. But not more.
With more « musical » (difficult but normal) samples, the conclusion should be the same as in previous listening tests at 128 kbps: a complete status quo. I don't think it's worth putting so much energy into showing that 128 kbps encoders are equally transparent for a panel of 25-30 people.

http://www.listening-tests.info/mp3-128-1/results.htm (http://www.listening-tests.info/mp3-128-1/results.htm)
http://www.listening-tests.info/mf-128-1/results.htm (http://www.listening-tests.info/mf-128-1/results.htm)
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-14 15:43:04
A "representative" test would just show what we already know. Why should any more time be wasted on that? There is no need to prove again that all tested encoders can be transparent for most material. The 2010 edition tries to explore the tiny range between "most" and 100%. Which encoder comes closest? It thus bears the potential to actually produce new results we didn't already know: Is TVBR really better than CVBR, or even the other way around if TVBR chokes on some content? How does Nero compare to QT? And what about the new contenders? All of them should be perfectly fine for most material; we don't need to test that again. And going for a more representative music selection at lower bitrates would just show which encoder produces the best low-bitrate results, and not necessarily tell us anything about transparency potential in the bitrate range people actually use.
Title: Public Listening Test [2010]
Post by: guruboolez on 2010-02-14 18:05:53
If I understand your last message correctly, I think I'm very far from sharing any interest in the future test (I apologize for discovering this only now: I haven't read the full debate).
Extreme samples were usually helpful to test encoders under stressful situations and then validate the choice of a high-bitrate setting (like lame --alt-preset, Musepack, lossyWAV, etc.). People using such high-bitrate lossy encodings expect transparent results even in extreme cases (at least in most of them). But 128 kbps is very far from a high bitrate, and I don't think many people would expect a robust, artifact-free music library from such a low bitrate.

So if I understand all this correctly, the future test should tell us how good (or bad) 128 kbps encodings are under extraordinary situations, but won't give us any idea about how different they could be with music for daily usage (I assume that is what people expect from a test at 128 kbps). If I'm right, the practical interest of such a test seems very limited. At least to me.

Regards.
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-14 19:20:02
But 128 kbps is very far from high bitrate and I don't think many people would expect from such low bitrate a robust, artefact-free, music library.


The general sentiment was that 128 is even a very high bitrate for an AAC listening test. It was expected that at that bitrate, with normal samples, only very few participants would be able to contribute anything at all, because most wouldn't hear a difference.
Title: Public Listening Test [2010]
Post by: guruboolez on 2010-02-14 20:36:00
128 kbps is indeed high for a public listening test of modern encoders. That's why I really think it would be judicious not to run a new one, rather than forcing detailed results by using a lot of meaningless samples.

I realize I shouldn't have opened the debate about the chosen bitrate. I'm sorry for that and will stop right now.
I repeat my offer: if needed, I can upload a longer version of the emese sample (from the original CD) and maybe other samples as well.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-02-14 22:20:57
Thanks for the offer, guruboolez! I'm sure we will need to make use of it.

... but won't give us any idea about how different they could be with music for daily usage ...

Correct, but as rpp3po mentioned, we already know that. Except maybe for the DivX encoder, all encoders in question have been developed for years, so we can expect them to perform equally well (i.e. with statistically insignificant quality differences on average) on daily music. The question is how large the quality differences are for extremely critical material. Sure, an "average" reader might not care about such material, but such a reader can refer to the previous listening tests you mentioned.

By the way, related question: Do you - or does anyone else, for that matter - think a 96-kb test with less critical samples will show greater differences between the encoders?

Chris
Title: Public Listening Test [2010]
Post by: guruboolez on 2010-02-14 22:32:07
Honestly, I don't think it will. I strongly believe that the difference between contenders must be very strong and immediately obvious to appear on the final plots. And I'm pretty sure that the gap between Nero & Apple is far from large enough (assuming that a gap really exists) to expect such results.
Title: Public Listening Test [2010]
Post by: hellokeith on 2010-02-15 01:14:22
Quote from: Alex B link=msg=0 date=
In general, the only really correct and fair way to simulate a real life usage situation would be to encode the complete original source tracks, decode the encoded tracks, cut the test samples from the decoded tracks, and store the samples in a lossless format. This has been discussed in the past, but it has never been a viable option for various practical reasons.


If a person is providing a sample, have them follow that procedure prior to upload.  Or at least make sure they understand the first 1-2 seconds will be chopped off, so they may decide to provide a different range.  A lot of work and variables, though.

Quote from: Alex B link=msg=0 date=
Regardless of the encoder specific behavior, I think that if you want to test a certain very short passage you should simply encode a sample that starts 2 seconds earlier. If the original source track actually starts with the critical passage that is intended to be tested you can always configure that sample to start from the beginning. It would be fair for all encoders. You would then be testing how the encoders can handle a track that starts with such content.

Gabriel was in favor of cutting 1-2 seconds, but he also warned about a "scene cut", i.e. the 1-2 seconds being cut need to be representative of the material that follows, so that the codec has already adjusted by the starting point of the listening sample.  But what if the critical passage comes directly after a scene change and is short in duration?  I'd say that is the same situation as a track that starts with a critical passage.

Trimming seconds seems to cause as many problems as it tries to solve.  That would be a no-go in an engineering process.
Title: Public Listening Test [2010]
Post by: .alexander. on 2010-02-15 08:09:06
You know, it just amazes me how HA people can screen out the actual bitrates of the samples.

THE GREEN LINE IS 25KBPS ABOVE THE BLUE LINE!

And there is no 128 kbps level for CBR encoders at all. For what particular reason should the listening test include samples produced by the same encoder with a 25 kbps gap? Each frame of the green stream has more bits than the blue one, both produced using the same intra-frame routines.

And I can see no difference between bitrate overhead caused by "constant quality" mode and overhead related to the initial bit reservoir state. What is the difference?

I think the best thing would be to concatenate all test samples into a single file (it seems this is commonly done when standardizing MPEG coders), and then to prepend a few seconds of noise to the beginning of that file so that the first sample is encoded with a half-empty bit reservoir.


This should also make it easier to enumerate encoder settings in search of equal bitrates.
Title: Public Listening Test [2010]
Post by: Alex B on 2010-02-15 09:21:36
.alexander.,

I think many readers may have difficulties in understanding the graphs.

When you posted your first graph I wasn't quite sure what exactly was looped, measured and how it was done. After seeing your second graph and the related replies I think I got it right. Could you confirm if the following is accurate:

The graph shows the bitrates of 100 instances of the 7-second emese sample, which were put together and encoded as a single AAC audio file. The duration of this file is about 700 seconds. After encoding, the file was split into individual 7-second AAC segments that were muxed into individual MP4 files. Each dot shows the overall bitrate of one of the resulting 100 MP4 files, in the same order as the individual segments appeared inside the single encoded file. The graph provides no information about how the bitrate varies inside any of these 7-second segments.
Title: Public Listening Test [2010]
Post by: .alexander. on 2010-02-15 09:55:30
Thank you Alex, your explanation is correct and very clear. One minor remark: I trimmed the silence at the beginning and the end of emese, so the duration of the output files is about 6 rather than 7 seconds.
Title: Public Listening Test [2010]
Post by: guruboolez on 2010-02-15 11:33:11
I'm using the modified faad version I mentioned earlier to check the bitrate variation on a looped encoding of emese (20 iterations). I can confirm what .alexander. found by using different tools.
Here's the average bitrate for each individual loop:
Code: [Select]
#1    150.04
#2    138.87
#3    138.09
#4    153.44
#5    141.49
#6    139.85
#7    139.43
#8    137.74
#9    147.72
#10  146.37
#11  139.38
#12  138.96
#13  137.36
#14  140.22
#15  144.19
#16  139.70
#17  139.57
#18  137.98
#19  143.13
#20  153.29

minimum = 137.36 kbps (13th loop)
maximum = 153.44 kbps (4th loop)
average = 142.84 kbps

And now the full bitrate graph of the 20 iterations of emese (6.00 seconds x 20 = 2 minutes of audio material) :

(http://img27.imageshack.us/img27/318/emesex20.th.png) (http://img27.imageshack.us/i/emesex20.png/)
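For anyone re-deriving the summary figures from the per-loop table above, a small Python sketch (the loop numbering simply mirrors the table):

```python
# Per-loop average bitrates (kbps), copied from the table above
rates = [150.04, 138.87, 138.09, 153.44, 141.49, 139.85, 139.43,
         137.74, 147.72, 146.37, 139.38, 138.96, 137.36, 140.22,
         144.19, 139.70, 139.57, 137.98, 143.13, 153.29]

lo, hi = min(rates), max(rates)
lo_loop = rates.index(lo) + 1  # loops are numbered from 1
hi_loop = rates.index(hi) + 1

print(f"minimum = {lo} kbps (loop #{lo_loop})")
print(f"maximum = {hi} kbps (loop #{hi_loop})")
print(f"spread  = {hi - lo:.2f} kbps")
```

The roughly 16 kbps spread between identical inputs is the anomaly under discussion.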
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-15 13:17:54
Aren't the bitrate variations overrated lately?

If we had wanted to force each encoder as close to 128 kbit/s as possible for each sample, the variations would be a problem. But that's not the usual use case with VBR, anyway. VBR Q values aim at whole-collection averages, and that's what we are going to use. It's a VBR encoder's job to scale up for problematic content (and compensate with simple content). If it isn't doing that job properly, this test is going to show that.
Title: Public Listening Test [2010]
Post by: Alex B on 2010-02-15 14:07:27
I think iTunes CVBR is doing something funny. The "funniness" may not be limited to the emese sample.

It behaves more like an ABR setting that uses a very large "ABR frame", at least several seconds, perhaps tens of seconds. In that case encoding any short samples will produce inconsistent, quite arbitrary results.

EDIT

As I said before, the issue could be avoided completely by encoding the complete original tracks and cutting the test samples from the decoded files. Then the encoded passages would be exactly correct and there would be no reason to investigate whether the samples' short durations have any effect on the encoded data.
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-15 14:19:15
If the complete versions of every track can be organized, that would be fine. Else C. R. Helmrich's proposal should also work out well.
Title: Public Listening Test [2010]
Post by: Alex B on 2010-02-15 14:42:07
... Else C. R. Helmrich's proposal should also work out well.

Combining the samples into a stream that is encoded as a single file would not change the possibly inconsistent behavior. For instance, the passage that contains the emese sample might be quite different from a passage that is cut from a separately encoded complete emese track.

At least we would need to know why the encoder behaves like it does, so that the issue could be reliably reproduced with a few different audio samples and its severity considered.
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-15 15:53:21
Yes, if CVBR is behaving as if it had a several-second memory, that might really skew the results in comparison to encoders with a shorter attention span, when that memory is filled with data from a preceding, uncorrelated track.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-02-15 20:33:35
It seems iTunes CVBR in its latest version does bitrate handling in a much weirder way than I thought. The plot for the previous version which alexander posted (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77272&view=findpost&p=687013) is much more in line with what I expected. The problem is that many of the samples I'd like to see in the test are actually the beginnings of songs ("Since Always" and Fatboy Slim's "Kalifornia", for example). Although I like Alex B's suggestion of encoding entire tracks and extracting the relevant parts (or at least encoding from the beginning of the track up to the relevant passage), I'm afraid there will not always be much leading music to cut off.

I think it's time to focus on the item selection (http://www.hydrogenaudio.org/forums/index.php?showtopic=77584).

Chris

P.S.: It's nice to see, though, that there are multiple ways to monitor a file's bitrate usage over time. We could make use of that when encoding the files for the test. If some strange up-down jumps like in the above plot occur, we can decide on a per-item basis on how to proceed.
Title: Public Listening Test [2010]
Post by: rpp3po on 2010-02-15 20:50:08
I have thought about all of this again. We are using collection-based presets, anyway. Because of that, we aren't interested so much in actual bitrates, but more in what quality is delivered for a specific track at a specific Q value representing a 128 kbit/s broad-collection average. So if the bitrate behaves strangely with an empty reservoir at the beginning of a track, it would also do so in a real-life encoding of that track. And if an encoder messes up when an input starts the action without foreplay, it is the encoder's fault and should not be fixed manually. So encoding full-length tracks and then cutting out the relevant sections, but not more, would be as close to real life as it can get.
Title: Public Listening Test [2010]
Post by: Alex B on 2010-02-15 21:12:11
It seems iTunes CVBR in its latest version does bitrate handling in a much weirder way than I thought. The plot for the previous version which alexander posted (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77272&view=findpost&p=687013) is much more in line with what I expected...

I agree that the first plot looks normal. Unless the Apple developers have created a new innovative system that actually works fine in real life situations, the behavior in the second plot may also be caused by a bug in the new version.

The last time a listening test was prepared, we found a serious bug in Apple's MP3 encoder. The bug had existed for years.

The Apple developers should be informed about this issue so that they can check the changed behavior.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-02-19 23:07:02
Summary:

DivX -v 4
Various   123.72
Classical   105.52
Average   114.62

DivX -v 5
Various   147.56
Classical   125.00
Average   136.28


I think it's not quite correct to calculate an average between Various and Classical.
Some codecs produce low bitrates on quiet music, while other codecs do so on loud music.
The opposite of classical music would be loud metal.
It would be more correct to calculate an average between loud (metal & hard rock), various (middle loudness) and classical (quiet music).

Divx's setting is CBR.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-03-06 01:43:16
Apple's dev was informed about the possible issue with iterations.

.alexander.
I have tried a few samples and only emese has the iteration anomaly. Confirmation would be good.


Title: Public Listening Test [2010]
Post by: skuo on 2010-03-09 18:40:22
The CVBR looks different since I had to update quicktime to use qtaacenc.

(http://i49.tinypic.com/2rftvrp.jpg)

Note this is the bitrate distribution for the looped emese sample. Each point corresponds to the bitrate of one repetition of the emese sample.

EDIT: legend


This is expected behavior for constrained_VBR. For offline applications, VBR is always the best option, and it won't show this type of irregular behavior.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-03-09 21:31:25
It's still ok to include both TVBR and CVBR at least for me.
Or should we exclude CVBR or do some workaround (iterations)?


The almost-definitive list of AAC encoders and settings is:
1. Nero -q 0.41  (-q0.415?)
2. Apple --tvbr 65 --highest 
3. Apple --cvbr 124 --highest  . Discussion (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77272&view=findpost&p=686112) of CVBR bitrates
4. Pre-test:
4a. Divx CBR 128
4b. CT CBR 130
Title: Public Listening Test [2010]
Post by: googlebot on 2010-03-10 00:23:18
This is expected behavior for constrained_VBR. For offline applications, VBR is always the best option, and it won't show this type of irregular behavior.


If that is the position at Apple, I find this very interesting. Currently 'playback' within the iTunes ecosystem means offline use. Every device has got its own music storage. If Apple sticks to constrained VBR for its streaming benefits, despite its disadvantages, to me this is clearly a hint at an upcoming shift towards a more streaming-centered model.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-03-13 16:49:51
The CVBR looks different since I had to update quicktime to use qtaacenc.
Note this is the bitrate distribution for the looped emese sample. Each point corresponds to the bitrate of one repetition of the emese sample.

EDIT: legend

This is expected behavior for constrained_VBR. For offline applications, VBR is always the best option, and it won't show this type of irregular behavior.

skuo, then why does your previous encoder version (posted earlier) (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77272&view=findpost&p=687013) show a very different, much more expected, behavior?

Igor, all, if the emese@CVBR problem persists, I'm afraid we have to use an iteration giving the lowest bitrate of 135.5 kbps for fairness' sake, e.g. loop #2.

Chris
Title: Public Listening Test [2010]
Post by: IgorC on 2010-03-13 19:31:31
Igor, all, if the emese@CVBR problem persists, I'm afraid we have to use an iteration giving the lowest bitrate of 135.5 kbps for fairness' sake, e.g. loop #2.

But CVBR has this issue only with this particular (emese) sample. I will check all test samples for bitrate regularity.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-03-15 23:06:12
During the selection of samples, the issue with CVBR bitrate regularity came up too often.  CVBR should be excluded from the test. Too many problems. There is also the odd bitrate shifting needed to be comparable to TVBR (--cvbr 124 works only on Mac).
Skuo (Apple's dev) suggested going with TVBR. Let's do it.
Title: Public Listening Test [2010]
Post by: greynol on 2010-03-15 23:13:35
You're going to alienate massive amounts of people, but it's your test, so it's your choice.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-03-15 23:20:17
It's not just my test. It's public. All decisions are made with the agreement of the majority (>=50%).

There may be one more solution before excluding CVBR.
Loop the sample and choose the chunk with the minimal or middle bitrate. Smells like chemistry to me. Minimal or middle bitrate?

Suggestions are welcomed.
Title: Public Listening Test [2010]
Post by: greynol on 2010-03-15 23:24:54
Why not figure out what the bitrate of the chunk is by doing what people normally do: encode the entire track.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-03-15 23:30:52
I wish there were a possibility for that.
Who will do that work?
Find full-length tracks for all 18-20 samples?
Try to find all submitters and ask them to upload? Last time I tried to communicate with a pair of dudes... I never got an answer.
Title: Public Listening Test [2010]
Post by: greynol on 2010-03-15 23:33:05
Can AAC be split as easily as MP3?
Title: Public Listening Test [2010]
Post by: IgorC on 2010-03-15 23:40:58
Well, I don't know if it's as simple as MP3, but yamb/mp4box can do it. http://yamb.unite-video.com/download.html (http://yamb.unite-video.com/download.html)
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-03-16 09:32:05
Why not figure out what the bitrate of the chunk is by doing what people normally do: encode the entire track.

For some items, the test sample actually is the entire track. And for some other items, the test sample equals the first few seconds of the track. Only relatively few items are an excerpt of the middle part of a song.

That being said, I agree that kicking out iTunes CVBR is problematic, since that's what many people on this globe are using. Remember, not everyone knows about (and how to use) qtaacenc.

How about using iTunes 128kb/s CVBR (default quality, available even on Windows), loop each sample, check the bit rates, and take the loop giving the lowest bit rate (as the above plot shows, there obviously is a clear lower bit rate border)?

Chris
Title: Public Listening Test [2010]
Post by: IgorC on 2010-03-16 14:09:30
How about using iTunes 128kb/s CVBR (default quality, available even on Windows), loop each sample, check the bit rates, and take the loop giving the lowest bit rate (as the above plot shows, there obviously is a clear lower bit rate border)?

Should the middle bitrate be chosen, as it has the highest probability (mean value)?
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-03-17 09:47:04
Should the middle bitrate be chosen, as it has the highest probability (mean value)?

Does the mean bit rate really have the highest probability? In alexander's post #181, the minimum bit rate is most likely to occur.

I still prefer my above proposal. 128 kb/s is a bit on the high side, but taking the minimum loop bit rate levels things out a bit. Plus it's reproducible using only iTunes on Windows.

Chris
Title: Public Listening Test [2010]
Post by: IgorC on 2010-03-17 15:51:22
Yes, the chunk with the lowest bitrate has a bitrate very close to the average bitrate over emese x 100 loops.

Average bitrate for the emese sample x 100: 136 kbps.
The chunk with the lowest bitrate: 135 kbps.

As CVBR 128 produces slightly higher bitrates compared to TVBR 60, selecting the chunk with the lowest bitrate will be more appropriate.

Then settings are
1. Nero -q 0.41
2. Apple --tvbr 65 --highest
3. Apple --cvbr 124 128 --highest .
4. Pre-test:
4a. Divx CBR 128
4b. CT CBR 130

Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-03-18 09:54:09
3. Apple --cvbr 124 128 --highest .
Please make it --cvbr 128 --normal so that everyone can reproduce the results with iTunes. See discussion following this post (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=78072&view=findpost&p=682403).

Chris
Title: Public Listening Test [2010]
Post by: IgorC on 2010-03-19 21:55:27
Yes, Chris. It should be --normal.

I had a problem with yamb/mp4box splitter http://yamb.unite-video.com/download.html (http://yamb.unite-video.com/download.html).
It doesn't cut precisely: 29.206 s is cut to 29.00. Only integer values. I've tried making a 29.000 sec WAV but somehow it still cut imperfectly.

CVBR bitrate distribution for looped x16 samples:
I've tried the fatboy_30sec sample.  h*tp://www.mediafire.com/?nz1cmjoj1y5
1st chunk: 133 kbps.
From the 2nd to the 16th chunk: 140 kbps.

It won't be right to choose the chunk with the minimal bitrate. I propose to choose the average bitrate (the chunk with 140 kbps in this case).
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-03-19 23:40:21
It won't be right to choose the chunk with the minimal bitrate. I propose to choose the average bitrate (the chunk with 140 kbps in this case).

OK, let's take the median then. Similar to average (also around 140 kbps for fatboy and emese samples), but much less sensitive to outliers. Example: [100 100 100 100 100 100 100 300] has average 125 and median 100.
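Chris's example can be checked with Python's stdlib statistics module (a trivial sketch):

```python
from statistics import mean, median

# Seven well-behaved loops at 100 kbps plus one outlier at 300 kbps
rates = [100, 100, 100, 100, 100, 100, 100, 300]

print(mean(rates))    # 125 - pulled up by the single outlier
print(median(rates))  # 100 - unaffected by it
```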

Chris

P.S.: I was planning to create test samples which are exactly 15 seconds long, so the yamb/mp4box issue shouldn't cause any problems.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-03-20 16:57:05
Settings
1. Nero -q 0.41
2. Apple --tvbr 60 --highest
3. Apple --cvbr 128 --normal *
4. Pre-test:
4a. Divx CBR 128
4b. CT CBR 130

*
CVBR 128 produces slightly higher bitrate than TVBR
http://www.hydrogenaudio.org/forums/index....st&p=686112 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77272&view=findpost&p=686112)
http://www.hydrogenaudio.org/forums/index....st&p=683265 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77932&view=findpost&p=683265)

Also, CVBR has an inconsistent bitrate distribution.
The samples for CVBR will be looped 32 times, and the chunk with a bitrate slightly below the median value will be chosen, to compensate for the extra 2-3 kbps compared to TVBR.

I think it's a fair workaround for all competitors.
Title: Public Listening Test [2010]
Post by: Alex B on 2010-03-21 17:18:46
... CVBR bitrate distribution for looped x16 samples:
I've tried fatboy_30sec sample.  h*tp://www.mediafire.com/?nz1cmjoj1y5
1st chunk: 133 kbps.
from 2d to 16th chunks: 140 kbps.

It won't be right to choose the chunk with minimal bitrate. I propose to choose average bitrate ( the chunk with 140 kbps in this case).

Actually, the fatboy_30s sample is from the beginning of Fatboy Slim's Kalifornia track. I have the CD. Out of curiosity, I did some testing. I cut three samples of various durations (6, 15 and 30 s), encoded the samples and the complete track (5+ min) with qtaacenc @ --cvbr 128 --normal, and measured the AAC frame sizes with the modified FAAD decoder that was already used earlier in this thread by guruboolez. The results were a bit surprising. This time all samples produced exactly identical AAC frames. I.e. all four files were identical up to 6 s, three files were identical up to 15 s, and two files were identical up to 30 s. Only the last two or three frames in each cut sample were different from the complete track. If fatboy_30s is going to be included, it would be absolutely correct to encode just the sample without any looping. I will post an Excel table of the measured data in the uploads section.

I think the encoder's bitrate behavior must be checked sample by sample after the samples are selected.

However, a perhaps more worrying thing is its tendency to alter the file's volume level. Here is the difference between the source sample and the encoded sample (I used the version I cut myself; its duration is exactly 30 s (1323000 samples). For decoding the M4A file I used foobar, which appears to produce a file of accurate length. The two images in the animated gif file are screenshots from Audition):

(http://i238.photobucket.com/albums/ff132/alexb2k/HA/cvbr.gif)

The difference by numbers (from Audition):

Code: [Select]
ORIGINAL

    Left    Right
Min Sample Value:    -32768    -32768
Max Sample Value:    32718    32766
Peak Amplitude:    0 dB    0 dB
Possibly Clipped:    1    4
DC Offset:    -.016     -.004
Minimum RMS Power:    -73.09 dB    -84.87 dB
Maximum RMS Power:    -5.45 dB    -7.44 dB
Average RMS Power:    -17.05 dB    -16.79 dB
Total RMS Power:    -16.12 dB    -15.8 dB
Actual Bit Depth:    16 Bits    16 Bits

Using RMS Window of 50 ms

CVBR

    Left    Right
Min Sample Value:    -31040    -32768
Max Sample Value:    32767    32767
Peak Amplitude:    0 dB    0 dB
Possibly Clipped:    1    9
DC Offset:    -.009     -.004
Minimum RMS Power:    -77.87 dB    -84.53 dB
Maximum RMS Power:    -5.95 dB    -8 dB
Average RMS Power:    -17.34 dB    -17.11 dB
Total RMS Power:    -16.45 dB    -16.18 dB
Actual Bit Depth:    16 Bits    16 Bits

Using RMS Window of 50 ms

The difference in the Replay Gain value is about 0.5 dB (measured by foobar), but I don't think the problem can be fixed simply by adjusting the playback gain because the encoder seems to adjust some AAC frames more than others.

EDIT:

The Excel table is available here: http://www.hydrogenaudio.org/forums/index....st&p=695191 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77994&view=findpost&p=695191)
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-03-21 19:17:21
Thanks a lot for the bit rate experiments, Alex! I'll take a look at the Excel sheet later. Igor and I are currently also investigating the looped bit rate issue on the emese sample.

Regarding the volume alterations, I think this is the limiter [edit: compressor] being triggered in Apple's encoder. It will hopefully be circumvented in the listening test by not allowing sample values higher than about -2.5 dBFS (75%), i.e. lowering the volume of affected files. I will match the loudness levels of all samples under test, so that the listeners won't have to adjust the playback volume during the test (one less source of distraction).

Chris
Title: Public Listening Test [2010]
Post by: Alex B on 2010-03-21 19:56:20
Regarding using a looped sample, I think it would be difficult to cut it accurately without first decoding it. Perhaps the sample should be decoded and the wave file cut in order to get it right. Then that sample should be provided in a lossless format.

Edits to my above reply:
-- It appears to be produce a file of accurate length.
-- with qtaacenc @ --cvbr 128 --normal
-- I used the version I cut by myself
(I don't particularly like the current forum policy that doesn't allow to fix mistakes after one hour.)
Title: Public Listening Test [2010]
Post by: IgorC on 2010-03-23 03:11:50
Ok, then fatboy_30s sample will be encoded as is without loops.

As Alex has already said about the decoded WAV file:
encoded AAC files have a slightly different length than the lossless files. That's a complication when it comes to splitting.

A possible solution for CVBR is:
1. Add some silence to the beginning and end of each reference lossless sample.
2. Loop it x16.
3. Encode it to AAC.
4. Choose the chunk with the desired bitrate.
5. Decode it to WAV and align it to be compared to the reference.

It works.
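Steps 1 and 2 of the above can be scripted with Python's stdlib wave module. This is only a sketch under my own assumptions (hypothetical file names, 500 ms of padding, digital silence for the pad); the AAC encode, chunk selection and alignment would still happen in qtaacenc and an audio editor:

```python
import wave

def pad_and_loop(src_path, dst_path, loops=16, pad_ms=500):
    """Steps 1-2: pad a lossless sample with silence, then loop it."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        audio = src.readframes(src.getnframes())

    # Digital silence: zeroed PCM frames of pad_ms duration
    frame_bytes = params.sampwidth * params.nchannels
    pad_frames = int(params.framerate * pad_ms / 1000)
    silence = b"\x00" * (pad_frames * frame_bytes)

    # One unit = silence + sample + silence; write it `loops` times
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes((silence + audio + silence) * loops)

# e.g.: pad_and_loop("emese.wav", "emese_x16.wav")
```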
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-03-23 21:08:35
Regarding using a looped sample, I think it would be difficult to cut it accurately without first decoding it. Perhaps the sample should be decoded and the wave file cut in order to get it right. Then that sample should be provided in a lossless format.

It shouldn't be a problem when the item length is an integer multiple of 1024 samples, the frame length. That's why I propose 15-second samples: 15*44100/1024 = 645.996. Add 4 samples (one millisecond), and we get exactly 646.
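The frame-alignment arithmetic is easy to verify (1024 samples being the AAC frame length, as above):

```python
FRAME = 1024              # AAC frame length in samples
SR = 44100

raw = 15 * SR             # a 15-second sample: 661500 samples
print(raw / FRAME)        # 645.99609375 - not frame-aligned

padded = raw + 4          # add 4 samples of silence
print(padded)             # 661504
print(padded / FRAME)     # exactly 646 frames
```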

Alex or Igor (or both), can you take your 15-second fatboy sample, add 1msec silence so that it's 661504 samples long, loop it, encode it with iTunes CVBR, then split the looped MP4 encode into 15-second chunks, and report on the file size of each chunk? That would be interesting.

Chris
Title: Public Listening Test [2010]
Post by: IgorC on 2010-03-24 02:50:30
Add 4 samples (one millisecond), and we get exactly 646.

Is the period of one sample equal to 1/44100 ~ 22.68 us?
us - microseconds

That is a problem. We can't cut the reference WAV with a precision of microseconds, only +/- 1 ms.

The possible solution for CVBR from my previous post is still an option.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-03-24 11:48:49
This shouldn't be a problem. MP4 bitstreams can only be cut at frame borders, anyway, so if you specify a cut point which doesn't coincide with a frame border, I assume it will round the cut point to the nearest frame.

Chris
Title: Public Listening Test [2010]
Post by: IgorC on 2010-03-24 20:57:28
Good.
But 4 samples aren't 1 ms.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-03-24 22:20:05
Ooops, true

Chris
Title: Public Listening Test [2010]
Post by: IgorC on 2010-03-24 22:49:36
Maybe I'm starting to be annoying, but I think my solution is optimal enough, as it gets rid of the precision issues.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-03-25 21:59:24
You're not annoying, and your solution might be fine, but it seems you haven't understood what I'm trying to find out. I'm trying to check whether it's necessary to loop in the first place. I'm not convinced that this is really the case. It can be verified as follows:


So, I'd like to repeat my request:
Quote
Alex or Igor (or both), can you [...] loop it, encode it with iTunes CVBR, then split the looped MP4 encode into 15-second chunks, and report on the file size of each chunk? That would be interesting.

Please?

Thanks,

Chris
Title: Public Listening Test [2010]
Post by: lvqcl on 2010-03-25 22:39:34
I tried to do this test:

1st chunk - 158 kbps.  2nd and all other chunks: 152 kbps.

Chunks 2, 3, 4... are identical. The difference between 1st and 2nd chunks is mostly within first 2.6 seconds.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-03-26 13:52:35
Thanks, lvqcl! I see. The difference in the bit rate during the first seconds is due to 1. encoder delay (one or two more frames so that the first frame can be decoded correctly) and 2. full bit reservoir. Both phenomena are unimportant if we concatenate all 16 test items and encode/decode them in one go (only the first item in the concatenation will be affected, but we will probably chop off the first 2 seconds of that item after decoding, so everything should be fine). So from my point of view, we don't need the loop thing.

Can anyone else confirm lvqcl's findings, preferably with a different item, e.g. the first 15.0001 seconds of Human_Disease?

Thanks,

Chris
Title: Public Listening Test [2010]
Post by: IgorC on 2010-03-27 16:48:58
Chris or Alex or somebody else.
Please send me 15.0001 sec of Human_Disease or any other sample. I only have Nero 6.0 Wave Editor, which marks with a resolution of only +/- 1 ms.

Is there a free wave editor with high resolution (microseconds) like Audition?
Title: Public Listening Test [2010]
Post by: IgorC on 2010-03-27 18:39:46
Yes, Chris.
You are right. CVBR behavior is normal on sources whose length is a multiple of 1024 samples.
I tried sources with 661504 samples (15 sec * 44100 + 4 samples) and all chunks for emese and human disease have the same bitrate.

If we are going to use CVBR 128 without looping, then we should raise Nero's q setting a little bit: 0.41 -> 0.415
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-03-27 19:28:58
Before I purchased Audition, I used GoldWave to cut and edit samples: www.goldwave.com/ (http://www.goldwave.com/)

I think now that the looping issue seems to have disappeared, we should revert to the intermediate decision to run CVBR at 124 kb/s or so. Otherwise, we'd have to raise the bit rate for TVBR as well, right? IIRC, halb27 or rpp3po offered to encode using iTunes on a Mac?

Edit: Found the original discussion here (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77272&view=findpost&p=686083). Since I can't find anything previous of that sort: rpp3po, do you volunteer to CVBR-encode the concatenated test sample when it becomes available?

Chris
Title: Public Listening Test [2010]
Post by: IgorC on 2010-03-27 19:43:01
Yes, CVBR 124 kbps is better for comparison against --tvbr 60 and Nero q0.41.
Who is a Mac user?

rpp3po was going to encode to CVBR 124, but he isn't with us anymore. http://www.hydrogenaudio.org/forums/index....showtopic=79075 (http://www.hydrogenaudio.org/forums/index.php?showtopic=79075)
He was a great guy.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-03-27 19:51:55
Oh my, I didn't see that thread! That's so sad  Thanks for pointing me to it!

Bit off-topic, but I still have some messages from him in my inbox. It was about this delay removal tool Synchrotron (http://www.hydrogenaudio.org/forums/index.php?showtopic=72560). Maybe we should give that tool a try as a tribute to/in memory of him (or however you say that in English...)?

Chris
Title: Public Listening Test [2010]
Post by: .alexander. on 2010-03-29 11:07:47
I think now that the looping issue seems to have disappeared.


Well then, it's a good time to make bets about the (bitrate overhead) vs (quality scores) correlation coefficient.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-04-01 09:26:48
Fyi #1, Apple seems to have updated its encoder. See nao's version number report (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=74381&view=findpost&p=697685). Let's test this new version even if it turns out to deliver bit-identical encodings for our target bit rate.

Fyi#2, we are still open for suggestions on which item to use as test sample #16 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77584&view=findpost&p=695576). If you know a sample which is a piece-of-cake ABX at 128 kb and which has not been mentioned in that thread, please don't hesitate to let us know about it. The deadline is today... well, maybe the end of the week

Thanks,

Chris
Title: Public Listening Test [2010]
Post by: IgorC on 2010-04-14 03:55:33
The selection of samples is now complete.

The problem is bitrate.

1. Is anybody here a Mac user? We need a shifted bitrate for CVBR (124), and that's only possible on a Mac.

2. As everybody already knows, we chose CBR for DivX (there is no VBR setting that would fit into an average of 130 kbps), but it doesn't produce 128 kbps either. The real bitrate is 125-125.5 kbps, at least 2.7% less than the other encoders. Is that OK, or should we not include the DivX encoder in the test?

Suggestions are welcomed.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-04-14 04:36:09
At this moment we have encoded samples with:
1. QuickTime 7.6.6. TVBR 60 Highest. It produces the same output as the older 7.6.5
2. Nero 1.5.4.0 -q 0.41 (VBR)
3. CT encoder 8.2.0 (Winamp 5.5.572 (January 13, 2010)  dll & MediaCoder GUI) 130 CBR
Title: Public Listening Test [2010]
Post by: halb27 on 2010-04-14 08:45:36
... DivX, but it doesn't produce 128 kbps either. The real bitrate is 125-125.5 kbps, at least 2.7% less than the other encoders. Is that OK, or should we not include the DivX encoder in the test?

Suggestions are welcomed.

I wouldn't care about a 2.7% difference.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-04-14 13:44:31
Me neither. But for fairness' sake I will evaluate the DivX encoder against a 128-kb bitstream of the CT 8.2 encoder (true bitrate about 126 kb).

Igor, please prepare/send the DivX and CT 128-kb bitstreams so I can do the pre-test. Thanks!

Edit: Found someone at the office with a full version of Winamp, and we noticed that it's possible to create both MP4- and ADTS-formatted LC bitstreams (format conversion in the playlist library). See this tutorial (http://blog.winamp.com/2009/11/06/how-to-convert-to-mp3/). So actually, it won't be necessary to make a 130-kb encoding to compensate for ADTS overhead.

Chris
Title: Public Listening Test [2010]
Post by: Alex B on 2010-04-14 15:07:59
DivX -v 5 = 136.3 kbps
iTunes 128 vbr = 133.6 kbps
QT tvbr 65 highest = 130.2 kbps
Nero -q 0.42 = 132.9 kbps
CT (Winamp 5.572) = 128.0 kbps
   
average   = 132.2 kbps  (=> test "target" = 132 kbps)

The bitrates are approximately within +- 3%.
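
To make the derivation of the 132 kbps target explicit, here is a small sketch (my own, not Alex B's spreadsheet) averaging the figures listed above:

```python
# Per-encoder average bitrates (kbps) as listed above
rates = {
    "DivX -v 5": 136.3,
    "iTunes 128 vbr": 133.6,
    "QT tvbr 65 highest": 130.2,
    "Nero -q 0.42": 132.9,
    "CT (Winamp 5.572)": 128.0,
}

avg = sum(rates.values()) / len(rates)
print(round(avg, 1))  # 132.2 -> test "target" = 132 kbps

# Largest relative deviation from the average, in percent
spread = max(abs(r - avg) / avg for r in rates.values())
print(round(100 * spread, 1))  # roughly 3, i.e. approximately within +-3%
```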

My reasoning - the bitrate tests in Excel format (I tested my usual 25+25 complete tracks):
http://www.hydrogenaudio.org/forums/index....st&p=700607 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77994&view=findpost&p=700607)


Some notes:

-- Nero -q 0.42 would be closer to the average than -q 0.41 with my test set. (I tested 0.41, 0.42 and 0.43)

-- For QT CVBR, please don't consider using anything else than a setting that can be found in iTunes.

-- Winamp 5.572 "MP4 LC-AAC 128 kbps" produces 128 kbps files according to foobar. (not 130 kbps)

-- Use the real Winamp program for encoding "CT/Winamp MP4 LC-AAC", not MediaCoder.

-- DivX -b 128 produced 124.8 kbps with my test set - probably caused by this:
Quote
Known issues for DivX Plus HD AAC Encoder Beta 1:

The CBR mode does not pad frames if the input audio complexity does not require using all of the available bitrate.  ...
(a quote from: http://labs.divx.com/node/11682 (http://labs.divx.com/node/11682))


Sorry, I don't have time to write more now. Testing and creating the Excel sheet took too much time already and I am late from something else...


EDIT

A quick edit though,

-- It would be interesting to see how DivX VBR works even though its bitrates are on the "high side" with some complex tracks. IMO, its behavior is more like true VBR than Nero's or QT's "true" VBR behavior. It is more in line with LAME, Vorbis or Musepack VBR behavior. For comparison I added LAME -V5 to my Excel sheet.

-- I used the latest versions of all SW (checked the versions today).

-- Fixed the "quote" link.

-- I have posted some useful instructions for the DivX encoder here: http://www.hydrogenaudio.org/forums/index....st&p=686087 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77272&view=findpost&p=686087). Additional info can be found from the above "quote" link.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-04-14 17:41:11
Alex B
A big thank you.  I totally agree with all your statements, and your settings will be used if everybody agrees.
Only one thing: I've sent a PM to Benski (Winamp's developer) asking if shifting the bitrate to 132 for CT (MediaCoder) is OK. The real bitrate for the 132 setting is actually 133, and 133 is the average of the other 4 encoders.

Chris, I will send you all samples as soon as we reach agreement about CT encoder.
Sorry.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-04-14 19:32:17
Thanks, Alex!

Igor, I already got the following bitstreams:


So the only thing missing is the DivX VBR bitstream (I agree with Alex, VBR is more interesting in our case).

If you want I can forward you my bitstreams for cross-checking tomorrow.

Chris
Title: Public Listening Test [2010]
Post by: benski on 2010-04-14 20:03:05
Alex B
A big thank you.  I totally agree with all your statements, and your settings will be used if everybody agrees.
Only one thing: I've sent a PM to Benski (Winamp's developer) asking if shifting the bitrate to 132 for CT (MediaCoder) is OK. The real bitrate for the 132 setting is actually 133, and 133 is the average of the other 4 encoders.

Chris, I will send you all samples as soon as we reach agreement about CT encoder.
Sorry.


Sure, setting the bitrate manually shouldn't result in any problems.  132kbps won't be as "well tuned" as 128kbps but the quality should always be nominally better than 128kbps
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-04-14 20:14:52
Good, then we take Winamp 132kb MP4 and DivX -v5. Once I receive the bitstreams from Igor, we will do the pre-test. If anyone else is interested in joining, please send a personal message to me or Igor. A few extra listeners won't hurt.

Chris
Title: Public Listening Test [2010]
Post by: IgorC on 2010-04-14 20:16:16
ok, I will send you the files in a few minutes. CT 132 and Divx -v5
Title: Public Listening Test [2010]
Post by: Sylph on 2010-04-15 11:35:59
Question: is anyone interested in adding the — if I have the right info — Liquid Audio AAC now in AVS Audio Converter 6 to this listening test?

And Fraunhofer's from MAGIX is also out of the picture?
Title: Public Listening Test [2010]
Post by: IgorC on 2010-04-15 18:07:20
Impossible.
Too late. The samples are already defined and no other encoder will be accepted.

Before I start to argue, I would like to ask you: where were you during the 3 months of discussion? You knew perfectly well about it.

But even if it weren't too late, those encoders would still have had only a very tiny chance.
Reasons:
1. Did you ABX the Liquid Audio AAC encoder yourself? It's awful, as there isn't even a simple lowpass, let alone more advanced tools. 
2. The size of the MAGIX packages: 100-200 MB just for an AAC encoder?  Apple is an exception, as it's well known here, and personally I think that's a reasonable price for a high-quality encoder.
3. Those combo OGG-AAC-MP3-whatever converters etc. contain no serious encoders.
4. AAC from RealPlayer was excluded because it comes with adware (plus a not-so-good CBR encoder), and today it's a shame for a company to put adware into its products. Maybe MAGIX has no direct adware, but it somehow gets some extra control over the OS, which isn't funny.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-04-15 18:44:30
Relax, Igor  My 2 cents:

1. We had defined a limit on the number of encoders in the test. This limit is reached. So we'd like to test only the best encoders available. I trust Igor when he says that the Liquid Audio encoder is not one of the best.

2. Since the encoder version used in the Magix software, the Fraunhofer encoder has undergone considerable quality tuning. We decided not to include the latest Fraunhofer encoder because IIRC it's currently not available in cheap software or as a demo version "for everyone" (and because I'm helping to coordinate the test).

If you like to know how the current Fraunhofer encoder sounds, I'm sure there will come a time when you can listen to it through some software.

Chris
Title: Public Listening Test [2010]
Post by: Sylph on 2010-04-15 19:11:30
Wow. One wants to help and gets attacked.  Wow. I have MAGIX so it wouldn't cause me any additional effort to encode.

And I actually didn't think you had any knowledge about AAC and any such stuff when this thing started so I thought it was going to flop. But I see C. R. Helmrich joined with many others and it's going great!
Title: Public Listening Test [2010]
Post by: Sylph on 2010-04-15 19:15:33
If you like to know how the current Fraunhofer encoder sounds, I'm sure there will come a time when you can listen to it through some software.

Chris


I think that time will not come, sadly. With every day that passes, it seems less likely that a Fraunhofer AAC encoder will be implemented anywhere, especially given QT's and Nero AAC's popularity.

But it's strange MAGIX hasn't implemented the latest one in MP3 Maker v16. 
Title: Public Listening Test [2010]
Post by: Alex B on 2010-04-15 21:07:23
Here's a line graph of the bitrates in my test. It shows interesting variation from one track to another. Naturally, it doesn't tell how the bitrate varies between individual frames inside the files, but it seems the encoders don't always agree with each other when estimating the needed bitrate. For comparison, I included LAME -V5.

(http://i224.photobucket.com/albums/dd212/AB2K/ha/AAC_bitrates.png)

The tested tracks are:
Code: [Select]
Various

1 AC/DC - Highway to Hell
2 Adiemus - Boogie Woogie Llanoogie
3 Barry White - Sho' You Right
4 Björk - Possibly Maybe (Lucy Mix)
5 David Bowie - Starman
6 Dido - Life For Rent
7 Duran Duran - Astronaut
8 ELO - Livin' Thing
9 Erich Kunzel & The Cincinnati Pops - Theme from The Pink Panther
10 Evanescence - Going Under
11 Faithless - Mass Destruction
12 Garbage - Bleed Like Me
13 Jamiroquai - World That He Wants
14 Kraftwerk - Tour de France Etape 1
15 Morrissey - Irish Blood, English Heart
16 Paco De Lucia - Rumba Improvisada
17 Santana - Oye Como Va
18 Simply Red - You Make Me Believe
19 Sting - Desert Rose
20 The Beatles - Let It Be
21 Tina Arena - Symphony of Life
22 U2 - Vertigo
23 Whitney Houston - Queen Of The Night
24 Yello - Planet Dada
25 Yo-Yo Ma - Libertango

Classical

26 Aldo Ciccolini - Satie, Sports Et Divertissements, Le Flirt
27 Alfred Brendel - Beethoven, Piano Sonata No 15, Op 28 Scherzo
28 Baroque Festival Orchestra - Vivaldi, Sinfonia in C major, 3. Presto
29 Berlin Philharmonic Orchestra - Mozart, Requiem in D minor K 626, Sanctus
30 Berlin Philharmonic Orchestra - Strauss, Also Sprach Zarathustra (Thus Spoke Zarathustra)
31 Berlin Radio Symphony Orchestra - Mahler, Symphony No  8, 2 Chailly, Ewiger Wonnebrand
32 Christophe Rousset - Farinelli, Il Castrato (OST), J.A. Hasse, Generoso risuegliati
33 Concentus Musicus Wien - Bach, Matthäus Passion BWV 244, Da ging hin der Zwölfen Einer
34 Daniel Barenboim - Mozart, Piano Concerto No 3 in D major, K40-3
35 Gérard Lesne - Vivaldi, Sonate Op 2 No 3 pour violon & bc, III. Adagio
36 Giuseppe Sinopoli - Elgar, Cello Concerto, Serenade for Strings, Enigma-Andante
37 Itzhak Perlman - Paganini, 24 Caprices, No 1 In E
38 Jascha Heifetz - Sarasate, Zigeunerweisen, Op 20 No 1
39 Jessye Norman - Angels we have heard on high (Trad.)
40 Kirov Orchestra & Chorus - Khatchaturian, Gayaneh, Säbeltanz
41 Leslie Howard - Liszt, Douze Grandes Études, S 137 No 1 in C
42 London Sinfonietta - Saint-Saëns, Le carnaval des animaux, Hémiones
43 London Symphony Orchestra - Ravel, Daphnis et Chloé, 10. Tres modere
44 Marie-Claire Alain - Bach, Wo Soll Ich Fliehen Hin BWV 646
45 Michael Nyman - The Piano OST, A Bed of Ferns
46 Orchestre Symphonique De Montreal - Elgar, Enigma Variations No 1
47 Philharmonia Slavonica - Bach, BWV 1067 Rondeau
48 The Cleveland Orchestra - Ravel, Valses nobles et sentimentales, 1. Modere
49 The Philadelphia Orchestra - Tchaikovsky, The Nutcracker, Op 71a (Ballet Suite) No 2
50 Zbigniew Preisner - Trois Couleurs Bleu (OST), First flute
Title: Public Listening Test [2010]
Post by: Polar on 2010-04-15 22:35:17
And I actually didn't think you had any knowledge about AAC and any such stuff when this thing started so I thought it was going to flop.
It's hard not to label such an utterance as blunt and ignorant.  If you had done your homework, you'd know about Igor's numerous and elaborate double-blind AAC listening tests, which he took the trouble to document and share with everyone right here, in this very forum.  The search button is right there at the top.
Title: Public Listening Test [2010]
Post by: Sylph on 2010-04-17 13:42:21
I'm not talking about ABX, it's other stuff.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-04-19 01:47:14
Here is a simulation of the ABC-HR package and a fake Sample01 package.
Please report any errors.

I have done the following steps:
1. ABC/HR for Java - from here http://www.hydrogenaudio.org/forums/index....st&p=683924 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77573&view=findpost&p=683924)
2. Put the FLAC 1.2.1 and FAAD (http://www.rarewares.org/aac-decoders.php) decoders in the same folder.

Links:
ABC-HR    h*tp://www.mediafire.com/?gmdrnmmuwyj
Sample01  h*tp://www.mediafire.com/?inkymjmtdjj


P.S.
Low anchor is iTunes LC-AAC 64 kbps VBR
Title: Public Listening Test [2010]
Post by: Alex B on 2010-04-21 23:15:07
I tried the simulation. Technically it works fine. Some comments:

- In the last MP3 test the bat files were in the BIN folder. The isolated location made it easier to find them.
For reference, I uploaded Sebastian's MP3@128 "ABC-HR_bin" here: http://www.hydrogenaudio.org/forums/index....st&p=701994 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77994&view=findpost&p=701994) (In case you don't have it archived.)

- You didn't include actual instructions. Here is Sebastian's readme.txt from the last test:

Code: [Select]
Public MP3 Listening Test @ 128 kbps
------------------------------------

Welcome and thank you very much for taking part in this listening test.
Your results will be very valuable.

Here are the instructions:

1. Decompress the "ABC-HR_bin.zip" package to its own folder.

2. Download one or more sample packages (location of the packages is
  below).

3. Place the sample package ("SampleXX.zip") in the same folder as
  ABC/HR and uncompress it. Make sure you keep the directory structure
  intact or the configuration files will not work.
  For an idea how the folders should look like (in case you
  downloaded all sample packages), check "folder-setup.png".

4. WINDOWS USERS:
      Navigate to the "bin" directory and run "DecodeXX.bat" ("XX"
      being the number of the sample package you want to prepare for testing).
      If you would like to prepare all packages, run "DecodeAll.bat".
      Wait until the command prompt screen disappears.
  *NIX (Linux, OS X, BSD, Solaris, etc.) USERS:
      Linux users are asked to use Wine with "wine wcmd /c DecodeXX.bat" from
      the "bin" directory.
      OS X users are asked to handle decoding of samples themselves (sorry).

5. You need the Sun Java Runtime Environment version 1.5 or newer to run ABC/HR.
  If you don't have Java, download it from here:
  [url=http://www.java.com]http://www.java.com[/url].

  You need to use the ABC/HR for Java version provided in the
  "ABC-HR_bin.zip" package together with this readme. Older versions are not
  compatible with the configuration file format.
      WINDOWS USERS:
        Double-click "abchr.jar".
      *NIX USERS:
        Run "java -jar abchr.jar" from the shell.

  Once ABC/HR is open, click "Open ABC/HR Config..." and load the
  file "SampleXX.ecf" (again, "XX" being the number of the test you want
  to take - make sure you ran the batch file respective to that package
  before).

6. Take the test. If you need information on how to properly do that,
  check ff123's page: [url=http://ff123.net/64test/practice.html]http://ff123.net/64test/practice.html[/url].

7. After you finish the test, save the results, (7-)ZIP, RAR or ACE them
  together and mail the file to mail@listening-tests.info.
  The test is scheduled to end on November 3rd, 2008. No results
  will be accepted after that date, unless the test is extended. Possible
  extensions will be announced at the test page (http://www.listening-tests.info).
 
  You don't need to test all samples to participate. Even one single
  result is already very helpful. Of course, the more you test, the better
  for the final results' significance.

  All results and comments I receive will be published. If you want
  to be associated with your results, please enter your (nick)name in the
  "Show name in results file" field in ABC/HR (check the checkbox next to
  it to enable the field). Otherwise, your results will be anonymous.

---------------------------------------------

These are the sample packages:

http://.../~sebastian/Sample01.zip
http://.../~sebastian/Sample02.zip
http://.../~sebastian/Sample03.zip
... etc

- OR (alternate download links) -

http://.../Sample01.zip           
http://.../Sample02.zip           
http://.../Sample03.zip           
... etc

The average package size is 4 MB.

Thank you very much!

Best regards,
Sebastian Mares

- Regarding the low anchor, perhaps 64 kbps is too low for emese (the demo sample). In the last test the low anchor choice was a bit unfortunate. Sebastian wanted to evaluate the progress from the first popular FhG MP3 encoder and picked an old FhG MP3 encoder from "really rarewares". However, that happened to be a buggy prerelease version (Roberto's description wasn't exactly accurate) and the encodings were really too bad to be useful at all.

I wonder if e.g. 80 kbps would be suitable for the test samples. The difference should be audible, but not like a night and day difference. Ideally the low anchor should give some hints about what kind of artifacts the contenders might produce. I.e. it should produce similar type of artifacts in a more pronounced form.

EDIT

Remember to include the licenses. You can copy them from Sebastian's test package.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-04-22 16:14:13
Thank you, Alex.

Speaking of the low anchor, Emese is the hardest sample I've ever seen.  A 64 kbps low anchor is actually not that awful for the rest of the samples.

Chris has prepared the concatenated reference sample.
Link to download: h*tp://www.mediafire.com/?yhwmwzjgmm3
I've also attached some possible low anchors: iTunes 64 CBR, iTunes 64 CVBR and CT 80. CT was used for the 80 kbps low anchor because Apple has a bug in LC-AAC at 80-96 kbps.

01 BerlinDrug
02 AngelsFallFirst
03 CantWait
04 CreuzaDeMä
05 Ecstasy
06 FallOfLife_Linchpin
07 Girl
08 Hotel_Trust
09 Hurricane_YouCant
10 Kalifornia
11 Memories
12 RobotsCut
13 SinceAlways
14 Triangle_Glockenspiel
15 Trumpet_Rumba
16 Waiting
Title: Public Listening Test [2010]
Post by: Alex B on 2010-04-22 17:48:57
I tried the files.

Personally I don't think the low anchor is optimal when the first thing that you hear is the obvious low-pass that makes the encoding entirely different from the others.

I'd like to suggest FAAC (v.1.28 from rarewares) with an adjusted low-pass frequency, for instance:

-q 35 -c 18000

I tried the above and it works pretty well with the concatenated sample.

EDIT

If it appears too good or too bad for a specific sample, the q value could be adjusted for that sample.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-04-22 18:20:55
iTunes CVBR 64 is noticeably better than FAAC -q 35 -c 18000
Title: Public Listening Test [2010]
Post by: Alex B on 2010-04-22 18:32:20
iTunes CVBR 64 is resampled to 32 kHz and low-passed at about 12 kHz; otherwise it sounds pretty "clean". It doesn't really help to understand what kind of artifacts (distortion, noise, pre-echo, etc.) the sample may produce.

If -q 35 is too bad a higher value can be used.

In addition it would be better to include only 44.1 kHz samples. Sample rate switching may produce additional problems with the ABC-HR program, some operating systems, and/or some sound devices.
Title: Public Listening Test [2010]
Post by: IgorC on 2010-04-22 18:37:42
Hm, good points indeed.

Then we should encode the low anchor with FAAC at -q > 35 -c 18000.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-04-22 19:31:51
If we're looking for an anchor emphasizing artifacts to be expected, why not use MP3, e.g. LAME CBR at the lowest setting which doesn't downsample to 32 kHz? I think we could actually use the old "version 1.0" Fraunhofer encoder from 1994(?) with an additional 16-kHz lowpass filter applied before encoding (that should avoid the bug).

Edit: The more I think of it, the more I believe we should use two anchors to stabilize the results: one to define the lower end of the grading scale, the other to define a mid-point of the scale. For the lower end, I just imitated the world's first audio encoder: our test set downsampled to 8 kHz using Audition and saved as an 8-bit µ-Law stereo Wave file. That's a 128-kb encoding. A nice demonstration of how far we've come in the last 40 years or so

µ-Law file: http://www.materialordner.de/wsRJHTtgLzlgF...TouJw5xomU.html (http://www.materialordner.de/wsRJHTtgLzlgFOGsG3HitTouJw5xomU.html)

Edit 2: When using the µ-Law file as anchor, of course it will be upsampled to 44 kHz again.
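
For completeness, the bitrate of that lower anchor follows directly from its format, and µ-law companding itself is only a couple of lines. A minimal sketch (my own illustration of G.711-style µ-law with µ = 255, not the Audition processing chain described above):

```python
import math

# 8 kHz sample rate * 8 bits per sample * 2 channels -> 128 kbit/s
bitrate = 8000 * 8 * 2
print(bitrate)  # 128000 bit/s, i.e. the "128-kb encoding" above

def mulaw_compress(x, mu=255.0):
    """Map a linear sample in [-1, 1] to [-1, 1] with mu-law companding."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

# Quiet samples are boosted much more than loud ones before 8-bit quantization
print(round(mulaw_compress(0.01), 3))
print(round(mulaw_compress(0.5), 3))
```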

Maybe a 96-kb MP3 would be just fine for an intermediate anchor.

Edit 3: Can someone please upload Fraunhofer's 1994 encoder (l3enc 0.99) here (http://www.hydrogenaudio.org/forums/index.php?showtopic=77994)? Roberto's original page expired.

Chris
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-04-22 22:01:17
Regarding the splitting of the concatenated encodes: I think we should use CopyAudio by Kabal et al. from McGill university to simply cut the Wav decode into the appropriate chunks. Reason:

http://www-mmsp.ece.mcgill.ca/Documents/Downloads/AFsp/ (http://www-mmsp.ece.mcgill.ca/Documents/Downloads/AFsp/)


What do you guys think? If you agree, I'll write "prepare_test.bat" and "prepare_test.sh" Windows and Linux scripts for the ABC/HR package over the weekend.
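
For anyone who wants to reproduce the splitting step without CopyAudio, here is a rough sketch using Python's standard-library wave module (the function, file names and the 15-second chunk length are my assumptions for illustration, not the actual scripts):

```python
import wave

def split_wav(path, chunk_seconds=15, prefix="Sample"):
    """Cut a WAV file into consecutive fixed-length chunks.

    Writes Sample01.wav, Sample02.wav, ... and returns the chunk count.
    The last chunk may be shorter than chunk_seconds.
    """
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = chunk_seconds * src.getframerate()
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            index += 1
            with wave.open(f"{prefix}{index:02d}.wav", "wb") as dst:
                dst.setparams(params)  # header length is patched on close
                dst.writeframes(frames)
    return index
```

Sample-accurate cutting matters here because the chunks must line up with the original test items; a real script would also have to account for any encoder/decoder delay so the decoded chunks stay aligned with the references.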

Chris
Title: Public Listening Test [2010]
Post by: IgorC on 2010-04-25 23:09:14
OK, Chris, your applications are better. 
I'm also fine with any of the low anchors, so FAAC or LAME are just fine.
PM or email me about how you want to proceed.
Title: Public Listening Test [2010]
Post by: lvqcl on 2010-04-25 23:18:26
Quote
Can someone please upload Fraunhofer's 1994 encoder (l3enc 0.99) here? Roberto's original page expired.


This - http://web.archive.org/web/20070927014154/.../rrw/l3enc.html (http://web.archive.org/web/20070927014154/www.rjamorim.com/rrw/l3enc.html) ?
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-04-26 22:25:45
Thanks a lot, lvqcl! I tried a 112-kb encode with the - apparently bug-free - l3enc version 2.60 (Linux version). The quality is actually too good for a mid-anchor. 96 kbps unfortunately doesn't work in the unlicensed version. We are currently investigating LAME at 96 kb and 44 kHz sampling rate as an anchor.

For the record, the lower anchor will be created and decoded with the following commands. This yields a delay-free anchor.

Code: [Select]
ResampAudio.exe -s 8000 -f cutoff=0.087 -D A-law -F WAVE ha_aac_test_sample_2010.wav ha_aac_test_sample_2010_a-law8.wav
ResampAudio.exe -s 44100 -D integer16 -F WAVE ha_aac_test_sample_2010_a-law8.wav ha_aac_test_sample_2010_a-law.wav
del ha_aac_test_sample_2010_a-law8.wav

Chris
Title: Public Listening Test [2010]
Post by: The Sheep of DEATH on 2010-04-27 06:20:59
What do you think about getting GXLame in as a low anchor (or even a competitor in a non-AAC test)? It's a low-bitrate MP3 encoder, so it just might fit the bill somewhere between V0-V30 (V20 averages 96kbps and defaults to 44kHz).
Title: Public Listening Test [2010]
Post by: Alex B on 2010-04-27 11:42:51
I don't understand why two low anchors would be needed. Wouldn't it be better to let the "mid" anchor define where the lower end of the scale is?  Then there would possibly be a bit wider scale for the contenders. Ideally the low anchor would then get 0-3 and the contenders 2-5. IMHO, it would be enough that there is one low anchor that can be detected more easily than the actual contenders.

Also, I don't understand why some old/mediocre MP3 encoder/setting would make a better low anchor than FAAC. FAAC would nicely represent the basis of the more developed AAC encoders. FAAC can be adjusted freely to provide the desired quality level. "-q 35 -c 18000" worked for me, but perhaps -q 38, -q 40 or so would work as well.

In general, it would be desirable that all encoders, including the low anchor, are easily available so that anyone can reproduce the test scenario (for verifying the authenticity of the results) or test different samples/encoders using/including the tested encoders and settings in order to get comparable personal results. Also the procedure to decode and split the test sample should be reproducible by anyone.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-04-27 22:06:07
I don't understand why two low anchors would be needed. Wouldn't it be better to let the "mid" anchor define where the lower end of the scale is?  Then there would possibly be a bit wider scale for the contenders. Ideally the low anchor would then get 0-3 and the contenders 2-5. IMHO, it would be enough that there is one low anchor that can be detected more easily than the actual contenders.

Use of two anchors follows the MUSHRA methodology and is an attempt at making the grading scale of this test more absolute. After all, all encoders in this test sound quite good compared to old/simple encoding techniques or lower bit rates. As the name implies, the lower anchor shall define the lower end of the scale and should give the listeners an idea of what we mean by "bad quality" (range 0-1). The hope then is that this reduces the confidence intervals (grade variance) for the other coders in the test, including the mid anchor (which should end up somewhere in the middle of the grading scale).

Quote
Also, I don't understand why some old/mediocre MP3 encoder/setting would make a better low anchor than FAAC. FAAC would nicely represent the basis of the more developed AAC encoders. [...]

Actually, it seems it doesn't. In my first informal evaluation, I noticed that FAAC is tuned very differently than the other AAC encoders in the test (less pre-echo, more warbling), and it seems LAME@96kb emphasizes the artifacts of the codecs under test (pre-echo, warbling on tonal sounds, etc.) better than FAAC@64. Btw, the bandwidth of LAME@96 is close enough to the codecs under test (around 15 kHz).

Quote
In general, it would be desirable that all encoders, including the low anchor, are easily available so that anyone can reproduce the test scenario (for verifying the authenticity of the results) or test different samples/encoders using/including the tested encoders and settings in order to get comparable personal results. Also the procedure to decode and split the test sample should be reproducible by anyone.

Agreed. Igor and I are working on scripts, run by the listeners, which do all the decoding and splitting of the bit streams and creation of the (decoded) anchors. My commands for the lower anchor above are a first attempt at this.

Chris
Title: Public Listening Test [2010]
Post by: muaddib on 2010-04-28 08:40:09
Ideally the low anchor would then get 0-3 and the contenders 2-5. IMHO, it would be enough that there is one low anchor that can be detected easier than the actual contenders.

As the name implies, the lower anchor shall define the lower end of the scale and should give the listeners an idea of what we mean by "bad quality" (range 0-1).

The ITU-R five-grade impairment scale that is used goes from 1 (Very Annoying) to 5 (Imperceptible).
Bad quality would be in range 1-2, probably closer to 1.
Title: Public Listening Test [2010]
Post by: Alex B on 2010-04-28 11:06:49
Use of two anchors follows the MUSHRA methodology and is an attempt at making the grading scale of this test more absolute. After all, all encoders in this test sound quite good compared to old/simple encoding techniques or lower bit rates. As the name implies, the lower anchor shall define the lower end of the scale and should give the listeners an idea of what we mean by "bad quality" (range 0-1). The hope then is that this reduces the confidence intervals (grade variance) for the other coders in the test, including the mid anchor (which should end up somewhere in the middle of the grading scale).

In the past 48 and 64 kbps tests most samples were difficult for me because the low anchor was too bad and the remaining scale wasn't wide enough for correctly stating the differences between the easier and more difficult samples. I.e. the low anchor was always like a "telephone" and got "1". The actual contenders were considerably better, but never close to transparency. So the usable scale for the contenders was mostly from 2.0 to 3.5. Actually, even then the grade "2" was a bit too low for correctly describing the difference between the low anchor and the worst contender. At the other end of the quality scale the difference between the reference and the best contender was always significant and anything above 4 would have been too much for the best contenders.

Of course the situation is different in a 128 kbps AAC test, but there is a danger that the two anchors will occupy the grades 1-4 and the actual contenders will get 4-5 and once again be more or less tied even though the testers actually could hear clear differences between the contenders.

Quote
Actually, it seems it doesn't. In my first informal evaluation, I noticed that FAAC is tuned very differently than the other AAC encoders in the test (less pre-echo, more warbling), and it seems LAME@96kb emphasizes the artifacts of the codecs under test (pre-echo, warbling on tonal sounds, etc.) better than FAAC@64. Btw, the bandwidth of LAME@96 is close enough to the codecs under test (around 15 kHz).

I see. I didn't actually try to do that kind of complex cross-comparison so you know more about this than I. You could have posted the explanation earlier... 
Title: Public Listening Test [2010]
Post by: Alex B on 2010-04-28 11:31:41
The ITU-R five grade impairment scale that is used is between 1 (Very Annoying) and 5 (Imperceptible).
Bad quality would be in range 1-2, probably closer to 1.

Oops. That's my mistake and probably Chris just repeated it. I wrote the reply a bit hastily. By default ABC/HR for Java shows five integer grades from 1 to 5 (though that is configurable).
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-04-28 12:52:30
Of course the situation is different in a 128 kbps AAC test, but there is a danger that the two anchors will occupy the grades 1-4 and the actual contenders will get 4-5 and once again be more or less tied even though the testers actually could hear clear differences between the contenders.

The method of statistical analysis we will be using this time takes care of this: http://www.aes.org/e-lib/browse.cfm?elib=15021 (http://www.aes.org/e-lib/browse.cfm?elib=15021) Including two MUSHRA-style anchors (one for worst quality, one for intermediate quality) plus the hidden reference for best quality allows us to use MUSHRA-style evaluation for our test, as described in the referenced paper.

Quote
I see. I didn't actually try to do that kind of complex cross-comparison so you know more about this than I. You could have posted the explanation earlier... 

Sorry, I only did these tests a few days ago.

Chris
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-05-01 23:50:08
What do you think about getting GXLame in as a low anchor (or even a competitor in a non-AAC test)? It's a low-bitrate MP3 encoder, so it just might fit the bill somewhere between V0-V30 (V20 averages 96kbps and defaults to 44kHz).

When I have time, I'll certainly blind-test GXLame against LAME (because I'm interested in your work). However, assuming GXLame sounds better than LAME at low bit rates, I still tend towards LAME as anchor for this test. Here's why: unlike the codecs under test, anchors are supposed to produce certain artifacts, not avoid them.

Chris
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-05-03 23:06:06
OK, I've changed my mind and will go along with Alex. The mid anchor will be a "compromised" AAC encoding at 96 kbps VBR; more precisely, one without TNS or short blocks, with a bandwidth of 15.8 kHz. It will be created with FAAC v1.28 (http://www.rarewares.org/aac-encoders.php) and the following command:

Code: [Select]
faac.exe --shortctl 1 -c 15848 -q 50 -w ha_aac_test_sample_2010.wav


Decoder-wise, I'm not sure yet. Either NeroAacDec 1.5.1.0 or FAAD2 v2.7 (http://www.rarewares.org/aac-decoders.php). Can someone point me to an Intel MacOS X (fat) binary of the latter?

Chris
Title: Public Listening Test [2010]
Post by: nao on 2010-05-04 05:34:01
Can someone point me to an Intel MacOS X (fat) binary of the latter?

Here (http://tmkk.hp.infoseek.co.jp/tools/faad-2.7-macosx.tar.bz2) it is.
Title: Public Listening Test [2010]
Post by: The Sheep of DEATH on 2010-05-09 21:46:57
What do you think about getting GXLame in as a low anchor (or even a competitor in a non-AAC test)? It's a low-bitrate MP3 encoder, so it just might fit the bill somewhere between V0-V30 (V20 averages 96kbps and defaults to 44kHz).

When I have time, I'll certainly blind-test GXLame against LAME (because I'm interested in your work). However, assuming GXLame sounds better than LAME at low bit rates, I still tend towards LAME as anchor for this test. Here's why: unlike the codecs under test, anchors are supposed to produce certain artifacts, not avoid them.

Chris


That's perfectly understandable. With its t4 release, I think it's actually quite competitive--I rushed to finish it in time for this test.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-06-04 17:02:49
In response to www.hydrogenaudio.org/forums/index.php?showtopic=77809 (http://www.hydrogenaudio.org/forums/index.php?showtopic=77809):

Quote from: C.R.Helmrich
Quote from: muaddib

Also it would be beneficial to create tutorial with each,single,small step that proper test must consist of.

Do you mean a tutorial for the listeners on "what the rules are" and how to proceed before and during the test? That sounds good. Will be done.

I finally found some time for this test again. I've managed to write an instruction sheet, nearly independent of the test methodology (ABC/HR or MUSHRA) and user interface, that guides the test participants through a test session. It's based on my own experience and adapted to this particular test with regard to anchor and hidden-reference selection and grading. I've put a draft under

www.ecodis.de/audio/guideline_high.html (http://www.ecodis.de/audio/guideline_high.html)

A description of said "general test terminology", i.e. an explanation of terms such as anchor, item, overall quality, reference, session, stimulus, and transparency, will follow.

Everything related to listener training, i.e. how to use the test software, what kinds of artifacts to expect, and how to spot artifacts, will also be discussed separately. As mentioned, this instruction sheet is the "final one in the chain" and assumes a methodology- and terminology-informed, trained listener.

If you're an experienced listener and feel that your approach to a high-bit-rate blind test is radically different from my recommendation, please let me know about the difference.

Chris
Title: Public Listening Test [2010]
Post by: .alexander. on 2010-06-07 09:30:10
If you're an experienced listener and feel that your approach to a high-bit-rate blind test is radically different from my recommendation, please let me know about the difference.


Chris, I'm not an experienced listener at all, and my headphones are really poor. But I would love to know what people think about my approach. I actually don't care about ABX probabilities but simply mux the encoded and original audio into the L and R channels so that I can hear both signals simultaneously.

Also, since there is some activity related to the test, I'm wondering whether someone could reach Opticom, or just has access to OperaDigitalEar, to get Advanced PEAQ scores for the test samples.
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-06-07 18:07:33
I actually don't care about ABX probabilities but simply mux encoded and raw audio into L and R channels so that I can hear both signals simultaneously.

My initial guess is that this is dangerous! You will probably hear artifacts which are inaudible if you just listen to the original and coded version one after the other, and you might not hear certain artifacts which are clearly audible if you listen to both channels of the coded signal. Example: if the original and coded version are slightly delayed relative to each other, you'll hear this with your approach because human hearing is very sensitive to interaural delay. However, if both coded channels are delayed by the same amount compared to the original two channels, this might be inaudible if you listen to both coded channels (which you should). I've never ABXed this way.

Objective quality measurements will be done, but might not be published with the results (I don't know if I'm allowed to publish Advanced PEAQ scores; the license is owned by my employer, not by me), and especially not before the test.

Chris
Title: Public Listening Test [2010]
Post by: .alexander. on 2010-06-11 09:40:22
and you might not hear certain artifacts which are clearly audible if you listen to both channels of the codec signal.

What kind of artifacts could be missed? Excluding stereo issues I can only imagine a very far-fetched example. Anyway, this method can be thought of as a unit test. Here is what I usually do:

Code: [Select]
%%
[a, fs] = wavread('sampleA.wav');
[b, fs] = wavread('sampleB.wav');

% find the lag that maximizes the cross-correlation of the mono downmixes
% (in Octave, xcorr is in the signal package)
[c, i] = xcorr(sum(a, 2), sum(b, 2), 4096);
i = i(abs(c) == max(abs(c)));

% drop the leading samples of whichever signal is ahead,
% then truncate both to the same length
a(1:i)  = [];
b(1:-i) = [];
a(length(b)+1:end) = [];
b(length(a)+1:end) = [];

%%
% put the signals into randomly ordered stereo channels and play
j = round(rand);
x = circshift([a(:) b(:)], [0 j]);

wavplay(x, fs, 'async')
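For those without Matlab, the same align-then-mux idea can be sketched in Python with NumPy. This is a rough equivalent, not a line-by-line port: the function name, the brute-force correlation loop, and the assumption of 1-D mono float arrays are mine.

```python
import numpy as np

def align_and_mux(a, b, max_lag=4096, rng=None):
    """Align two mono signals via cross-correlation, then put each one
    into a randomly chosen stereo channel for simultaneous listening."""
    rng = np.random.default_rng() if rng is None else rng
    n = min(len(a), len(b))
    lags = np.arange(-max_lag, max_lag + 1)
    # dot products over all candidate lags (brute force; fine for small max_lag)
    corr = [np.dot(a[max(0, k):n + min(0, k)], b[max(0, -k):n - max(0, k)])
            for k in lags]
    lag = int(lags[np.argmax(np.abs(corr))])
    # drop the leading samples of whichever signal is ahead
    if lag > 0:
        a = a[lag:]
    else:
        b = b[-lag:]
    n = min(len(a), len(b))
    # random channel order, so you don't know which side is the coded one
    left, right = (a[:n], b[:n]) if rng.integers(2) == 0 else (b[:n], a[:n])
    return np.column_stack([left, right])
```

For long signals, `scipy.signal.correlate` with `method='fft'` would be a faster way to find the lag than the brute-force loop above.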
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-06-12 09:58:16
Alex, thanks for your interest and the Matlab code.

What kind of artifacts could be missed? Excluding stereo issues I can only imagine a very far fetched example.

Yes, I was thinking of stereo issues. Things like BMLD (http://www.dcs.shef.ac.uk/~martin/MAD/bmld/bmld.htm)s and the following.

Let's say you have a near-center-panned stereo item (left and right are almost identical), and the encoder makes one channel 0.5 dB too loud and the other 0.5 dB too quiet. If you ABX per-channel against the corresponding original channel, the channel level difference (CLD) is 0.5 dB, and in the absence of other coding artifacts, that channel is most likely transparent. If you downmix the left and right coded channels into one signal, and the left and right original channels into another, and ABX the downmixes, they have a CLD near 0 dB, so the coded downmix will also be transparent in the absence of other artifacts. But the CLD between the two coded channels is 1 dB, and if you ABX the coded stereo signal non-simultaneously against the original near-mono stereo recording on good headphones (as proposed by me), the difference will be audible.
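The level arithmetic in that example can be checked with a small sketch. The ±0.5 dB gains are the hypothetical encoder error from the example above, not measurements of any real encoder:

```python
import math

def cld_db(g1, g2):
    """Channel level difference in dB between two linear gains."""
    return abs(20 * math.log10(g1 / g2))

g = 10 ** (0.5 / 20)     # +0.5 dB as a linear gain factor
g_l, g_r = g, 1 / g      # encoder makes L 0.5 dB too loud, R 0.5 dB too quiet

# per-channel ABX against the corresponding original channel (gain 1.0):
print(round(cld_db(g_l, 1.0), 3))   # 0.5 dB -- most likely transparent
print(round(cld_db(1.0, g_r), 3))   # 0.5 dB -- most likely transparent
# downmix of a near-mono item: coded (g_l + g_r)/2 vs. original (1 + 1)/2
print(round(cld_db((g_l + g_r) / 2, 1.0), 3))   # 0.014 dB -- near 0
# but between the two coded channels:
print(round(cld_db(g_l, g_r), 3))   # 1.0 dB -- audible on good headphones
```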

Chris
Title: Public Listening Test [2010]
Post by: C.R.Helmrich on 2010-06-15 21:48:09
Just a quick info: the FAAD 2.7 binaries on RareWares (http://www.rarewares.org/aac-decoders.php) (Win32) and in post # 295 of this thread (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=77272&view=findpost&p=703921) (MacOS) have been updated to fix a relatively rare PNS-related decoding bug. This doesn't affect decoding of the bitstreams for this test (decoding is bit-identical), but it's good to see it fixed anyway.

Thanks again to Menno, John, and Nao for their help!

Chris
Title: Public Listening Test [2010]
Post by: IgorC on 2010-07-08 22:28:35
I have to inform you that I have no spare time to organize the test.

My apologies to HA community.