
Topic: Public Listening Test [2010] (Read 176092 times)

Public Listening Test [2010]

Reply #175
Very interesting! Have you made this by hand or is there a tool that could do that?

Public Listening Test [2010]

Reply #176
Very interesting! Have you made this by hand or is there a tool that could do that?


mp4box can split streams, but I just extracted each frame into a separate file:
"mp4box -raws 1 emese_looped.m4a"

Also, I made a simple m-script to:
* remove silence
* trim emese to contain a multiple of 1024 samples
* compensate for the encoder delay (add silence, then skip the first 2 AAC frames later)
* write the loop using writewav (it can append)
* collect bitrate data
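Not the original m-script, but a minimal Python sketch of the preprocessing steps above, assuming 1024-sample AAC-LC frames and a 2112-sample encoder delay (both are assumptions; the actual delay and the number of leading frames to skip are encoder-specific, and the script's author skips two):

```python
FRAME = 1024       # AAC-LC frame length in samples (assumption)
ENC_DELAY = 2112   # assumed encoder delay in samples; encoder-specific

def trim_to_frame_multiple(samples):
    """Drop trailing samples so the length is a multiple of FRAME."""
    usable = (len(samples) // FRAME) * FRAME
    return samples[:usable]

def pad_for_delay(samples):
    """Prepend just enough silence that the real audio starts on a
    frame boundary after the encoder delay, so the leading
    all-silence frames can be skipped on the decode side."""
    pad = (-ENC_DELAY) % FRAME
    return [0.0] * pad + samples

def loop_sample(samples, repetitions):
    """Concatenate the sample with itself to build the looped file."""
    return samples * repetitions
```

The looped output would then be written with an appending WAV writer and fed to the encoder.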


Public Listening Test [2010]

Reply #177
I see. Then I wonder what the graph for QuickTime TVBR would look like...

Chris
If I don't reply to your reply, it means I agree with you.

Public Listening Test [2010]

Reply #178
I am also very interested in seeing a comparison with the TVBR version, if you find the time. Sadly, I'm not practiced in m-script myself.

[attachment=5721:emese_qt..._highest.m4a]

Public Listening Test [2010]

Reply #179
Using a modified version of faad released by Ivan Dimkovic some years ago, this is the bitrate distribution I get with your sample, rpp3po:


Public Listening Test [2010]

Reply #180
The CVBR looks different because I had to update QuickTime to use qtaacenc.



Note this is the bitrate distribution for the looped emese sample; each point corresponds to the bitrate of one repetition of the emese sample.

EDIT: legend
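For illustration, a hedged sketch of how such per-repetition points could be computed from the per-frame payload sizes (e.g. measured from the files that "mp4box -raws" extracts); the 44.1 kHz sample rate and 1024-sample frame length are assumptions:

```python
SAMPLE_RATE = 44100     # assumed sample rate of the source
FRAME_SAMPLES = 1024    # AAC-LC frame length in samples

def per_repetition_bitrates(frame_sizes, frames_per_rep):
    """Given each AAC frame's payload size in bytes, return the
    average bitrate in kbps of every full repetition of the sample."""
    seconds_per_rep = frames_per_rep * FRAME_SAMPLES / SAMPLE_RATE
    rates = []
    for i in range(0, len(frame_sizes) - frames_per_rep + 1, frames_per_rep):
        byte_count = sum(frame_sizes[i:i + frames_per_rep])
        rates.append(byte_count * 8 / seconds_per_rep / 1000.0)
    return rates
```

Plotting the returned list against the repetition index would reproduce the kind of per-iteration graph shown here.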

Public Listening Test [2010]

Reply #181
I would never have thought that the average bitrate variation for CVBR would be that large depending on the iteration. Could that effect be mitigated by adding pre-silence to the test samples?

Public Listening Test [2010]

Reply #182
As AlexB found, DivX has no suitable VBR option for 128 kbps: http://www.hydrogenaudio.org/forums/index....st&p=686087

I still haven't received an answer from the DivX developer. One week has passed. (He answered very late the last time, too.)

Let's make the decision ourselves.

Possible DivX settings:
a) a v4/v5 alternate setting which produces a bitrate closer to the other competitors' on the specific sample. I know it's not the best idea, but it's still an option.
b) CBR 128.

Public Listening Test [2010]

Reply #183
I would never have thought that the average bitrate variation for CVBR would be that large depending on the iteration. Could that effect be mitigated by adding pre-silence to the test samples?

Probably not; pre-silence doesn't empty the bit reservoir. I think the best approach would be to concatenate all test samples into a single file (this seems to be commonly done when standardizing MPEG coders) and then prepend a few seconds of noise to the beginning of that file, so that the first sample is encoded with a half-empty bit reservoir.
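A rough sketch of that preparation, under the assumptions of a 44.1 kHz sample rate and a few seconds of uniform noise (the exact noise type and length would need tuning; function and parameter names are illustrative):

```python
import random

SAMPLE_RATE = 44100  # assumed sample rate of the test samples

def prepare_test_signal(samples_list, noise_seconds=3, seed=0):
    """Concatenate all test samples into one signal and prepend
    noise so the first sample starts with a drained bit reservoir."""
    rng = random.Random(seed)  # seeded for reproducibility
    noise = [rng.uniform(-0.5, 0.5) for _ in range(noise_seconds * SAMPLE_RATE)]
    signal = noise[:]
    for sample in samples_list:
        signal.extend(sample)
    return signal
```

The resulting signal would be written losslessly, encoded once per contender, and the individual samples cut back out afterwards.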

Chris
If I don't reply to your reply, it means I agree with you.

Public Listening Test [2010]

Reply #184
That sounds like a good approach and not too much overhead. A little script employing mp4box could be used to produce readily cut file sets after encoding.
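Hypothetically, the bookkeeping core of such a script could look like this: given the lengths of the individual samples and the noise prefix, it computes the time range of each sample inside the concatenated file, which could then be handed to mp4box's split options after encoding (the 44.1 kHz rate and the function name are assumptions):

```python
SAMPLE_RATE = 44100  # assumed sample rate of the concatenated file

def cut_ranges(sample_lengths, prefix_len):
    """Return (start_sec, end_sec) for each original sample inside
    the concatenated signal, so each can be cut back out after
    encoding and decoding."""
    ranges = []
    pos = prefix_len  # skip the noise prefix
    for n in sample_lengths:
        ranges.append((pos / SAMPLE_RATE, (pos + n) / SAMPLE_RATE))
        pos += n
    return ranges
```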

I think DivX should be CBR 128, with the developers having the option to provide a better-matching preset if they want to improve their chances in the competition.

Public Listening Test [2010]

Reply #185
Agreed. Agreed.
If I don't reply to your reply, it means I agree with you.

Public Listening Test [2010]

Reply #186
During the last listening tests, the first two seconds of each encoding were discarded (by ABC/HR). Isn't that enough to avoid these technical issues?

Public Listening Test [2010]

Reply #187
During the last listening tests, the first two seconds of each encoding were discarded (by ABC/HR). Isn't that enough to avoid these technical issues?


Why on earth would you throw out the first two seconds? There should be one and only one lossless source file. If trimming needs to be done at the beginning or end, it should be done on the lossless source file, which will propagate to all the lossy files. The ABC/HR application should never alter the audio data.


Public Listening Test [2010]

Reply #189
Yes, but some test samples are only 5-6 seconds long. I wouldn't want to throw away precious 25 or more percent of that.

Chris
If I don't reply to your reply, it means I agree with you.

Public Listening Test [2010]

Reply #190
In addition to the link Guruboolez provided, here are some other relevant links:

http://www.hydrogenaudio.org/forums/index....mp;#entry343318
http://www.hydrogenaudio.org/forums/index....mp;#entry447231
http://www.hydrogenaudio.org/forums/index....mp;#entry382267

Edit: these are the initial related posts in the linked threads. Related replies may follow after a few unrelated ones.

Yes, but some test samples are only 5-6 seconds long. I wouldn't want to throw away precious 25 or more percent of that.

So, as an AAC developer, can you tell us whether your AAC encoder needs some time to adapt to the content before it provides the best possible quality?

Regardless of the encoder specific behavior, I think that if you want to test a certain very short passage you should simply encode a sample that starts 2 seconds earlier. If the original source track actually starts with the critical passage that is intended to be tested you can always configure that sample to start from the beginning. It would be fair for all encoders. You would then be testing how the encoders can handle a track that starts with such content.


iTunes CVBR may be a real problem if its behavior is inconsistent. We discussed a somewhat similar problem when WMA 2-pass VBR was one of the possible test contenders.

In general, the only really correct and fair way to simulate a real life usage situation would be to encode the complete original source tracks, decode the encoded tracks, cut the test samples from the decoded tracks, and store the samples in a lossless format. This has been discussed in the past, but it has never been a viable option for various practical reasons.

Public Listening Test [2010]

Reply #191
Yes, but some test samples are only 5-6 seconds long. I wouldn't want to throw away precious 25 or more percent of that.

Chris

If needed, I can easily upload a longer sample for emese (though I'm not convinced that this kind of extreme sample really belongs in this listening test).

Public Listening Test [2010]

Reply #192
For AAC at ~128 kbit/s rates, extreme samples like emese are the bread & butter of this test.

Public Listening Test [2010]

Reply #193
That's right for a public listening test. That's also why such a test at 128 kbps is already doomed.
If you don't want an "all-tied" conclusion like the previous tests at this bitrate, you necessarily have to feed the listeners with extremely difficult-to-encode samples, which are also unrepresentative ones (by unrepresentative, I mean that the tested material won't correspond to what people listen to on a daily basis). Such a methodology is, be sure of that, a very good argument against the validity of the test. Of course we may include one or two special samples for pedagogic purposes, in order to show that lossy encoders can fail on extreme material. But no more than that.
With more "musical" (difficult but normal) samples, the conclusion should be the same as in previous listening tests at 128 kbps: a complete status quo. I don't think it's worth putting so much energy into showing that 128 kbps encoders are equally transparent for a panel of 25-30 people.

http://www.listening-tests.info/mp3-128-1/results.htm
http://www.listening-tests.info/mf-128-1/results.htm

Public Listening Test [2010]

Reply #194
A "representative" test would just show what we already know. Why should any more time be wasted on that? There is no need to prove again that all tested encoders can be transparent for most material. The 2010 edition tries to explore the tiny range between "most" and 100%: which encoder comes closest? It thus has the potential to actually produce new results we didn't already know. For example: is TVBR really better than CVBR, or even the other way around, if TVBR chokes on some content? How does Nero compare to QT? And what about the new contenders? All of them should be perfectly fine for most material; we don't need to test that again. And going for a more representative music selection at lower bitrates would just show which encoder produces the best low-bitrate results, and not necessarily tell us anything about transparency potential in the bitrate range people actually use.

Public Listening Test [2010]

Reply #195
If I understand your last message correctly, I'm very far from sharing any interest in the future test (I apologize for discovering this only now: I haven't read the full debate).
Extreme samples were usually helpful for testing encoders under stressful conditions and thereby validating the choice of high-bitrate settings (like lame --alt-preset, Musepack, lossyWAV, etc.). People using such high-bitrate lossy encodings expect transparent results even in extreme cases (at least in most of them). But 128 kbps is very far from a high bitrate, and I don't think many people would expect a robust, artefact-free music library from such a low bitrate.

So if I understand all this correctly, the future test should tell us how good (or bad) 128 kbps encodings are under extraordinary conditions, but won't give us any idea of how different they could be with music for daily usage (I assume that is what people expect from a test at 128 kbps). If I'm right, the practical interest of such a test seems very limited. At least to me.

Regards.

Public Listening Test [2010]

Reply #196
But 128 kbps is very far from a high bitrate, and I don't think many people would expect a robust, artefact-free music library from such a low bitrate.


The general sentiment was that 128 kbps is actually a very high bitrate for an AAC listening test. The expectation was that at this bitrate, with normal samples, only very few participants would be able to contribute anything at all, because most wouldn't hear a difference.

Public Listening Test [2010]

Reply #197
128 kbps is indeed high for a public listening test of modern encoders. That's why I really think it would be judicious not to run a new one, rather than forcing detailed results by using a lot of meaningless samples.

I realize I shouldn't have opened the debate about the chosen bitrate. I'm sorry for that and will stop right now.
I repeat my offer: if needed, I can upload a longer version of the emese sample (from the original CD), and maybe other samples as well.

Public Listening Test [2010]

Reply #198
Thanks for the offer, guruboolez! I'm sure we will need to make use of it.

... but won't give us any idea about how different they could be with music for daily usage ...

Correct, but as rpp3po mentioned, we already know that. Except maybe for the DivX encoder, all the encoders in question have been developed for years, so we can expect them to perform equally well (i.e. with statistically insignificant quality differences on average) on everyday music. The question is how large the quality differences are for extremely critical material. Sure, an "average" reader might not care about such material, but such a reader can refer to the previous listening tests you mentioned.

By the way, a related question: do you, or anyone else for that matter, think a 96 kbps test with less critical samples would show greater differences between the encoders?

Chris
If I don't reply to your reply, it means I agree with you.

Public Listening Test [2010]

Reply #199
Honestly, I don't think it will. I strongly believe that the difference between contenders must be very strong and immediately obvious to show up in the final plots. And I'm pretty sure that the gap between Nero and Apple is far from large enough (assuming a gap really exists) to expect such results.