Multiformat 48 kbps Listening Test

Reply #125 – 2006-11-05 17:20:22

Well, if differences are only marginal and don't affect quality, I guess we should go with WMCmd.vbs because it can be used for batch encoding. WMP does not encode to Q10 WMA Standard. I could also use Winamp, but I just uninstalled it again.

Reply #126 – 2006-11-05 22:10:12

I was talking with Roberto about the problems of testing WMA 2-pass VBR the other day and wondered about one thing: is only 2-pass VBR affected by the issue I described here, or are all VBR modes affected? So I asked both Ivan and Gabriel how their VBR implementations work and whether it is true that "free" VBR always allocates the same number of bits to a given sample, regardless of whether the sample is encoded as part of a full song or as-is, as an already extracted part of a track. Ivan confirmed my initial assumption (Nero produces two more or less identical encodes), but Gabriel said this is not the case with LAME. He explained that LAME uses a variable ATH level whose value is based on the previous loudness. Therefore, encoding a full track is not the same as encoding a sample: even with VBR, the sample encoded as-is will not be the same as the sample encoded from the whole track.
I am now wondering how big the effect is. Does this "news" render all previous sample-based listening tests useless with regard to LAME?
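
A toy model can make Gabriel's explanation concrete. The sketch below assumes, purely for illustration, that a per-frame threshold follows an exponential moving average of previous frame loudness; the function name `adaptive_thresholds`, the smoothing factor and the starting state are hypothetical, and this is not LAME's actual ATH code. The excerpt encoded on its own starts from a default state, while the same frames inside the full track inherit the state left by the preceding audio, so the thresholds differ at first and converge once that history has decayed.

```python
def adaptive_thresholds(loudness, alpha=0.9, initial_state=0.0):
    """Toy per-frame threshold that depends on the loudness of previous frames.

    Hypothetical model (not LAME's code): the state is an exponential moving
    average of frame loudness in dB, and each frame's threshold is the state
    *before* that frame updates it, i.e. it only reflects the past.
    """
    state = initial_state
    thresholds = []
    for level in loudness:
        thresholds.append(state)
        state = alpha * state + (1.0 - alpha) * level
    return thresholds


# Hypothetical frame loudness values in dB: a loud intro, then the excerpt.
intro = [80.0] * 50
excerpt = [60.0] * 100

alone = adaptive_thresholds(excerpt)                          # excerpt encoded as-is
in_track = adaptive_thresholds(intro + excerpt)[len(intro):]  # same frames, full track

print(abs(alone[0] - in_track[0]))    # large difference on the first frame
print(abs(alone[-1] - in_track[-1]))  # almost no difference after 100 frames
```

In this model the influence of the starting state shrinks by a factor of 0.9 per frame, which is why discarding a short run-in, as suggested in the following replies, removes most of the difference.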

Reply #127 – 2006-11-05 22:34:49

That is why Gabriel suggested two years ago (and has repeated it since) that testers should discard the first one or two seconds of the tested files.
If I remember correctly, this was done for the last listening tests (an option in ABC/HR allows it).

It needs to be confirmed by Gabriel anyway.

Reply #128 – 2006-11-05 22:38:51

OK, so it's not something that affects the whole encode, but only the first few samples.

Reply #129 – 2006-11-06 09:34:08

@Sebastian,

I would not treat LAME's variable ATH as a big problem for the listening test. The fact is that many psychoacoustic models take previous samples into account, and variable ATH is not the only such mechanism.

For example, there is the temporal post-masking phenomenon, which can create different bit distributions for a given sample depending on the loudness of past samples. However, this phenomenon is very local in time: its maximum duration is approx. 200 ms (unless the encoder is buggy).

Also, some encoders use time-domain methods to estimate the tonality of the signal. For example, if the masker behaved unpredictably in the time domain in the past, the encoder might judge it to be "noisy", and this can mean a difference of 20 dB or more in masker power.

Additionally, with SBR you might get slightly different results, because a small "SBR reset" flag is usually sent every second or so (depending on the encoder). The difference between two encodings of the same sample located in different regions of a track is also not big, but it is definitely there.

Etc.

These are just a few factors that can cause a sample to be encoded with a different quantization resolution depending on the preceding audio. However, IMHO none of these differences is very relevant for a listening test.

I think just adding 2-3 seconds of "run-in" is more than enough to make a fair test.
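
The time-domain tonality estimate mentioned above can be illustrated with a simple predictability measure. The sketch below is an assumption, not any specific encoder's method: it fits a second-order linear predictor to the whole signal by least squares and reports the fraction of energy left unexplained (the name `unpredictability` is hypothetical). A pure tone is almost perfectly predictable from its past, while noise is not; because the predictor relies on past samples, such a measure for the first frames of a sample depends on how much history the encoder has already seen.

```python
import math
import random


def unpredictability(x):
    """Fraction of signal energy left after 2nd-order linear prediction.

    Near 0 means tonal (predictable), near 1 means noise-like. Hypothetical
    whole-signal toy measure; real encoders work per band and per frame.
    """
    idx = range(2, len(x))
    # Normal equations for x[n] ~ a1*x[n-1] + a2*x[n-2]
    r11 = sum(x[n - 1] * x[n - 1] for n in idx)
    r22 = sum(x[n - 2] * x[n - 2] for n in idx)
    r12 = sum(x[n - 1] * x[n - 2] for n in idx)
    b1 = sum(x[n] * x[n - 1] for n in idx)
    b2 = sum(x[n] * x[n - 2] for n in idx)
    det = r11 * r22 - r12 * r12
    # Solve the 2x2 system with Cramer's rule
    a1 = (b1 * r22 - b2 * r12) / det
    a2 = (b2 * r11 - b1 * r12) / det
    residual = sum((x[n] - a1 * x[n - 1] - a2 * x[n - 2]) ** 2 for n in idx)
    energy = sum(x[n] * x[n] for n in idx)
    return residual / energy


sr = 44100
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(4410)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4410)]

print(unpredictability(tone))   # close to 0: judged tonal
print(unpredictability(noise))  # close to 1: judged noisy
```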

Reply #130 – 2006-11-06 10:24:10

Quote

I think just adding 2-3 seconds of "run-in" is more than enough to make a fair test.

Isn't cutting the first two seconds off in the ABC/HR options the usual practice?

However, this may be a problem with very short samples, or with samples that start with an audio signal that is meant to demonstrate a specific problem. Here is an example of such a sample: http://www.hydrogenaudio.org/forums/index....st&p=420360

The first two or three seconds seem to be problematic for all MP3 encoders at about 128 kbps. The sample is also from the very beginning of a real audio track so it is not artificial.

Perhaps a few seconds of some PCM material could be added before the sample, but should this be digital silence or some average audio material? Would a few seconds of silence make the encoder behave differently when the real sample starts? If the sudden signal change alters the encoding result, we would need to know what the encoder's "default" state is before it starts adjusting its parameters, and, if possible, use an audio signal that does not change this default.

Edit

Naturally, it is also possible to decode the encoded sample and add the extra audio afterwards. The only downside would be the larger file size of the lossless test sample.

Reply #131 – 2006-11-06 10:40:06

Most modern audio and video encoders will produce different results based on previous samples. It might be because of detection methods (predictability, ATH level,...) or because the encoder is "learning" (mostly video encoders).

In both cases, discarding a few seconds at the start (the discarded data being similar to the tested range, i.e. no "scene cut") is enough to compensate for this behaviour.

In the Java ABC/HR, up to now, we have to trick it by setting the "sample delay" to 2 seconds (it would be nice to be able to specify a test range instead of this hack).
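
Until ABC/HR supports a test range, one workaround is to cut the decoded files before loading them. The sketch below uses Python's standard `wave` module to keep only the part of a PCM WAV file between two times, so the run-in still influences the encoder but is never played to the listener. The helper name `extract_range` is hypothetical, and it assumes the decoded file has already been time-aligned with the original (encoder and decoder delay removed).

```python
import wave


def extract_range(in_path, out_path, start_s=2.0, end_s=None):
    """Copy only the frames between start_s and end_s (in seconds) of a PCM WAV.

    Hypothetical helper: with the default start_s=2.0 the first two seconds
    (the run-in) are dropped; end_s=None keeps everything up to the end.
    """
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        rate = params.framerate
        start = min(int(round(start_s * rate)), params.nframes)
        end = params.nframes if end_s is None else min(int(round(end_s * rate)), params.nframes)
        src.setpos(start)  # jump to the first frame of the test range
        frames = src.readframes(max(end - start, 0))
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)  # the frame count in the header is corrected on close
        dst.writeframes(frames)
```

For example, `extract_range("decoded.wav", "decoded_test.wav")` (hypothetical file names) would produce a file that can be loaded without any sample delay.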

Reply #132 – 2006-11-06 11:06:37

So the correct approach for my example sample would be to encode it as it is (since it is from the beginning of a real audio track), decode it and add at least two seconds of digital silence in the beginning.

If some other "too short" sample is from the middle of the audio track, a longer passage of the same track should be encoded. The encoded passage should start more than two seconds before the intended sample starting point.*

Edit:

*If preferred, this type of encoded sample can be cut to the intended length after decoding. In this case, at least two seconds of silence must be added at the beginning if the sample is going to be used with the two-second Java ABC/HR delay setting.
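
Putting digital silence in front of a WAV file can be done with Python's standard `wave` module. This is a minimal sketch assuming uncompressed PCM WAV input; `prepend_silence` is a hypothetical helper name. Silence added after decoding does not change what the encoder saw; it only shifts the audio so that a 2-second test delay skips the silence instead of the music. Applied to the original WAV before encoding, the same helper corresponds to the second option Gabriel lists in his reply.

```python
import wave


def prepend_silence(in_path, out_path, seconds=2.0):
    """Write a copy of a PCM WAV file with `seconds` of digital silence in front."""
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)
    n_silent = int(round(seconds * params.framerate))
    # 8-bit WAV samples are unsigned (silence = 0x80); wider ones are signed (silence = 0)
    zero = b"\x80" if params.sampwidth == 1 else b"\x00"
    silence = zero * (n_silent * params.sampwidth * params.nchannels)
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)  # the frame count in the header is corrected on close
        dst.writeframes(silence + frames)
```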

Reply #133 – 2006-11-06 11:28:21

Quote from: Alex B on 2006-11-06 11:06:37

So the correct approach for my example sample would be to encode it as it is (since it is from the beginning of a real audio track), decode it and add at least two seconds of digital silence in the beginning.

No.

You encode it as it is, and do not test the first 2 seconds

or

You add two seconds of something at the beginning, encode it, and do not test the first 2 seconds.

(first solution is highly preferable)

Reply #134 – 2006-11-06 11:38:53

Quote from: Gabriel on 2006-11-06 11:28:21

Quote from: Alex B on 2006-11-06 11:06:37

So the correct approach for my example sample would be to encode it as it is (since it is from the beginning of a real audio track), decode it and add at least two seconds of digital silence in the beginning.

No.

You encode it as it is, and do not test the first 2 seconds

or

You add two seconds of something at the beginning, encode it, and do not test the first 2 seconds.

(first solution is highly preferable)

The example sample demonstrates a problem in the first few seconds of the track. It represents a real life situation. Just try for example the L3enc version I uploaded. The guitar chords in the very beginning are very bad.

I am not removing the first two seconds when I listen to this track outside a listening test.

Reply #135 – 2006-11-06 13:19:18

Out of curiosity, I tried the first three seconds of this AC/DC sample with aoTuV b5 @ -q-1, Nero AAC @ ABR 48kbps and l3enc MP3 @ 128 kbps.

Foobar ABX result was 10/10 for all three when compared with the reference.

In my opinion Vorbis and l3enc produced unusable quality. Nero AAC was much better, I would say "slightly annoying".

Edit: I used "-br 48000" with Nero Digital cl encoder v. 1.0.0.2.

Reply #136 – 2006-11-06 15:46:04

Ivan, do you still recommend ABR, or is it OK if VBR is used?

Does anyone mind the following settings:

Ogg Vorbis AoTuV AO; aoTuV b5 [20061024] (based on Xiph.Org's libVorbis): q-1.0

Nero HE-AAC Nero AAC codec / May 1 2006: VBR, Q0.20

WMA Standard Windows Media Audio 9.2: VBR Quality 10, 44 kHz, stereo 1-pass VBR

WMA Professional Windows Media Audio 10 Professional: 48 kbps, 44 kHz, 2 channel 16 bit 1-pass CBR

The settings were chosen so that all encoders reach more or less the same bitrate with my material. Bitrate tables are welcome.

Edit: WMA Professional will reach 48 kbps with all material because it encodes with CBR. The other encoders produce ~50 kbps.

If the developers and the majority of the community agree with this, I suggest we start discussing samples. Should we use some samples from the HE-AAC test? I also have some files I would like to post (in case I haven't already), like a Vangelis one and a Uriah Heep one.

Reply #137 – 2006-11-06 16:08:57

Quote

Ivan, do you still recommend ABR or is it OK if VBR used?

I'm fine with both - ABR should provide less quality deviation, but VBR should score a bit higher on average.

Up to you guys.

Reply #138 – 2006-11-06 16:36:07

Sorry, but I am afraid I did not understand. What do you mean by "ABR should provide less quality deviation"?

Reply #139 – 2006-11-06 17:13:04

I meant - ABR quality (subjective grade) is more consistent, with "shorter" confidence intervals than VBR at that bitrate.

This is because VBR mode could undercode some samples and they would sound slightly less good than when they are coded with ABR mode.

However, on average VBR is indeed a bit better.
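
The "shorter" confidence intervals can be sketched numerically. The helper below (hypothetical name `mean_and_ci95`) computes the mean grade and the half-width of an approximate 95% confidence interval using the normal approximation of 1.96 standard errors; this is an assumption for illustration, and the real test analysis may use a different statistical method. Grades that vary more from sample to sample give a wider interval, even when their mean is higher.

```python
import statistics


def mean_and_ci95(grades):
    """Mean grade and half-width of an approximate 95% confidence interval.

    Uses the normal approximation (1.96 standard errors of the mean), which
    is only rough for the small number of samples in a listening test.
    """
    mean = statistics.mean(grades)
    half_width = 1.96 * statistics.stdev(grades) / len(grades) ** 0.5
    return mean, half_width
```

Applied to the per-sample grades of an ABR encode and a VBR encode of the same codec, Ivan's description would correspond to the VBR result having the higher mean and the ABR result the smaller half-width.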

Reply #140 – 2006-11-06 17:35:07

Quote from: Gabriel on 2006-11-06 11:28:21

You encode it as it is, and do not test the first 2 seconds

or

You add two seconds of something at the beginning, encode it, and do not test the first 2 seconds.

(first solution is highly preferable)

Gabriel, but what if a song doesn't start with a fade-in, but abruptly, like the AC/DC sample Alex B pointed out?

Reply #141 – 2006-11-06 17:45:13

How many samples should we use, 12?

Reply #142 – 2006-11-06 23:13:32

Quote from: Sebastian Mares on 2006-11-06 15:46:04

Ivan, do you still recommend ABR or is it OK if VBR used?

Whatever mode was used in the HE-AAC listening test should be used for this test, also.

Reply #143 – 2006-11-06 23:18:22

Quote from: Sebastian Mares on 2006-11-06 17:45:13

How many samples should we use, 12?

Last time there were 18 samples in the multi-AAC test. Now it's a multi-codec test, so more people should be interested in it. 18-20 samples?

Reply #144 – 2006-11-07 08:37:25

Well, I think 18 samples is maximum.

Reply #145 – 2006-11-07 09:24:47

Quote from: Sebastian Mares on 2006-11-06 16:36:07

Sorry, but I am afraid I did not understand. What do you mean with "ABR should provide less quality deviation"?

What Ivan is telling is that he's not totally confident in his VBR mode ;-)

Full VBR is a matter of trusting your psymodel, which most of the time is not perfect. If your codec is efficient enough compared to competitors, it's usually safer to rely on ABR (i.e. VBR is not worth the risk if you are good enough).
(now you know why iTunes is ABR and not fully VBR, and why it is recommended to use Lame in VBR)

Quote from: Sebastian Mares on 2006-11-06 17:35:07

Gabriel, but what if a song doesn't start "fading in" but like Alex B pointed out with the AC/DC sample?

If you really want to test the start of your sample, you would have two choices:

*re-rip the samples with 2 extra seconds at the beginning
*add 2 seconds of silence at the start of the sample

Reply #146 – 2006-11-07 11:46:03

Quote

Full VBR is a matter of trusting your psymodel, which most of the time is not perfect. If your codec is efficient enough compared to competitors, it's usually safer to rely on ABR (ie VBR is not worth the risk if you are good enough).

Actually,

Looking here:

http://www.hydrogenaudio.org/forums/index....showtopic=41191

It looked like Nero VBR @48 kbits/s was just a bit better than ABR.

However, at such a low bit-rate I don't believe there are big benefits to using true VBR: there is not much room to scale the bit-rate down before the sound starts to degrade a lot, which means there is also little spare room to scale it up where needed.

So, ABR should do just fine.

Reply #147 – 2006-11-07 12:52:26

OK, ABR for Nero then. If everything else is fine, we should focus on samples now.

Reply #148 – 2006-11-07 13:18:44

We should remember that the test should use the setting that produces the best average quality with various complete audio tracks. So if Ivan recommends ABR to users who are going to encode a complete audio library at about 48 kbps, then it should be used.

If the recommendation is VBR, then VBR should be tested, even if a certain set of selected test samples would possibly result in slightly better quality in ABR mode... *

Edit

* ... or when the ABR mode would be a safer choice for winning this particular test, like Gabriel explained.

Reply #149 – 2006-11-12 12:06:28

Here's a bitrate table and graph in Excel format. I used my usual set of 25 various full length tracks:

bitrates_48kbps_test.xls

Average bitrates:

Nero Digital 1.0.0.2 -br 48000 => 48 kbps
Nero Digital 1.0.0.2 -q 0.21 => 50 kbps
Nero Digital 1.0.0.2 -q 0.20 => 48 kbps
WMA 10 Pro CBR 48 kbps => 48 kbps
WMA 9.2 standard VBR10 => 47 kbps
Vorbis aoTuV beta 5 -q -1 => 49 kbps

Some of you may find the following screenshot interesting too. Some track peaks of my test file set, starting from the highest peak:

Any comments?

EDIT

I tested Nero -q 0.2 and changed Nero -q 0.205 to -q 0.2 since it is the selected test option (it was: Nero -q 0.205 => 49 kbps). Also the linked Excel file is updated.
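
The post does not say how the averages were computed. A file's average bitrate is usually its size in bits divided by its duration; over a set of tracks, the plain mean of per-track bitrates and the total bits over the total duration can differ when track lengths vary. The sketch below shows both; the function names are hypothetical, and because file size includes container and tag overhead, these figures slightly overstate the pure audio bitrate.

```python
def track_kbps(size_bytes, duration_s):
    """Average bitrate of one file in kbps (1 kbps = 1000 bit/s)."""
    return size_bytes * 8 / duration_s / 1000


def mean_of_tracks(tracks):
    """Unweighted mean of per-track bitrates; tracks is a list of (bytes, seconds)."""
    return sum(track_kbps(b, s) for b, s in tracks) / len(tracks)


def duration_weighted(tracks):
    """Total bits over total duration, so long tracks count more."""
    total_bits = sum(b for b, _ in tracks) * 8
    total_seconds = sum(s for _, s in tracks)
    return total_bits / total_seconds / 1000
```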