Objective measurements of portable players using df-metric

After two years of beta-testing I would like to present objective measurements of portable players, performed in accordance with the new audio metric - df-metric.

Such measurements:
  • correlate well with human perception of sound quality;
  • are valid for digital audio;
  • do not contradict traditional audio metrics but extend them;
  • are understandable to people without technical background;
  • can be performed at home using any recording device; accuracy of that device will determine final accuracy of the measurements.

This listener-centric audio metric is aimed at consumer control over manufacturers of audio equipment.

While R&D of this new measurement approach is still in progress, the objective measurement procedure for hardware audio circuits is mature enough for practical use. Portable players are very suitable candidates to start with.

The following page has a short and clear explanation of the measurement method, along with some help for interpreting its results. Measurements of the first two devices are included.

Feel free to propose portable players for testing. I'm ready to answer your questions.

 http://soundexpert.org/portable-players
keeping audio clear together - soundexpert.org

 

Re: Objective measurements of portable players using df-metric

Reply #1
So what can a layman conclude from looking at the pretty infographic of test results for the HTC Desire C? There's a lot of red and orange in there, and red = bad, so does that mean that the artifacts its analog output introduces are audible? Are they audible in particular conditions?

Re: Objective measurements of portable players using df-metric

Reply #2
Quote
So what can a layman conclude from looking at the pretty infographic of test results for the HTC Desire C? There's a lot of red and orange in there, and red = bad, so does that mean that the artifacts its analog output introduces are audible? Are they audible in particular conditions?
Looking at two df-slides, a layman can conclude only that the one is better than another. Nothing else. For sure, Df measurements are less meaningful than quality scores from listening tests. But if there are more than two devices (27 here - http://soundexpert.org/portable-players-beta), then the relationship between Df values and quality scores becomes clearer. At least there will be some anchor devices which are well-known as good and bad, so Df values will become more meaningful.
keeping audio clear together - soundexpert.org

Re: Objective measurements of portable players using df-metric

Reply #3
Quote
the one is better than another
audibly better?

Re: Objective measurements of portable players using df-metric

Reply #4
I like the test instrumentation.  32 ohm load will reveal a lot of problems with typical hardware in my experience.  However, I see a few potential confounders:

1)  You're using 44.1khz when almost anything android (which is an awful lot of things these days) is going to default to 48khz and then very often resample.  Android has a relatively high quality windowed sinc resampler, but if you look at very broadband signals like the white noise example, you're probably going to detect the effect of the resampler and penalize pretty heavily when in reality the output is likely to be fine for real music.  Trying two different sampling rates, or else detecting resampling may be more accurate.  Certainly if you tested some of the devices you are proposing in rockbox, you would see very different results depending on the sampling rate setting in our software.

2) The square and triangle tests look questionable to me. I see you're doing this correctly and generating band-limited square/triangle signals (e.g. via Fourier synthesis), but they're still going to have all sorts of weird interactions with the high-pass filtering, imaging filter and resampler that probably have no relevance to real audio quality. In fact, I suspect that you are seeing this in many of your tests, given the large difference between the triangle and square results. Is there a reason to consider such artificial signals at all?

3) The use of a common scale is a little confusing. -6 dB on the -92 dBFS test is a much better result than it would be on the 1 kHz sine test, but the average person is not going to grasp that from the way the data is presented. Showing the -92 dBFS results relative to the noiseless case (so that the -6 dB value maps close to 98 dB) would make it a lot clearer what that test means.

4) The band limiting you're using isn't really explained, and I didn't want to register to download the utility you're using for noise generation. In general, I like the RMAA approach of doing things like frequency and IMD sweeps rather than just testing broadband signals like white noise, because effects around 100-10,000 Hz matter vastly more than very high or very low frequencies. If you want to do broadband noise measurements anyway, I think you should band-limit them to some set of relevant ranges and test those independently (e.g. 50-500 Hz, 1000-5000 Hz) and leave out frequencies above 15 kHz entirely. Otherwise you are going to end up measuring a lot of things that do not matter, such as the high-pass filter transition band and the imaging filter, while obscuring things that do matter (e.g. low-frequency bass roll-off). I also recommend aggressively band-limiting the real music samples, just to make sure things like transient bits of clipping (in the recording, not your hardware) do not introduce artifacts.

What happens if you loop back the recording device, by the way? How close to 16-bit limited are the individual tests? This would serve as a great control.

Re: Objective measurements of portable players using df-metric

Reply #5
Quote
I like the test instrumentation.  32 ohm load will reveal a lot of problems with typical hardware in my experience.  However, I see a few potential confounders:

1)  You're using 44.1khz when almost anything android (which is an awful lot of things these days) is going to default to 48khz and then very often resample.  Android has a relatively high quality windowed sinc resampler, but if you look at very broadband signals like the white noise example, you're probably going to detect the effect of the resampler and penalize pretty heavily when in reality the output is likely to be fine for real music.  Trying two different sampling rates, or else detecting resampling may be more accurate.  Certainly if you tested some of the devices you are proposing in rockbox, you would see very different results depending on the sampling rate setting in our software.
Still, most of the audio files that we listen to are 44.1 kHz. That was the reason. Internal resampling is a good question. I'm thinking about a small study of various resamplers in order to see how they distort different signals. My current understanding is that Df values for resampling are significantly lower (i.e. better) than for other problems in audio circuits. One example of Df measurements for resampling is in my old article [http://soundexpert.org/news/-/blogs/visualization-of-distortion], Fig. 7. For glockenspiel audio, Df values are -74 dB and -84 dB depending on the resampler. So I think this will not affect the resulting Df values greatly, as the latter are around -30 dB for real-life signals and the audio circuits of portable players. I hope the research of resamplers using df-metric will help to understand this better, and particularly which signals are most sensitive to it. What resamplers do you recommend for the test?

Quote
2) The square and triangle tests look questionable to me. I see you're doing this correctly and generating band-limited square/triangle signals (e.g. via Fourier synthesis), but they're still going to have all sorts of weird interactions with the high-pass filtering, imaging filter and resampler that probably have no relevance to real audio quality. In fact, I suspect that you are seeing this in many of your tests, given the large difference between the triangle and square results. Is there a reason to consider such artificial signals at all?
Triangle signals are important for another psychoacoustic research project of mine (I don't want to dig into details now). The square wave was intended for measuring slew rate but, to be honest, I haven't touched that issue yet.
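
For readers unfamiliar with the band-limited Fourier synthesis mentioned in the quote above, here is a minimal Python sketch. It is not the generator used for the SoundExpert test signals (that tool is not described in this thread); the function name, the 1 kHz fundamental and the 44.1 kHz sample rate are illustrative assumptions. It sums only the odd harmonics that fall below the Nyquist frequency, so no aliased components are created.

```python
import math


def band_limited_square(freq_hz, sample_rate_hz, num_samples, amplitude=1.0):
    """Sum the odd harmonics of a square wave that lie below Nyquist.

    Hypothetical helper for illustration; not the SoundExpert generator.
    """
    nyquist = sample_rate_hz / 2.0
    harmonics = [k for k in range(1, int(nyquist // freq_hz) + 1, 2)
                 if k * freq_hz < nyquist]
    samples = []
    for n in range(num_samples):
        t = n / sample_rate_hz
        partial_sum = sum(math.sin(2.0 * math.pi * k * freq_hz * t) / k
                          for k in harmonics)
        samples.append(amplitude * (4.0 / math.pi) * partial_sum)
    return samples, harmonics


# Assumed example values: 1 kHz fundamental, 44.1 kHz sample rate, 10 ms.
samples, harmonics = band_limited_square(1000.0, 44100.0, 441)
print(len(harmonics), "harmonics, highest is number", harmonics[-1])
print("peak sample value:", max(samples))
```

Note that the truncated series overshoots the nominal amplitude (the Gibbs phenomenon), so a signal like this has to be scaled down before playback to avoid clipping in the DUT.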

Quote
3) The use of a common scale is a little confusing. -6 dB on the -92 dBFS test is a much better result than it would be on the 1 kHz sine test, but the average person is not going to grasp that from the way the data is presented. Showing the -92 dBFS results relative to the noiseless case (so that the -6 dB value maps close to 98 dB) would make it a lot clearer what that test means.
Yes, in df-metric many Df values make sense only in comparison to other ones (sine 12.5k and DFD, for example). The dependency of Df values on the type of signal is inherent in df-metric, and this must be taken into account when interpreting measurement results in any case. Taking into account that new signals will be added in future and probably some current ones removed, I think it's better just to show the values that were measured (without further complications). At this stage of the research, at least.
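
The arithmetic behind the quoted suggestion can be sketched as follows. This assumes that a Df value behaves like a difference-to-signal ratio in dB, which the thread does not state explicitly, and that the 1 kHz sine is played at full scale; both function names are hypothetical. Under those assumptions, the same Df of -6 dB corresponds to very different absolute error levels depending on the level of the test signal.

```python
def error_level_dbfs(signal_level_dbfs, df_db):
    """Level of the difference signal relative to digital full scale.

    Assumes Df is a difference-to-signal ratio in dB (an assumption here).
    """
    return signal_level_dbfs + df_db


def ideal_pcm_snr_db(bits):
    """Textbook SNR of an ideal quantizer driven by a full-scale sine."""
    return 6.02 * bits + 1.76


low_level = error_level_dbfs(-92.0, -6.0)   # -98.0 dBFS
full_scale = error_level_dbfs(0.0, -6.0)    # -6.0 dBFS
print(low_level, full_scale, round(ideal_pcm_snr_db(16), 2))
```

So a Df of -6 dB on the -92 dBFS signal puts the error close to the theoretical 16-bit noise floor, while the same Df on a full-scale sine would be a very poor result.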

Quote
4) The band limiting you're using isn't really explained, and I didn't want to register to download the utility you're using for noise generation. In general, I like the RMAA approach of doing things like frequency and IMD sweeps rather than just testing broadband signals like white noise, because effects around 100-10,000 Hz matter vastly more than very high or very low frequencies. If you want to do broadband noise measurements anyway, I think you should band-limit them to some set of relevant ranges and test those independently (e.g. 50-500 Hz, 1000-5000 Hz) and leave out frequencies above 15 kHz entirely. Otherwise you are going to end up measuring a lot of things that do not matter, such as the high-pass filter transition band and the imaging filter, while obscuring things that do matter (e.g. low-frequency bass roll-off).
IMD sweeps and noise divided into ranges look interesting. At the moment the only heavily band-limited noise signal used in testing is BS EN 50332-1. Usually technical signals help to reveal some particular features of audio device performance (often without a clear understanding of how the feature affects the listening experience). As the measurement procedure is always the same, the only two questions when adding new tech. signals are: what exactly do we need to test, and what signal can help to do this. For example, I expect that as VR audio becomes widespread, phase accuracy will become a very important audio parameter, and some wide-band or high-frequency noise signals could help to measure it.

Quote
  I also recommend aggressively band limiting the real music samples as well just to make sure things like transient bits of clipping (in the recording, not your hardware) do not introduce artifacts. 
I would prefer to avoid preprocessing music material before feeding it into the DUT, as it goes against one of the main features of df-metric: testing as close as possible to the real-life scenario of DUT use. Actually, the tracks used for testing sound very different and have different RMS levels and frequency ranges. At least two of them have severe clipping. I plan to publish diffrograms of all sound tracks for all tested devices. Maybe the influence of clipping is better examined by comparing diffrograms of different tracks and different devices ... And most of the other tracks are clipping-free.

Quote
What happens if you loop back the recording device, by the way? How close to 16-bit limited are the individual tests? This would serve as a great control.
You mean to test the recording device with the same set of test signals? For that purpose I would need some playback device of high (lab) quality. My recording device has playback capability, but its quality is mediocre (as it is less important in recorders), so loopback will show only the quality of the playback+recording chain. Or maybe I didn't understand your thought ...

Thanks for your detailed comments and proposals, very helpful and insightful indeed.
keeping audio clear together - soundexpert.org

Re: Objective measurements of portable players using df-metric

Reply #6
Quote
the one is better than another
audibly better?
Yes, audibly. That's the main point: df measurements with a sufficient amount of real-life audio material correlate well with perceived audio quality. Not always, but in cases when artifact signatures are similar between devices. Without going into further details, I can say that the audio circuits of portable players have pretty similar artifact signatures.
keeping audio clear together - soundexpert.org

Re: Objective measurements of portable players using df-metric

Reply #7
Quote
I like the test instrumentation.  32 ohm load will reveal a lot of problems with typical hardware in my experience.  However, I see a few potential confounders:

1)  You're using 44.1khz when almost anything android (which is an awful lot of things these days) is going to default to 48khz and then very often resample.  Android has a relatively high quality windowed sinc resampler, but if you look at very broadband signals like the white noise example, you're probably going to detect the effect of the resampler and penalize pretty heavily when in reality the output is likely to be fine for real music.  Trying two different sampling rates, or else detecting resampling may be more accurate.  Certainly if you tested some of the devices you are proposing in rockbox, you would see very different results depending on the sampling rate setting in our software.

Quote
Still, most of the audio files that we listen to are 44.1 kHz. That was the reason. Internal resampling is a good question. I'm thinking about a small study of various resamplers in order to see how they distort different signals. My current understanding is that Df values for resampling are significantly lower (i.e. better) than for other problems in audio circuits. One example of Df measurements for resampling is in my old article [http://soundexpert.org/news/-/blogs/visualization-of-distortion], Fig. 7. For glockenspiel audio, Df values are -74 dB and -84 dB depending on the resampler.

That is a test of the foobar resampler, i.e. a near-perfect, power-unconstrained resampler. Even a good portable device does not and should not use a resampler like that, because it's ridiculously power-inefficient. Mobile is all about tradeoffs, tolerating some inaudible effects if it saves battery life.

I agree with testing 44.1khz because it is so common, but the problem you're going to have here is that if you just do a difference over the full band, you'll be extremely sensitive to completely irrelevant things.  I would try to design the test to be as insensitive to these things as possible.

Quote
So I think this will not affect the resulting Df values greatly, as the latter are around -30 dB for real-life signals and the audio circuits of portable players. I hope the research of resamplers using df-metric will help to understand this better, and particularly which signals are most sensitive to it. What resamplers do you recommend for the test?

Try the one I made for Rockbox. It is extremely power-efficient (only about 5 integer multiply-adds per sample, if I remember correctly) and in general good-sounding, but the difference will be very large on broadband signals because that was not something I was optimizing for. I suspect most people will never notice it by ear.

Quote
4) The band limiting you're using isn't really explained, and I didn't want to register to download the utility you're using for noise generation. In general, I like the RMAA approach of doing things like frequency and IMD sweeps rather than just testing broadband signals like white noise, because effects around 100-10,000 Hz matter vastly more than very high or very low frequencies. If you want to do broadband noise measurements anyway, I think you should band-limit them to some set of relevant ranges and test those independently (e.g. 50-500 Hz, 1000-5000 Hz) and leave out frequencies above 15 kHz entirely. Otherwise you are going to end up measuring a lot of things that do not matter, such as the high-pass filter transition band and the imaging filter, while obscuring things that do matter (e.g. low-frequency bass roll-off).
IMD sweeps and noise divided into ranges look interesting. At the moment the only heavily band-limited noise signal used in testing is BS EN 50332-1. Usually technical signals help to reveal some particular features of audio device performance (often without a clear understanding of how the feature affects the listening experience). As the measurement procedure is always the same, the only two questions when adding new tech. signals are: what exactly do we need to test, and what signal can help to do this. For example, I expect that as VR audio becomes widespread, phase accuracy will become a very important audio parameter, and some wide-band or high-frequency noise signals could help to measure it.

I don't think transition band (>16 kHz) frequencies are ever going to be relevant, and especially for resamplers it often makes sense to trade off against them, so I really don't recommend testing them. If you do test them, separate them from the more important bands so that you can see where the problem is and decide if it matters on a given device.

Quote
  I also recommend aggressively band limiting the real music samples as well just to make sure things like transient bits of clipping (in the recording, not your hardware) do not introduce artifacts. 
I would prefer to avoid preprocessing music material before feeding it into the DUT, as it goes against one of the main features of df-metric: testing as close as possible to the real-life scenario of DUT use. Actually, the tracks used for testing sound very different and have different RMS levels and frequency ranges. At least two of them have severe clipping. I plan to publish diffrograms of all sound tracks for all tested devices. Maybe the influence of clipping is better examined by comparing diffrograms of different tracks and different devices ... And most of the other tracks are clipping-free.

The problem with real-life scenarios is that people do not hear in the time domain, but when you take a difference signal, you weight errors in that domain. In real life people hear in a modified frequency domain where the width of each band increases with center frequency (which is why we quantize audio there for compression). Breaking your signal into different frequency bands is therefore more realistic: it is how people really perceive audio, as a sum of different frequencies and not as an oscillation in the time domain. It also overcomes (or at least reduces) the main limitation of difference signals, that they weight error linearly in frequency so that high-frequency errors swamp bass errors.
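
As a rough illustration of the band-splitting idea, the following Python sketch reports the error level separately per frequency band instead of as one full-band number. It is a toy example under stated assumptions, not part of df-metric and not a perceptual model: the band edges, the two-tone test signal and the function names are all hypothetical, and a naive DFT is used only to keep the code self-contained.

```python
import cmath
import math


def one_sided_dft(x):
    """Naive O(N^2) DFT for bins 0..N/2 (fine for a short illustration)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def band_error_db(reference, test, sample_rate_hz, bands_hz):
    """Error energy relative to reference energy, in dB, for each band."""
    n = len(reference)
    ref_spec = one_sided_dft(reference)
    err_spec = one_sided_dft([b - a for a, b in zip(reference, test)])
    levels = {}
    for low, high in bands_hz:
        bins = [k for k in range(len(ref_spec))
                if low <= k * sample_rate_hz / n < high]
        ref_energy = sum(abs(ref_spec[k]) ** 2 for k in bins)
        err_energy = sum(abs(err_spec[k]) ** 2 for k in bins)
        if ref_energy == 0.0 or err_energy == 0.0:
            levels[(low, high)] = float("-inf")
        else:
            levels[(low, high)] = 10.0 * math.log10(err_energy / ref_energy)
    return levels


FS = 48000.0   # assumed sample rate
N = 480        # 10 ms, so DFT bins fall on exact multiples of 100 Hz

# Reference: strong 200 Hz tone plus a weak 18 kHz tone.
reference = [math.sin(2 * math.pi * 200.0 * t / FS)
             + 0.1 * math.sin(2 * math.pi * 18000.0 * t / FS)
             for t in range(N)]
# "Device output": identical except that the 18 kHz tone is halved,
# a stand-in for roll-off in a filter transition band.
test = [math.sin(2 * math.pi * 200.0 * t / FS)
        + 0.05 * math.sin(2 * math.pi * 18000.0 * t / FS)
        for t in range(N)]

bands = [(0.0, 16000.0), (16000.0, 24000.0), (0.0, 24001.0)]
levels = band_error_db(reference, test, FS, bands)
for band in bands:
    print(band, f"{levels[band]:.1f} dB")
```

In this toy case the full-band figure alone does not reveal that the band below 16 kHz is essentially untouched and that all of the error sits above 16 kHz; the per-band figures do.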

Quote
What happens if you loop back the recording device, by the way? How close to 16-bit limited are the individual tests? This would serve as a great control.
You mean to test the recording device with the same set of test signals? For that purpose I would need some playback device of high (lab) quality. My recording device has playback capability, but its quality is mediocre (as it is less important in recorders), so loopback will show only the quality of the playback+recording chain. Or maybe I didn't understand your thought ...

Yeah, I mean loop back on the recording device if you can. I suppose if you think it is very bad then it's not informative, but it's useful to have an idea of how sensitive your metric is to errors. Maybe another way to think about it would be how many taps would be needed in an otherwise noiseless anti-image filter to achieve a given level of signal difference. My sense is that near-perfect output is probably still going to have significant difference energy due to the anti-image filter transition band, and sensitivity to that is something you might want to think about controlling.

Re: Objective measurements of portable players using df-metric

Reply #8
Quote
the one is better than another
audibly better?
Yes, audibly. That's the main point: df measurements with a sufficient amount of real-life audio material correlate well with perceived audio quality. Not always, but in cases when artifact signatures are similar between devices. Without going into further details, I can say that the audio circuits of portable players have pretty similar artifact signatures.

I assume they start to correlate at some point, but not all of the time when the devices are in the range of audible transparency. So what is that point in the final df graph? In other words, a device with what df score will be audibly indistinguishable from the original signal in some kind of ABX test?

Re: Objective measurements of portable players using df-metric

Reply #9
Quote
That is a test of the foobar resampler, i.e. a near-perfect, power-unconstrained resampler. Even a good portable device does not and should not use a resampler like that, because it's ridiculously power-inefficient. Mobile is all about tradeoffs, tolerating some inaudible effects if it saves battery life.

I agree with testing 44.1khz because it is so common, but the problem you're going to have here is that if you just do a difference over the full band, you'll be extremely sensitive to completely irrelevant things.  I would try to design the test to be as insensitive to these things as possible.
Quote
The problem with real-life scenarios is that people do not hear in the time domain, but when you take a difference signal, you weight errors in that domain. In real life people hear in a modified frequency domain where the width of each band increases with center frequency (which is why we quantize audio there for compression). Breaking your signal into different frequency bands is therefore more realistic: it is how people really perceive audio, as a sum of different frequencies and not as an oscillation in the time domain. It also overcomes (or at least reduces) the main limitation of difference signals, that they weight error linearly in frequency so that high-frequency errors swamp bass errors.
Most real-life audio material (especially if it is perfectly mixed and mastered) has a natural high-frequency roll-off. So the "difference over the full band" problem (which does exist) is not so dramatic when using high-quality audio material for testing. The SE test set Variety consists of such tracks. Tech. signals are another story. In df-metric they are not used for sound quality assessment (except maybe "Program Simulation Noise" to some extent), but only for revealing particular aspects of audio circuit performance (so they are interesting mostly to developers of those circuits). In fact, for different types of DUTs and for different tasks there should be different sets of tech. signals.

Now I see that testing resamplers using df-metric becomes more and more important and interesting.
 
Quote
Try the one I made for Rockbox. It is extremely power-efficient (only about 5 integer multiply-adds per sample, if I remember correctly) and in general good-sounding, but the difference will be very large on broadband signals because that was not something I was optimizing for. I suspect most people will never notice it by ear.
Will I need a device running Rockbox software to test your resampler, or are there other options?

Quote
I don't think transition band (>16 kHz) frequencies are ever going to be relevant, and especially for resamplers it often makes sense to trade off against them, so I really don't recommend testing them. If you do test them, separate them from the more important bands so that you can see where the problem is and decide if it matters on a given device.
I planned to use the current set of SE test signals, including "Variety". I want to see how they are affected by resamplers alone.

Quote
Maybe another way to think about it would be how many taps would be needed in an otherwise noiseless anti-image filter to achieve a given level of signal difference. My sense is that near-perfect output is probably still going to have significant difference energy due to the anti-image filter transition band, and sensitivity to that is something you might want to think about controlling.
I did some tests of this kind, as I use a low-pass filter for the recorded test sequence, which is 96/24. To keep the level of filter artifacts below -150 dB with white noise, I use an FIR filter of order 4000 (effectively twice that, because it is applied a second time in reverse).
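
The forward-then-reverse filtering described here can be sketched in Python as below. This is only an illustration of the general technique: the Blackman-windowed sinc design, the 21 taps and the 20 kHz cutoff are assumptions made for a short example, not the actual filter (a 21-tap Blackman design gets nowhere near -150 dB). Filtering once forward and once over the time-reversed signal cancels the filter's delay and squares its magnitude response, so stopband attenuation in dB doubles.

```python
import math


def windowed_sinc_lowpass(num_taps, cutoff_hz, sample_rate_hz):
    """Blackman-windowed sinc low-pass with unity DC gain (num_taps odd)."""
    fc = cutoff_hz / sample_rate_hz          # cutoff in cycles per sample
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2.0 * fc
        else:
            ideal = math.sin(2.0 * math.pi * fc * x) / (math.pi * x)
        window = (0.42
                  - 0.5 * math.cos(2.0 * math.pi * n / (num_taps - 1))
                  + 0.08 * math.cos(4.0 * math.pi * n / (num_taps - 1)))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]


def convolve_full(signal, taps):
    """Direct-form full convolution (output length len(signal)+len(taps)-1)."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, h in enumerate(taps):
            out[i + j] += s * h
    return out


def forward_backward(signal, taps):
    """Filter forward, then filter the reversed result, then undo the reversal.

    The output is trimmed so that it is time-aligned with the input.
    """
    pad = len(taps) - 1
    forward = convolve_full(signal, taps)
    backward = convolve_full(forward[::-1], taps)[::-1]
    return backward[pad:pad + len(signal)]


# Assumed example values: 96 kHz recording, 20 kHz cutoff, 21 taps.
taps = windowed_sinc_lowpass(21, 20000.0, 96000.0)
impulse = [0.0] * 101
impulse[50] = 1.0
response = forward_backward(impulse, taps)
peak_index = max(range(len(response)), key=lambda i: abs(response[i]))
print("peak at sample", peak_index)
```

The impulse comes out centred on its original position with a symmetric response, which is the zero-phase property that makes this approach attractive when the filtered recording must stay time-aligned with the reference.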
keeping audio clear together - soundexpert.org

Re: Objective measurements of portable players using df-metric

Reply #10
Quote
I assume they start to correlate at some point, but not all of the time when the devices are in the range of audible transparency. So what is that point in the final df graph? In other words, a device with what df score will be audibly indistinguishable from the original signal in some kind of ABX test?
To my current understanding, the levels of -26 dB to -27 dB are the anchors. For most listeners, portable players with lower Df medians will be indistinguishable from each other, assuming the listeners are not trained, use good headphones and have "regular" music taste. It looks like Apple portable devices (which have had similar audio solutions since the iPhone 4) were designed bearing in mind exactly this audio quality level: top of the mass market.
keeping audio clear together - soundexpert.org

Re: Objective measurements of portable players using df-metric

Reply #11
You can test out our resampler using the rockbox simulator: http://rasher.dk/rockbox/simulator/

Copy a file to the disk folder, then launch the simulator, browse to where you put the test file, and long-select it to bring up a context menu (enter or space bar, if I remember correctly), then open with, test_codec. If you turn on DSP and the sample rate in the Rockbox settings is set to a different sampling rate than the file's, it will use our resampler when you write out a decoded WAV file (otherwise it keeps the sampling rate the same).

Sorry for the complexity; the tool is made for us to do codec benchmarking and debugging, and so it is complex.

Re: Objective measurements of portable players using df-metric

Reply #12
Quote
At least there will be some anchor devices which are well-known as good and bad, so Df values will become more meaningful.
For comparison's sake, what would the hypothetical result of an ideal device look like?
Quis custodiet ipsos custodes?  ;~)

Re: Objective measurements of portable players using df-metric

Reply #13
Good question. To be honest, I don't know. My speculative guess is -35 dB (histogram median) for portable audio. I would love to test some perfectly designed player in order to have a high anchor. Another question: which player is considered good/best sounding?

For testing non-portable audio components (receivers, transports, amplifiers) I need a precision audio interface for acquisition of the analog signal. For these devices Df values will be lower.

If you mean an ideal device in the mathematical sense, all Df values will be -Inf dB. The current accuracy of the Df computation is -145 dB, so all of them will be around this value ))
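
To make the "-Inf dB for an ideal device" point concrete, here is a minimal Python sketch of a difference-level calculation. It is a simplified stand-in and not the df-metric algorithm, whose computation is not given in this thread: it assumes already time-aligned signals, applies a least-squares gain match, and reports the residual energy relative to the reference. The function name and the 997 Hz test tone are hypothetical.

```python
import math


def difference_level_db(reference, test):
    """Residual energy after gain matching, relative to the reference, in dB.

    Simplified illustration only; NOT the actual Df computation.
    """
    if len(reference) != len(test):
        raise ValueError("signals must have the same length")
    cross = sum(r * t for r, t in zip(reference, test))
    test_energy = sum(t * t for t in test)
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0.0 or test_energy == 0.0:
        raise ValueError("signals must not be silent")
    gain = cross / test_energy                 # least-squares level match
    err_energy = sum((gain * t - r) ** 2 for r, t in zip(reference, test))
    if err_energy == 0.0:
        return float("-inf")                   # mathematically ideal device
    return 10.0 * math.log10(err_energy / ref_energy)


# Assumed test tone: 997 Hz sine at 44.1 kHz, 0.1 s long.
sine = [math.sin(2.0 * math.pi * 997.0 * n / 44100.0) for n in range(4410)]
# The same tone rounded to 16-bit steps, as a stand-in for a near-ideal device.
quantized = [round(s * 32767.0) / 32767.0 for s in sine]

ideal_level = difference_level_db(sine, sine)
quantized_level = difference_level_db(sine, quantized)
print(ideal_level, round(quantized_level, 1))
```

With identical input and output the function returns -inf, while plain 16-bit rounding of a full-scale sine already leaves a residual in the region of the theoretical 16-bit noise floor.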
keeping audio clear together - soundexpert.org

 