Way round MP3 efficiency limitations?

2003-03-10 16:20:19

Hi,

I've had a few thoughts based on my (limited) understanding of which design decisions in MPEG-1 Layer 3 (.mp3) have hampered the quest for efficient, transparent VBR encoding. Perhaps there are some ways to edge it nearer to the efficiency of the MusePack (.MPC) format without departing from the MP3 standard, which remains well supported in software and hardware players of all sorts.

Currently LAME APS is typically about 190-200 kbps, while mppenc --quality 5 --xlevel (or --standard) seems to be typically around 155-165 kbps (about 18% fewer bits for .MPC).
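For anyone who wants to check the "about 18%" figure, here is a minimal sketch that compares the midpoints of the two bitrate ranges quoted above. The helper names are made up for illustration, and using midpoints is an assumption; real averages depend on the material encoded.

```python
def midpoint(low_kbps, high_kbps):
    """Midpoint of a quoted bitrate range, in kbps."""
    return (low_kbps + high_kbps) / 2.0


def percent_saving(reference_kbps, candidate_kbps):
    """How many percent fewer bits the candidate uses than the reference."""
    return 100.0 * (1.0 - candidate_kbps / reference_kbps)


# Ranges quoted in the post; the midpoints are only a rough stand-in.
lame_aps_kbps = midpoint(190, 200)
mpc_standard_kbps = midpoint(155, 165)

saving = percent_saving(lame_aps_kbps, mpc_standard_kbps)
print(f"MPC uses roughly {saving:.1f}% fewer bits than LAME APS")
```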

I don't know enough of the details to shoot these ideas down myself, so perhaps you more knowledgeable people could do so! I'm only thinking out loud, really, and don't need to be treated with kid gloves if I'm being daft or showing my ignorance!

1. sfb21. As I understand it, there's no gain factor for the top spectral band, which causes it to require more bits to encode than it should, and for perceptual transparency (preserving clarity, sheen and sparkle), the frequencies in the top band, sfb21, are still important.

2. Time resolution. The time resolution is limited by the block size, and for replication of transients, avoiding pre-echo, and so on, the shorter the time resolution the better. The best that MP3 can achieve is short blocks (576 samples), which at 44.1 kHz sampling rate are 13 milliseconds long.

So, I'm wondering, ignoring encoding speed, would there be a possible advantage (in a well-tuned codec) to upsampling CDs from 44.1 kHz to 48 kHz before encoding? I'm guessing that LAME would need some attention to really get optimally tuned at 48 kHz (or kSa/s).

I'd presume that the sfb21 band at 48 kSa/s would start at a frequency about 9% higher, and so closer to the lowpass. The lowpass could remain at 19 kHz as with --alt-preset standard (which is becoming known as --preset standard in later LAME versions); that is a lower frequency when normalised to the sampling rate. This would reduce the proportion of audible information that has to be encoded in this inefficient band.

Is this a large enough advantage to be worthwhile?

Also, the time resolution of standard blocks (1152 samples) would improve from 26 ms to 24 ms and short blocks would shorten from 13 ms to 12 ms.

Although only a slight improvement, this might allow marginally fewer short blocks to be used, aiding efficiency.

For the resampling, I'd guess that the fastest method with perceptually inaudible frequency-domain ripple could be used. It might even be possible to estimate whether the expected ripple is masked in each frame/block/granule and, only when it isn't, to recompute using a slower resampling method (like fully bandwidth-limited). I'd imagine that a discrete transform isn't as amenable to resampling via the frequency domain as a full Fourier Transform could be, but if it were, it might add the least computational overhead to the change of sample rate.

Of course, the side effects could be many, including a lack of on-the-fly decoding support in most CD-audio burning software, and possibly a lack of support in some hardware players that might only handle 44.1 kHz.

Maybe even the greater number of frames/granules per second would actually waste as many bits as are saved in sfb21 etc. (especially if short block switching doesn't give any advantage).

Anyway, I'd appreciate comments.

I presume that trying it out from a resampled .WAV would be pointless, assuming that --alt-preset standard is only truly optimally tuned for 44.1 kHz.

Regards,

Dick Darlington

Reply #1 – 2003-03-10 16:34:09

Hi,

It might be possible (with a major psymodel overhaul) to reach --APE quality at maybe ~145-180 kbps for mp3, but:
- that would require A LOT of work
- that would require A LOT of time
- that would still not reach musepack's level of quality.

The problem is, short blocks are very expensive in bits. So, if you want ideal quality, you have to use them a lot, which is incompatible with low-bitrate operation.

Reply #2 – 2003-03-10 16:41:23

The trick with 48 kHz upsampling was tried ages ago. As far as I remember, the quality loss from upsampling and then downsampling again, plus the fact that the encoder is not tuned for 48 kHz sample rates, outweighed the quality gains.

Edit: And don't forget, you get more blocks in the same unit of time, so more bits. This offsets the bit savings from less sfb21 usage.
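To put a rough number on the "more blocks, so more bits" point, here is a small sketch that counts only the fixed per-frame cost. It assumes MPEG-1 Layer III stereo without CRC, i.e. a 4-byte header plus 32 bytes of side information per 1152-sample frame; the function names are made up for illustration. It ignores scalefactors and everything else the encoder spends per granule, so treat it as a lower bound on the extra cost, not a full model.

```python
SAMPLES_PER_FRAME = 1152        # one MPEG-1 Layer III frame (two 576-sample granules)
HEADER_BYTES = 4                # frame header
SIDE_INFO_BYTES_STEREO = 32     # assumption: MPEG-1, two channels, no CRC


def frames_per_second(sample_rate_hz):
    """Number of frames needed to cover one second of audio."""
    return sample_rate_hz / SAMPLES_PER_FRAME


def fixed_overhead_kbps(sample_rate_hz):
    """Bitrate spent on header plus side info alone, in kbps."""
    bits_per_frame = 8 * (HEADER_BYTES + SIDE_INFO_BYTES_STEREO)
    return frames_per_second(sample_rate_hz) * bits_per_frame / 1000.0


for rate in (44100, 48000):
    print(f"{rate} Hz: {frames_per_second(rate):.2f} frames/s, "
          f"{fixed_overhead_kbps(rate):.3f} kbps fixed overhead")
```

Under these assumptions the frame rate, and with it the fixed overhead, grows by the same ratio as the sample rate (48000/44100, about 9%).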

Reply #3 – 2003-03-10 17:31:32

Thanks guys, those replies have answered most of my questions and a little more.

I was wondering whether the more-blocks per unit time might be outweighed by less info per block (plausible from an Information Theory point of view), though clearly the psymodel would have to be optimised all over again for 48 kHz to take advantage of the informational stuff.

I thought it would require too much work to be worthwhile, and as someone with a preference for MPC myself (I don't have a hardware player - my nearest being an old MiniDisc Recording Walkman), it was more out of interest than practical purpose that I asked.

DickD

Reply #4 – 2003-03-10 18:21:24

From an information theory point of view, I also think the 48kHz approach is a bit superior - at least if time resolution matters.

At very low bitrates, 48 kHz will be worse, mainly because you rely on the efficiency and time duration of long blocks. That's why encoders often resample when forced to low bitrates.

Reply #5 – 2003-03-10 18:26:55

Quote

I was wondering whether the more-blocks per unit time might be outweighed by less info per block (plausible from an Information Theory point of view),

Overhead matters.

Reply #6 – 2003-03-11 08:34:23

Your numbers are wrong for short blocks. A short block is 192 samples.
But you are right about 48 kHz: it would produce less pre-echo than 44.1 kHz.
The problem is that 44.1 kHz is more widespread, so the psy model is mostly tuned for 44.1 kHz.
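As a quick check of the time-resolution figures, the sketch below converts the block sizes mentioned in this thread (1152, 576 and 192 samples) to milliseconds at both sample rates. The labels in the dictionary are my own shorthand, not terms taken from the posts.

```python
def block_ms(samples, sample_rate_hz):
    """Duration of a block of `samples` samples, in milliseconds."""
    return 1000.0 * samples / sample_rate_hz


# Block sizes quoted in the thread; the labels are illustrative only.
BLOCK_SIZES = {
    "frame (1152)": 1152,
    "long block (576)": 576,
    "short block (192)": 192,
}

for label, samples in BLOCK_SIZES.items():
    at_44 = block_ms(samples, 44100)
    at_48 = block_ms(samples, 48000)
    print(f"{label}: {at_44:.2f} ms at 44.1 kHz, {at_48:.2f} ms at 48 kHz")
```

With 192-sample short blocks the step from 44.1 kHz to 48 kHz is a fraction of a millisecond, the same 9% or so as for the longer blocks.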

As for sfb21, you are out of luck: it starts at about the same frequency for both 44.1 and 48 kHz.
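A rough way to see this: the sketch below turns the first spectral line of sfb21 into a frequency. The start lines used (418 at 44.1 kHz, 384 at 48 kHz, out of 576 lines per long block) are quoted from memory of the MPEG-1 Layer III long-block scalefactor band tables, so treat them as an assumption and check them against the standard or an encoder's source before relying on them.

```python
LINES_PER_LONG_BLOCK = 576

# Assumed first spectral line of sfb21 for long blocks (verify against the spec).
SFB21_START_LINE = {44100: 418, 48000: 384}


def sfb21_start_hz(sample_rate_hz):
    """Approximate frequency at which sfb21 begins, in Hz.

    Each spectral line covers (sample_rate / 2) / 576 Hz.
    """
    line = SFB21_START_LINE[sample_rate_hz]
    return line * sample_rate_hz / (2 * LINES_PER_LONG_BLOCK)


for rate in (44100, 48000):
    print(f"{rate} Hz: sfb21 starts near {sfb21_start_hz(rate):.0f} Hz")
```

If those table entries are right, the band tables differ per sample rate in such a way that sfb21 begins near 16 kHz in both cases, which is why resampling does not move it closer to the lowpass.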

Once again, an mp3 modified to remove bad design points is called AAC.
