input wanted: demystifying MP3



input wanted: demystifying MP3

2023-11-08 13:49:37

Hey all! I'm a designer for Audacity. For the current 3.4 release, we've refactored our exporter and, while we were at it, decided to hide the option to choose between stereo and joint stereo in MP3 for casual users. That decision came after reading this forum and the wiki extensively; we just use -m j now. (For people who know what they're doing, talking to LAME directly is also possible.)

Naturally, I've now got some angry users claiming that Joint Stereo "makes it narrow panned both channels too close together on the output meters", and I want to write an explainer piece about this and other MP3 things in the near future.

So my question to you is: What are some other things related to MP3 or perhaps audio in general you wish the general public would know about? The two other things I'd like to touch upon so far are codec transparency (which makes "what's better quality - V0 or 320" usually academic) and ABX (so people can test for themselves whether what I'm saying is BS - or whether their long-held opinion is wrong). Anything else?
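For the ABX part, one thing an explainer could show is how to read a score. The sketch below is only an illustration: it computes the chance of reaching a given number of correct answers by pure guessing (a one-sided binomial test). The function name and the 16-trial example are my own choices, not something taken from any ABX tool.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` answers right out of
    `trials` when every answer is a coin flip (pure guessing)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count all outcomes with `correct` or more hits, out of 2**trials outcomes.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    for correct in (8, 12, 14):
        print(f"{correct}/16 correct: p = {abx_p_value(correct, 16):.4f}")
```

A small p-value means the score is unlikely to come from guessing; a score near half the trials says nothing either way.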

Re: input wanted: demystifying MP3

Reply #1 – 2023-11-08 15:39:37

Where is this information going? The user manual? Somewhere else?

Maybe something about why to choose Constant, Variable, or Average bitrate. I usually recommend Joint Stereo and VBR, because there's no need to "waste bits" on silence or "simple sounds".

And maybe something about the fact that a higher bitrate doesn't always give you "better quality". Sometimes you can achieve transparency at a lower bitrate, or sometimes there are artifacts that don't go away or get better at a higher bitrate.

And people sometimes look at the spectrum and treat the loss of high frequencies as a loss of quality, or as "proof" that the format is inferior. But I say that if you hear a compression artifact, the missing high frequencies are usually not what you are hearing.

What I usually say about Joint Stereo is that this part of the compression (conversion from L & R to M/S) is completely lossless and reversible. And that it's "smart" and makes better use of the available "bits" by not encoding the same information twice (I know that's a simplification). I'm pretty sure FLAC and maybe all other popular compression formats use M/S by default.
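To illustrate the "lossless and reversible" claim, here is a minimal sketch of the L/R to M/S conversion and back. The 1/sqrt(2) scaling is the convention usually described for MP3 and should be read as an assumption here; the sample values are made up. Any loss comes from the quantization that happens afterwards, not from this transform.

```python
import math

SQRT2 = math.sqrt(2.0)


def lr_to_ms(left, right):
    """Convert left/right samples to mid/side (sum and difference, scaled)."""
    mid = [(l + r) / SQRT2 for l, r in zip(left, right)]
    side = [(l - r) / SQRT2 for l, r in zip(left, right)]
    return mid, side


def ms_to_lr(mid, side):
    """Invert lr_to_ms: recover left/right from mid/side."""
    left = [(m + s) / SQRT2 for m, s in zip(mid, side)]
    right = [(m - s) / SQRT2 for m, s in zip(mid, side)]
    return left, right


def energy(samples):
    """Sum of squared sample values."""
    return sum(x * x for x in samples)


if __name__ == "__main__":
    # Made-up, nearly identical channels (typical for centered content).
    left = [0.50, -0.25, 0.75, 0.10]
    right = [0.45, -0.20, 0.70, 0.12]
    mid, side = lr_to_ms(left, right)
    print("mid energy :", energy(mid))
    print("side energy:", energy(side))
    print("round trip :", ms_to_lr(mid, side))
```

With similar channels almost all of the energy ends up in Mid, which is where the bit saving comes from.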

P.S.
I think the re-design of the export windows/configuration will be an improvement that makes things easier. The old sample-rate settings were confusing lots of people.

Re: input wanted: demystifying MP3

Reply #2 – 2023-11-08 19:04:28

Quote from: DVDdoug on 2023-11-08 15:39:37

I'm pretty sure FLAC and maybe all other popular compression formats use M/S by default.

Actually, FLAC the format can also store Left & Side and Right & Side, in addition to Mid & Side and dual mono. Selection is made independently for every encoded frame.
Reference FLAC the encoder will do number-crunching for Left, for Right, for Mid and for Side, and then pick the pair that compresses best: Side combined with Left or Mid or Right - or dual mono.
Mid is never encoded with anything else than Side. Reason is, Mid = average rounded down, i.e. taking the sum and discarding the LSB - which is reconstructed by Side, as sum and difference have same parity.
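The parity argument can be checked with a few lines of integer arithmetic. This is only a sketch of the channel transforms described above (the function names are mine), not of the FLAC encoder itself, which also does prediction and entropy coding.

```python
def encode_mid_side(left, right):
    """Mid = average rounded down (sum with its LSB discarded), Side = L - R."""
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


def decode_mid_side(mid, side):
    """Recover L and R; the LSB dropped from the sum is the parity of Side."""
    left, right = [], []
    for m, s in zip(mid, side):
        total = (m << 1) | (s & 1)  # sum and difference share the same parity
        left.append((total + s) >> 1)
        right.append((total - s) >> 1)
    return left, right


def encode_left_side(left, right):
    """Store Left plus Side; Right is rebuilt as Left - Side."""
    return left, [l - r for l, r in zip(left, right)]


def decode_left_side(left, side):
    return left, [l - s for l, s in zip(left, side)]


def encode_right_side(left, right):
    """Store Side plus Right; Left is rebuilt as Right + Side."""
    return [l - r for l, r in zip(left, right)], right


def decode_right_side(side, right):
    return [r + s for r, s in zip(right, side)], right


if __name__ == "__main__":
    left = [0, 1, -3, 32767, -32768, 5]
    right = [0, 2, 2, -32768, -32768, 5]
    print(decode_mid_side(*encode_mid_side(left, right)) == (left, right))
```

All three Side-based pairings round-trip exactly, which is why the encoder is free to pick whichever one compresses best for a given frame.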

FLAC also differs from e.g. WavPack in that FLAC cannot decorrelate more than two channels. WavPack compresses 5.1 with a joint-stereo strategy on L/R, a joint-stereo strategy on BL/BR, and then C as mono and LFE as mono. But WavPack cannot choose to encode Side with anything but Mid.
TAK, I think, finds a suitable correlation matrix.

So the formats are not all doing the same, by any means.

Re: input wanted: demystifying MP3

Reply #3 – 2023-11-08 23:12:35

To answer "what is the benefit?" I did some experiments with a FLAC file. I mainly used FLAC because we don't have to consider "quality" and the only (unknown) variable is bitrate.

Maybe this information will be helpful in your explanation if you can simplify it (and assuming you agree with the conclusion)...

It looks like it makes approximately a 13% difference (not a whole lot). That is, you get the same quality with a 13% lower bitrate or you could get quality equal to a 13% higher bitrate.

The idea is that if FLAC didn't have M/S processing a stereo file would be twice the size of a mono file. So we can compare the true stereo FLAC with M/S to the "imaginary" dual mono FLAC without M/S, and the difference is M/S.

This is just a one-off experiment with FLAC instead of MP3 so, as they say, "your mileage may vary".

Here are the actual numbers and I included the WAV files which are as-expected:

Original WAV stereo, 59.7MB, 1336kbps
WAV mono, 29.9MB, 768kbps
WAV dual mono, 59.7MB, 1336kbps

FLAC stereo, 34.6MB, 888kbps
FLAC mono, 19.6MB, 503kbps
FLAC dual mono, 19.6MB, 503kbps
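For anyone who wants to retrace the roughly 13% figure, the arithmetic can be written out as below. It uses the file sizes listed above and the post's own assumption that a stereo FLAC without any inter-channel processing would be twice the size of the mono FLAC; the function name is just for illustration.

```python
def ms_benefit(flac_stereo_mb: float, flac_mono_mb: float):
    """Compare the real stereo FLAC with a hypothetical file in which both
    channels are coded independently (assumed to be 2 x the mono size)."""
    independent_mb = 2 * flac_mono_mb
    saving = 1 - flac_stereo_mb / independent_mb      # how much smaller the real file is
    overhead = independent_mb / flac_stereo_mb - 1    # how much bigger the hypothetical one is
    return independent_mb, saving, overhead


if __name__ == "__main__":
    independent_mb, saving, overhead = ms_benefit(34.6, 19.6)
    print(f"hypothetical independent coding: {independent_mb:.1f} MB")
    print(f"real stereo file is {saving:.1%} smaller")
    print(f"independent coding would be {overhead:.1%} larger")
```

Depending on which file is taken as the baseline, the difference comes out at about 12% or about 13%.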

Re: input wanted: demystifying MP3

Reply #4 – 2023-11-08 23:39:57

I actually tested the effects of inter-channel decorrelation on lossless codecs here: https://hydrogenaud.io/index.php/topic,121770
Some misunderstandings of mine had to be corrected throughout the thread ...

But for lossy, things are a bit different, as it isn't about preserving all the information.
If someone makes the observation that joint stereo sounds like "more narrow" stereo, then explanations may include:

1. Biased observation: people read "joint stereo" and believe that it will join the channels more closely, that dual mono is the right thing, blah blah blah. Will not hold up to a blind test.
2. Compressing two channels in a lossy way means different losses per channel. Channels might sound more distinct because the lossiness introduces artefacts that make it sound that way.
3. Joint stereo means the encoder can save some bits, and then it will re-evaluate the "how to get best sound for cheap". If it can get overall better sound by ever so slightly narrowing down the stereo image, then that is the right thing to do, and if you don't like it, increase bit rate. Even if it saves bits that can be used to improve, it isn't given that absolutely every aspect is getting better (or at least not worse).
4. Even if the previous point gives more room for improvement, then it is based on a model of reality - and sometimes that fails.

I am not at all going to say "in no particular order" ...

Re: input wanted: demystifying MP3

Reply #5 – 2023-11-09 08:26:52

Commenting on Post 1:

I see the export pane in the "what's new" information, and it casually labels (what you say is) Joint Stereo as "Stereo". What's more, I can't see a button leading to advanced options where the option to select separate stereo is available, so how would I do that if I wanted it? I admit that mostly I want Joint Stereo, and I have on occasion selected Stereo by accident (and ended up with a file larger than anticipated), but that was my fault not yours – my view is that I want the option to make that mistake.

I suggest that "getting some users angry" is not a good path to go down just for the sake of making things simpler for the inexperienced – educate the inexperienced rather than try to defend what you've done by flowery arguments which will not persuade power users.

Does it actually matter if an inexperienced user doesn't know the difference between Stereo and Joint Stereo? They'll still get an acceptable result.

My advice: roll out an update ASAP, with both Stereo and Joint Stereo as options on the export panel. The very presence of a Joint Stereo button is educational in itself.

Re: input wanted: demystifying MP3

Reply #6 – 2023-11-10 18:04:57

@fooball: IMO the export panel of an audio editor shouldn't display all the options for something like LAME. That would look messy, confuse most users and generate support tickets. Power users who want to tinker with weird settings have other ways to do so.
But I admit that I didn't look too much into the difference between "forced" and "joint" stereo. If there is a realistic scenario that we need to worry about (audible bad consequences), your point is valid. Otherwise, not really.

Re: input wanted: demystifying MP3

Reply #7 – 2023-11-10 18:17:53

Quote from: Brand on 2023-11-10 18:04:57

IMO the export panel of an audio editor shouldn't display all the options for something like LAME. That would look messy,

But it did previously. How messy is it having buttons for Stereo and Joint Stereo?

Quote

confuse most users and generate support tickets.

Confuse??? Generate support tickets because both options are there??? They have been there, so is there a history of them generating support tickets?

Quote

Power users who want to tinker with weird settings have other ways to do so.

Fine, but where are the tools to do that? The OP mentions "talking to lame directly", but how? Why not just have an advanced options sub-panel? Too simple an idea I suppose.

Quote

But I admit that I didn't look too much into the difference between "forced" and "joint" stereo. If there is a realistic scenario that we need to worry about (audible bad consequences), your point is valid. Otherwise, not really.

So basically you didn't know the difference between Stereo and Joint Stereo, this discussion has educated you, and you've proved my point.

Re: input wanted: demystifying MP3

Reply #8 – 2023-11-10 20:13:40

IMHO there is no practical need to offer anything other than -V (0..6) and an option for -b 320.
LAME has been tested to death many times over, developers have chosen those defaults for good reasons.
If you need something very specific, it's a safer bet anyway to use command line and then be exactly sure what you get.

Re: input wanted: demystifying MP3

Reply #9 – 2023-11-10 20:42:18

Surely anybody who uses an audio editor will have at least a basic knowledge of lossy and lossless codecs, otherwise they'd be wasting their time. The idea of over simplifying encoder interfaces to cater for those who don't know what they're doing is counter productive. Most users of audio editors know precisely why they're using them and want all of the options offered by the various codecs to be available. Or, am I missing something?

Re: input wanted: demystifying MP3

Reply #10 – 2023-11-10 21:02:49

Quote from: fooball on 2023-11-10 18:17:53

you've proved my point.

I'm not sure about that, because the more I learn about "forced stereo", the less I want it to come back to Audacity.

But I get your point that adding more options (and there are plenty to be added - the MP3 export window could take up the whole screen) will lead to people asking more questions... There's an educational aspect there. I just don't think the downsides are worth it.

Quote from: john33 on 2023-11-10 20:42:18

Surely anybody who uses an audio editor will have at least a basic knowledge of lossy and lossless codecs, otherwise they'd be wasting their time. The idea of over simplifying encoder interfaces to cater for those who don't know what they're doing is counter productive. Most users of audio editors know precisely why they're using them and want all of the options offered by the various codecs to be available. Or, am I missing something?

I think you're greatly overestimating the codec knowledge of most users of such software (no hard data, just anecdotes).
More to the point, is removing "forced stereo" a case of over simplifying or just a good decision that results in higher quality MP3s overall?

Re: input wanted: demystifying MP3

Reply #11 – 2023-11-10 21:32:55

Then why not offer a similar scenario to foobar? Simplified VBR and CBR interfaces for the uninterested/uneducated (I'm not trying to be patronising!), but with the option to expose the command line for those who know what they're doing and have specific requirements? Surely the use of joint stereo is so widespread that those who believe they perceive a difference in imagery should probably not be using a lossy codec in the first place!

Re: input wanted: demystifying MP3

Reply #12 – 2023-11-11 00:20:51

Quote from: Brand on 2023-11-10 21:02:49

I'm not sure about that, because the more I learn about "forced stereo", the less I want it to come back to Audacity.

Nobody's making you use it.

I'm not sure about calling it Forced Stereo, Full Stereo maybe. The point is that it is two separate channels encoded in the MP3, so the channels can carry entirely separate waveforms. Joint Stereo attempts to reduce the data by exploiting the similarities between the channels, but if the channels are totally dissimilar there is no reduction, and decoding may introduce crosstalk. Therefore, if either is to be despised, surely that should be Joint Stereo?

I realise there is an argument for not using lossy compression at all in that scenario, but nonetheless MP3 is ubiquitous as a format.

Back on topic: the OP wants suggestions for persuading enthusiasts that omitting the Full Stereo export option is a good idea. I say that's barking up the wrong tree. Some people might never use it, but enthusiasts are never going to be happy with a reduction of functionality (and an unnecessary reduction so far as I can see). Whose bright idea was that???

Re: input wanted: demystifying MP3

Reply #13 – 2023-11-11 00:38:04

Quote from: fooball on 2023-11-11 00:20:51

Joint Stereo attempts to reduce the data by exploiting the similarities between the channels, but if the channels are totally dissimilar there is no reduction, and decoding may introduce crosstalk. Therefore, if either is to be despised, surely that should be Joint Stereo?

This exact misconception is the reason why the option should be removed. Selecting joint stereo does not force mid/side coding! If the two channels are dissimilar, they will be coded separately, exactly as they would when choosing forced stereo.
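The per-frame choice can be pictured with a toy model. To be clear, this is not LAME's algorithm (LAME decides with a psychoacoustic criterion); the energy-ratio rule and the threshold below are invented purely to show the idea that similar channels get mid/side and dissimilar channels stay as L/R. The default frame size of 1152 samples is the length of an MPEG-1 Layer III frame.

```python
def energy(samples):
    """Sum of squared sample values."""
    return sum(x * x for x in samples)


def choose_frame_modes(left, right, frame_size=1152, ratio_threshold=0.25):
    """Toy per-frame decision: pick "M/S" when one of the mid/side channels
    carries little energy compared with the other, otherwise keep "L/R".
    The threshold is a hypothetical value, not taken from any encoder."""
    modes = []
    for start in range(0, len(left), frame_size):
        l = left[start:start + frame_size]
        r = right[start:start + frame_size]
        mid = [(a + b) / 2 for a, b in zip(l, r)]
        side = [(a - b) / 2 for a, b in zip(l, r)]
        low, high = sorted((energy(mid), energy(side)))
        if high == 0:
            modes.append("L/R")  # silent frame: nothing to gain either way
        elif low <= ratio_threshold * high:
            modes.append("M/S")  # the signal concentrates in one M/S channel
        else:
            modes.append("L/R")  # dissimilar channels: code them separately
    return modes


if __name__ == "__main__":
    # First frame: identical channels. Second frame: unrelated channels.
    left = [1, 2, 3, 4, 1, 0, 1, 0]
    right = [1, 2, 3, 4, 0, 1, 0, 1]
    print(choose_frame_modes(left, right, frame_size=4))
```

The point of the sketch is only that "joint stereo" is a switch that is evaluated again for every frame, not a permanent mid/side setting.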

Re: input wanted: demystifying MP3

Reply #14 – 2023-11-11 08:29:03

Quote from: Octocontrabass on 2023-11-11 00:38:04

This exact misconception is the reason why the option should be removed. Selecting joint stereo does not force mid/side coding! If the two channels are dissimilar, they will be coded separately, exactly as they would when choosing forced stereo.

Are you saying there is no benefit to full stereo, or almost no benefit? Interesting, but does that matter if that's what the user wants to do? This sounds like the argument of why bother with this codec or that codec when xxx codec is "clearly" superior.

Okay, so I'm not as well versed as some people on the technical details, but I'm more interested in the general point of not restricting use.

Re: input wanted: demystifying MP3

Reply #15 – 2023-11-11 09:57:47

At least for LAME, which due to patenting was developed as an educational tool, there is no reason to disable any stereo mode.

Also, the LAME manual does not promise that joint stereo - which in LAME means, encoder can switch between M&S and L&R from frame to frame - is always beneficial. That is up to how well an encoder handles it. Quoting:

Using mid/side stereo inappropriately can result in audible
compression artifacts. Too much switching between mid/side and L/R
stereo can also sound bad. To determine when to switch to mid/side
stereo, LAME uses a much more sophisticated algorithm than that
described in the ISO documentation.

Re: input wanted: demystifying MP3

Reply #16 – 2023-11-11 16:30:07

This is what I thought joint-stereo to be:
https://forum.audacityteam.org/t/what-is-the-difference-between-joint-stereo-and-stereo/36030/4

Quote

In normal “Stereo” mode, MP3 stores a separate Left and Right channel, though bitrate can be distributed between Left/Right as needed. In “Joint Stereo” mode, there are still two channels, but they are called Mid and Side. Before the compression, Mid/Side are computed from Left/Right. While Mid stores the common portion of the Left/Right channels, Side stores the difference. After decompression, Left/Right will be reconstructed from Mid/Side.

The advantage of “Joint Stereo” is that, usually, the Left/Right channels are pretty similar, so most information will be in the Mid channel and only very little information in the Side channel. This means that, with “Joint Stereo” mode, the redundant part doesn’t have to be stored twice! This means that “Joint Stereo” can use the available bits more efficiently than normal “Stereo” mode does - provided that the Left/Right channels are somewhat similar.

The rule of thumb is that “Joint Stereo” is advantageous at lower MP3 bitrate, while it’s less helpful on higher bitrates. Also the advantage depends a lot on the content. Anyway, with the LAME MP3 encoder you don’t need to worry. It uses “Joint Stereo” by default (-m j 2), but still switches between Left/Right and Mid/Side mode dynamically, i.e. it always picks the “best” mode for every frame. You can force Mid/Side mode (-m f 2), but that’s not recommended…

And what you guys are calling "forced stereo" is what I'm used to calling "simple stereo".

Re: input wanted: demystifying MP3

Reply #17 – 2023-11-11 16:59:36

Quotation from the LAME documentation:

Quote

LAME - Mid/Side Stereo

For years, what is called Joint-stereo has been misunderstood.
Joint stereo in MP3 is a mechanism to selectively choose between three modes of storing stereo information. These three modes are Simple Stereo, Mid-Side Stereo, and Intensity-Stereo.

In Simple Stereo, the encoder analyzes the left and the right channels independently and stores the information as-is, without further checking the similarities in the signal¹

In Mid-Side Stereo, the encoder analyzes the left, right², mid (l+r) and side (l-r) channels. It then gives more bits to the mid than the side channel (as usually the side channel is less complex) and then stores just the mid and side channels into the resulting MP3.
This way, the mid channel can be encoded as if the frame was bigger, and as such have more quality with the same bitrate.
Note: Mid/side in MP3 is switched frame-by-frame. In AAC, it can be switched band by band.

Intensity-Stereo (not supported in LAME) uses a technique known as joint frequency encoding, which is based on the principle of sound localization.
Human hearing is predominantly less acute at perceiving the direction of certain audio frequencies. By exploiting this 'limitation', intensity stereo coding can reduce the data rate of an audio stream with little or no perceived change in apparent quality.
It works by merging the upper spectrum into just one channel (thus reducing overall differences between channels) and transmitting a little side information about how to pan certain frequency regions.
This type of coding does not perfectly reconstruct the original audio because of the loss of information and can cause unwanted artifacts. However, for very low bitrates this tool usually provides a gain of perceived quality. ³

The LAME mid/side switching criterion, and mid/side masking thresholds are taken from Johnston and Ferreira, Sum-Difference Stereo Transform Coding, Proc. IEEE ICASSP (1992) p 569-571.

The MPEG AAC standard claims to use mid/side encoding based on this paper.

1. This is not the same as dual-mono. Dual-mono should be used where the left and right channels of the input file contain two different streams, where you should choose one (as in two different languages)
2. If one channel has much less noise masking in a certain band than the other, it could happen that the noise spread (by mid/side stereo) may no longer be masked for that channel. If both channels have the same masking, then the noise spread between both channels will be equally well masked.
To prevent this from happening, there is an analysis done on the left and right channel to determine the noise masking thresholds and properly mask the noise.
3. Quote from wikipedia Joint_stereo.
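The "merge into one channel plus a little panning information" idea behind intensity stereo can be sketched as follows. This is a toy model under my own assumptions (one gain per channel per band, derived from band energies); the real MP3 bitstream stores this differently, and the function names are made up.

```python
import math


def energy(samples):
    """Sum of squared sample values."""
    return sum(x * x for x in samples)


def intensity_encode(left_band, right_band):
    """Merge one frequency band into a single signal plus two gains that
    describe how loud that band should be in each output channel."""
    merged = [l + r for l, r in zip(left_band, right_band)]
    e_merged = energy(merged)
    if e_merged == 0:
        return merged, 0.0, 0.0
    gain_left = math.sqrt(energy(left_band) / e_merged)
    gain_right = math.sqrt(energy(right_band) / e_merged)
    return merged, gain_left, gain_right


def intensity_decode(merged, gain_left, gain_right):
    """Rebuild both channels as scaled copies of the merged signal."""
    return [m * gain_left for m in merged], [m * gain_right for m in merged]


if __name__ == "__main__":
    # Same content, only panned: this survives the round trip.
    print(intensity_decode(*intensity_encode([1.0, 2.0, 3.0], [0.5, 1.0, 1.5])))
    # Unrelated content: levels are kept, the waveforms are not.
    print(intensity_decode(*intensity_encode([1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0])))
```

When the two channels differ only in level, the band is rebuilt correctly; when they carry unrelated content, only the per-channel loudness survives, which is the loss of information the quoted text refers to.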

Re: input wanted: demystifying MP3

Reply #18 – 2023-11-11 21:54:41

Quote from: john33 on 2023-11-10 21:32:55

Then why not offer a similar scenario to foobar? Simplified VBR and CBR interfaces for the uninterested/uneducated (I'm not trying to be patronising!), but with the option to expose the command line for those who know what they're doing and have specific requirements?

Yes, I quite like Foobar's converter UI and I wouldn't mind something similar in Audacity. It could be improved too, by making the transition from the "simple mode" slider+buttons UI to the "advanced mode" CLI arguments more obvious. Maybe by putting them on the same page, with the CLI options updating visually in real-time, following the "simple mode" changes.
It's not something I would necessarily prioritize, but it would be nice to have and it should satisfy pretty much all needs.

Quote from: fooball on 2023-11-11 00:20:51

I'm not sure about calling it Forced Stereo, Full Stereo maybe.

The official LAME guide calls it (forced) L/R stereo. I made a mistake in omitting the L/R part, because that's what differentiates it from joint stereo or forced M/S stereo.

BTW, in the User Interface Guidelines they say: "There is no choice between different stereo modes, as the default mode [joint stereo] should be optimal. Other modes are likely to decrease quality."

Quote

the OP wants suggestions for persuading enthusiasts that omitting the Full Stereo export option is a good idea. I say that's barking up the wrong tree. Some people might never use it, but enthusiasts are never going to be happy with a reduction of functionality (and an unnecessary reduction so far as I can see). Whose bright idea was that???

Audio converting enthusiasts wouldn't be using Audacity in the first place. It was never great at this task and I assume that the devs have more urgent things to work on, like all the audio editing features.
If you need more audio converting features, you can pick from one of the existing (free and open source) apps that excel at this, including the command line utilities and their various graphical front ends. While I would personally like to have more options (in an "advanced" section), I cannot honestly say that this is what most Audacity users need and that this should be prioritized in Audacity's development.

About the OP's "button controversy": I trust the LAME developers in their joint stereo default/recommendation. If someone wants to claim that forced L/R stereo mode is beneficial to the point that there has to be button for it, I'd like to see some evidence, like a killer sample for joint stereo. Ideally something realistic, but even a purpose-made signal would be interesting. Otherwise, I worry that having that option in a prominent place might do more harm than good.

Re: input wanted: demystifying MP3

Reply #19 – 2023-11-11 23:04:44

Quote from: fooball on 2023-11-11 08:29:03

Are you saying there is no benefit to full stereo, or almost no benefit?

Neither. I'm saying it's worse than joint stereo. A lot of time and effort has gone into tuning LAME's default settings, and in the vast majority of cases, overriding those defaults will either hurt quality or waste bits.

Quote from: fooball on 2023-11-11 08:29:03

Interesting, but does that matter if that's what the user wants to do? This sounds like the argument of why bother with this codec or that codec when xxx codec is "clearly" superior.

There are good reasons to choose a "worse" codec like MP3 over other codecs. There are no good reasons to override LAME's default settings.

Re: input wanted: demystifying MP3

Reply #20 – 2023-11-13 20:56:10

Quote from: fooball on 2023-11-09 08:26:52

I see the export pane in the "what's new" information, and it casually labels (what you say is) Joint Stereo as "Stereo". What's more, I can't see a button leading to advanced options where the option to select separate stereo is available, so how would I do that if I wanted it?

You're not seeing an option for Joint Stereo in that screenshot; you're seeing just the channel count there. It can be 1 (mono), 2 (stereo) or up to 32 channels, for which you can then map yourself which tracks are supposed to go to which output channel. If you want to use advanced options, they live in the "External program" and "custom FFmpeg" export options, which give you full access to command-line LAME and FFmpeg (or anything else that reads STDIN) and an advanced format/codec selector dialog, respectively.
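As a rough idea of what "talking to LAME directly" could look like, the sketch below only assembles a LAME argument list; it does not run anything. The flags used (-V, -b, -m, and "-" for reading from STDIN) are standard LAME options, while the helper name, its defaults and the output file name are hypothetical, and the exact syntax Audacity's dialog expects is not shown here.

```python
import shlex


def lame_command(output_path, vbr_quality=2, cbr_kbps=None, stereo_mode="j"):
    """Build an argument list for command-line LAME reading audio from STDIN.

    stereo_mode: "j" joint stereo, "s" simple (L/R) stereo,
                 "f" forced mid/side, "m" mono.
    """
    if stereo_mode not in {"j", "s", "f", "m"}:
        raise ValueError(f"unknown stereo mode: {stereo_mode!r}")
    args = ["lame"]
    if cbr_kbps is not None:
        args += ["-b", str(cbr_kbps)]     # fixed bitrate, e.g. 320
    else:
        args += ["-V", str(vbr_quality)]  # VBR quality, 0 (best) to 9
    args += ["-m", stereo_mode, "-", output_path]  # "-" means: read from STDIN
    return args


if __name__ == "__main__":
    print(shlex.join(lame_command("out.mp3")))
    print(shlex.join(lame_command("out.mp3", cbr_kbps=320, stereo_mode="s")))
```

The second call is the sort of thing someone who insists on L/R stereo would use; the first matches the joint-stereo default discussed in this thread.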

Quote from: fooball on 2023-11-09 08:26:52

I suggest that "getting some users angry" is not a good path to go down just for the sake of making things simpler for the inexperienced – educate the inexperienced rather than try to defend what you've done by flowery arguments which will not persuade power users.

We have millions of users; it's actually impossible to do anything that doesn't make someone angry. Even doing nothing makes people angry, because Audacity looks so ugly and low-quality, and as people upgrade their PCs it keeps getting worse, because it handles neither large pixel counts nor HiDPI well.

I want to make a piece on it precisely to educate and to put out a signal against "lame joint stereo is worse" which prevails on the web, not really aimed at enthusiasts but rather people who saw the "joint stereo" in the past, got confused by it, googled it, saw it's allegedly worse and stuck to "stereo" ever since.
Power users have options already - just not as exposed.

Quote from: john33 on 2023-11-10 20:42:18

Surely anybody who uses an audio editor will have at least a basic knowledge of lossy and lossless codecs, otherwise they'd be wasting their time. The idea of over simplifying encoder interfaces to cater for those who don't know what they're doing is counter productive. Most users of audio editors know precisely why they're using them and want all of the options offered by the various codecs to be available. Or, am I missing something?

Audacity is the one audio editor in which this presumption cannot be made. People use it for making voice notes, people use it for converting wav and wma to mp3, people use it to make their first songs, people use it to make audio books, people use it to calibrate the shutter speed of their cameras and to apply reverb to images. We may be able to assume rudimentary "MP3s are audio files I can attach to email, WAV are audio files which my email yells at me for being too large", but not too much more than that.

Given that we're making the app *much* more powerful and complex in upcoming releases (end goal is a DAW), it is imperative to reduce cognitive load wherever we can. Hiding options which have at best no real noticeable effect on the output is part of that, and in the future we'll also be hiding options not relevant for your workflow (think music vs podcast workspaces).

I would be in favor of an "advanced options" accordion down the road, but given that you can pass the CLI arguments directly elsewhere, it's not a priority (especially given that exposing all options for all supported encoders sounds like a lot of work).
