q value (encoding algorithm quality) and interpreting spectrograms

2017-01-04 23:31:37

Hi there,

I was playing around with the "algorithm quality" setting in my mp3 encoder GUI.
I use TAudioConverter. I want to create the best mp3 output possible and am not concerned about output file size or computing power needed.
In TAudioConverter, "algorithm quality" can be set to a number from 0 to 9. I decided to convert my FLAC song to mp3 three times, changing only the "algorithm quality" number each time, all in combination with CBR 320.

As I cannot hear the difference between the outputs, I decided to view "spectrograms" of the FLAC file and the output files to see the differences. Moreover I compared file sizes of the output mp3 files.

By checking the file sizes in Windows Explorer I discovered that the file sizes of all output mp3 files are exactly the same (not even a single byte different). On some forums (this one and doom9 for example) I have read multiple discussions about the influence of the algorithm quality, with some people stating that it will make output files smaller. Well, that's not the case (makes sense to me as the bitrate is constant and the same in all cases).
So then I thought that, if the file size stays the same, the quality of the mp3s must differ between settings. I heard that q 0 gives the best quality (at the cost of much more computing power) and q 9 the worst.
To test this I made spectrograms (I used Spek for this, from spek.cc):

[Spectrogram images, in order: FLAC, q 0, q 3, q 9]

Now I as a noob would say that the q9 algorithm quality setting leaves more of the original (FLAC) spectrogram intact and is therefore better. So, my question is: what am I doing wrong or how does it work?
I'd greatly appreciate it if someone who knows the answers could explain!

This is the MediaInfo information for the FLAC and the mp3 cbr 320 "algorithm quality 0" files :
(you can see from the information that LAME 3.99r was used and indeed some "q 0" is used for encoding)
(oh, and don't mind the changed tags, I always use an mp3 tagger for that, but it doesn't do anything else than tagging, so not related to the question I suppose)

FLAC:

General
Complete name : C:\Users\Jonas\Downloads\01-choir_of_young_believers-hollow_talk.flac
Format    : FLAC
Format/Info : Free Lossless Audio Codec
File size : 31.9 MiB
Duration    : 5 min 21 s
Overall bit rate mode : Variable
Overall bit rate    : 832 kb/s
Album : This Is For The White In Your Eyes
Track name    : Hollow Talk
Track name/Position : 1
Performer : Choir Of Young Believers
Producer    : Ghostly International
Genre : Indie
Recorded date : 2008
Writing application : FLAC 1.2.1
Rip Date    : 2014-12-16
Retail Date : 2008-00-00
Media : CDDA
Ripping Tool    : EAC Secure
Release Type    : Normal
Related : http://tinyurl.com/nrssdzt

Audio
Format    : FLAC
Format/Info : Free Lossless Audio Codec
Duration    : 5 min 21 s
Bit rate mode : Variable
Bit rate    : 832 kb/s
Channel(s)    : 2 channels
Channel positions : Front: L R
Sampling rate : 44.1 kHz
Bit depth : 16 bits
Stream size : 31.9 MiB (100%)
Writing library : libFLAC 1.2.1 (UTC 2007-09-17)
Language    : English

mp3 cbr 320 "algorithm quality 0"

General
Complete name : C:\Users\Jonas\OneDrive\Muziek\Choir of Young Believers\This Is For The White In Your Eyes\01 Hollow Talk.mp3
Format    : MPEG Audio
File size : 12.4 MiB
Duration    : 5 min 21 s
Overall bit rate mode : Constant
Overall bit rate    : 320 kb/s
Album : This Is For The White In Your Eyes
Album/Performer : Choir of Young Believers
Part/Position : 01
Part/Total    : 01
Track name    : Hollow Talk
Track name/Position : 01
Track name/Total    : 10
Performer : Choir of Young Believers
Composer    : Anders Rhedin, Fridolin Tai Nordsø Schjoldan & Jannis Noya Makrigiannis
Genre : Alternative
Recorded date : 2008
Writing library : LAME3.99r
Cover : Yes
Cover type    : Cover (front)
Cover MIME    : image/jpeg

Audio
Format    : MPEG Audio
Format version    : Version 1
Format profile    : Layer 3
Mode    : Joint stereo
Duration    : 5 min 21 s
Bit rate mode : Constant
Bit rate    : 320 kb/s
Channel(s)    : 2 channels
Sampling rate : 44.1 kHz
Compression mode    : Lossy
Stream size : 12.3 MiB (98%)
Writing library : LAME3.99r
Encoding settings : -m j -V 4 -q 0 -lowpass 20.5

Re: q value (encoding algorithm quality) and interpreting spectrograms

Reply #1 – 2017-01-04 23:52:45

First mistake is looking at spectrograms.

Re: q value (encoding algorithm quality) and interpreting spectrograms

Reply #2 – 2017-01-05 00:27:39

I love how self-proclaimed "noobs" somehow have the confidence to declare the program developers and users all wrong.

From LAME command line manual:

* -q 0..9 algorithm quality selection
Bitrate is of course the main influence on quality. The higher the bitrate, the higher the quality. But for a given bitrate, we have a choice of algorithms to determine the best scalefactors and Huffman encoding (noise shaping).

-q 0: use slowest & best possible version of all algorithms. -q 0 and -q 1 are slow and may not produce significantly higher quality.

-q 2: recommended. Same as -h.

-q 5: default value. Good speed, reasonable quality.

-q 7: same as -f. Very fast, ok quality. (psycho acoustics are used for pre-echo & M/S, but no noise shaping is done.)

-q 9: disables almost all algorithms including psy-model. poor quality
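
For anyone who wants to repeat the comparison from the command line instead of a GUI, here is a minimal sketch in Python. It assumes a `lame` binary on the PATH and a WAV input file; the file names and the helper names `lame_command` and `encode_all` are made up for illustration. Only the `-b` (bitrate in kbps) and `-q` options are used.

```python
import shutil
import subprocess


def lame_command(src, dst, q, bitrate_kbps=320):
    """Build the argument list for one encode at a fixed bitrate and -q level."""
    return ["lame", "-b", str(bitrate_kbps), "-q", str(q), src, dst]


def encode_all(src, q_values=(0, 3, 9)):
    """Encode the same source once per -q value (hypothetical output names)."""
    if shutil.which("lame") is None:
        raise RuntimeError("lame was not found on PATH")
    for q in q_values:
        # check=True raises if the encoder exits with an error.
        subprocess.run(lame_command(src, f"out_q{q}.mp3", q), check=True)
```

Calling `encode_all("input.wav")` would write `out_q0.mp3`, `out_q3.mp3` and `out_q9.mp3`, which can then be compared by listening.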

Re: q value (encoding algorithm quality) and interpreting spectrograms

Reply #3 – 2017-01-05 05:14:18

Quote from: doomie on 2017-01-04 23:31:37

By checking the file sizes in Windows Explorer I discovered that the file sizes of all output mp3 files are exactly the same (not even a single byte different). On some forums (this one and doom9 for example) I have read multiple discussions about the influence of the algorithm quality, with some people stating that it will make output files smaller. Well, that's not the case (makes sense to me as the bitrate is constant and the same in all cases).

It should go without saying, but when you tell the encoder to make files that are exactly the same size, you get files that are exactly the same size.

You're more likely to see differences in file size if you use ABR or VBR mode. However, there's no guarantee the size will correspond to q.
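
Note that identical sizes do not mean identical contents. A quick way to check is to hash the files; the sketch below uses only the Python standard library, and the helper name `size_and_digest` and the file names in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path


def size_and_digest(path):
    """Return (size in bytes, SHA-256 hex digest) for one file."""
    data = Path(path).read_bytes()
    return len(data), hashlib.sha256(data).hexdigest()
```

Running it on, say, `out_q0.mp3` and `out_q9.mp3` should give the same size for CBR encodes of the same track but, presumably, different digests, since the encoded audio data differs.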

Quote from: doomie on 2017-01-04 23:31:37

Now I as a noob would say that the q9 algorithm quality setting leaves more of the original (FLAC) spectrogram intact and is therefore better. So, my question is: what am I doing wrong or how does it work?

The spectrogram only shows you which parts of the spectrum have been preserved by the encoder, but not how well they have been preserved.

In higher quality modes, LAME is better at figuring out which parts of the spectrum are inaudible at different points in time and throwing that data away. With less information to encode, the remaining information can be encoded at a higher quality.

In lower quality modes, LAME can't figure out which parts of the spectrum are inaudible. The limited number of available bits must be shared between more of the spectrum, which means fewer bits are available for the parts you can hear. The important parts have to be encoded at a lower quality since the unimportant parts are wasting space.
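
As a toy illustration of that trade-off (this is not how LAME actually allocates bits, and all numbers are invented), imagine a fixed bit budget split evenly over frequency bands. The helper name `bits_per_audible_band` is hypothetical.

```python
def bits_per_audible_band(total_bits, audible_flags, knows_masking):
    """Average bits each audible band receives from a fixed budget."""
    if knows_masking:
        # Inaudible bands are dropped, so only audible bands share the budget.
        sharing = sum(audible_flags)
    else:
        # No masking information: every band gets an equal share.
        sharing = len(audible_flags)
    return total_bits / sharing


# Hypothetical frame: 20 audible bands, 12 masked ones, 640 bits to spend.
bands = [True] * 20 + [False] * 12
with_model = bits_per_audible_band(640, bands, knows_masking=True)
without_model = bits_per_audible_band(640, bands, knows_masking=False)
print(with_model, without_model)  # 32.0 20.0
```

In this made-up case the encoder that knows what is masked spends 32 bits on each audible band, while the one that does not spends only 20, even though the second one "keeps" more of the spectrum.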

Of course, at 320kbps, there are enough bits to go around that even the "lower quality" of q9 is unlikely to sound different from the original audio. This experiment might be more meaningful if you repeat it at a much lower bitrate (and stop using spectrograms).

Re: q value (encoding algorithm quality) and interpreting spectrograms

Reply #4 – 2017-01-05 09:26:48

Thank you very much! Learned something new today

Re: q value (encoding algorithm quality) and interpreting spectrograms

Reply #5 – 2017-01-05 16:49:10

You wouldn't try to "listen to" images to assess their quality, so why are you trying to "look at" music?

Re: q value (encoding algorithm quality) and interpreting spectrograms

Reply #6 – 2017-01-05 19:39:10

@Apesbrain : Your documentation reference is outdated:

http://lame.cvs.sourceforge.net/viewvc/lame/lame/USAGE (search -q 0 )
http://lame.cvs.sourceforge.net/viewvc/lame/lame/doc/html/detailed.html#q

@doomie : As has been said by others, lossy encoders work by deciding what is more important to preserve more faithfully, and what can be different.
Using the quality setting, you are choosing which tools you let the encoder use to decide that. If it has no tools, it cannot decide, so every part of the spectrum is given some share of the bits. This, by definition, distributes the available bits worse than if the encoder had that information.

Since spectrograms are not a faithful representation of what we hear, they can't be used to judge quality beyond some obvious defects.
Also, what you see in q0 is a consequence of a specific defect of MP3 as a format. The frequencies from 16 kHz upwards can make the encoder use many more bits than needed in some special scenarios. So LAME tries to avoid allocating bits there, or reduces the quality in that band if that lets it increase the quality below that frequency, which is what we hear more clearly. It can only do this if it has adequate information.

Edit: I once suggested implementing a spectrogram in a way that better represents how the audio is heard. The most basic idea was to use a logarithmic frequency scale (ideally one that matches the ear's hair cells) and to apply known thresholds of hearing to the level of those frequencies. It would probably also need to account for the ear's response (how fast and how precise it is), so I don't think it would look much like a spectrogram in the end.
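
To give a rough idea of what applying a threshold of hearing could mean, here is a sketch using Terhardt's well-known approximation of the absolute threshold in quiet. Choosing this particular formula is my assumption, not something from the suggestion above, and the function name `threshold_in_quiet_db` is made up.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate threshold of hearing in quiet (dB SPL), Terhardt's formula."""
    f = freq_hz / 1000.0  # the formula works in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


for freq in (100, 1000, 3300, 10000, 16000, 20000):
    print(f"{freq:>6} Hz: {threshold_in_quiet_db(freq):6.1f} dB SPL")
```

The curve is lowest around 3 to 4 kHz and climbs steeply towards 16 kHz and above, so content that looks prominent at the top of a linear spectrogram may sit well below what a listener can actually hear.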

Re: q value (encoding algorithm quality) and interpreting spectrograms

Reply #7 – 2017-01-07 06:36:17

Quote from: saratoga on 2017-01-04 23:52:45

First mistake is looking at spectrograms.

I think the first mistake was not understanding the concept of CBR.
The biggest mistake was trying to judge quality from spectrograms.

Quote from: doomie on 2017-01-04 23:31:37

all in combination with cbr 320.
By checking the file sizes in Windows Explorer I discovered that the file sizes of all output mp3 files are exactly the same (not even a single byte different)

This is just as much of a surprise as if I went out and bought six 1-gallon jugs, filled each of them with 1 gallon of water three times over, each time with a different level of contamination, and was then surprised that in all three cases I had exactly 6 gallons of water...

By saying 320 CBR you are telling the encoder to use exactly 320 kbit per second, nothing more and nothing less.
That's the same as filling a 1-gallon jug with 1 gallon of water.
You are using the same music track, so it's the same number of seconds, i.e. the same number of jugs.
What you are adjusting is the quality of the data encoded into those bits, just like the quality of the water.

So yes, since you are saying "use exactly X times Y bits" all three times, you get the same result (in size).
Your files are so "forced" to that exact size that they contain padding, i.e. bits that are only there to bring the size up to what you specified without having any effect on the actual audio.
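
The arithmetic can be checked with a few lines of Python. This is a rough sketch: the 321 s duration is the "5 min 21 s" from the MediaInfo output, which is rounded to whole seconds, and the function names are made up. The frame size relation is the standard one for MPEG-1 Layer III (1152 samples per frame).

```python
def cbr_audio_bytes(bitrate_bps, duration_s):
    """Audio stream size for a constant bitrate: bits per second times seconds."""
    return bitrate_bps * duration_s / 8


def mean_frame_bytes(bitrate_bps, sample_rate_hz):
    """Average MPEG-1 Layer III frame size: 1152 samples per frame, 8 bits per byte."""
    return 144 * bitrate_bps / sample_rate_hz


size = cbr_audio_bytes(320_000, 321)
print(f"{size:.0f} bytes = {size / 2**20:.2f} MiB")
print(f"{mean_frame_bytes(320_000, 44_100):.1f} bytes per frame on average")
```

That comes out at roughly 12.2 MiB, in line with the 12.3 MiB stream size MediaInfo reports, and nothing in the calculation depends on q.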

You can use mp3packer to remove that padding by turning the file into a VBR file (without changing the actual audio), and then you will probably see a difference in size.

But when you ask for a specific size of file, it shouldn't be surprising that you get it...

That, combined with the fact that spectrograms do not show general audio quality, should answer your question.
