I started using AAC just a few days ago; before that, I used LAME's V2 setting to compress my music collection.
I came across this CD, whose spectrogram I can't understand. See the attached png file and flac file.
I use foobar and Nero AAC at -q0.55 to convert to .m4a. The spectrogram shows that Nero tends to cut the strong peaks while preserving more of the weaker ones. I don't understand this; I thought the peaks should be given more bitrate. Am I wrong?
(http://i64.tinypic.com/axyxrk.png)
Louder sounds -> more masking of higher frequencies, and Nero's psychoacoustic model probably decided that it can cut them more aggressively.
Spectrograms do not show bitrates. Some sections may be low bitrate, others high bitrate, but you can't tell by looking at this graph.
As for the circled areas... the loud (green/yellow) content in the lower middle frequencies, where your ear is very sensitive, is probably completely masking (obscuring) the quiet (purple) content in the 14-16 kHz band, so the encoder has wisely chosen not to waste bits preserving those highest frequencies... the louder parts need it more.
Louder sounds -> more masking of higher frequencies, and Nero's psychoacoustic model probably decided that it can cut them more aggressively.
I have to say that Nero's AAC model looks very weird, at least judging from the spectrogram. I know a spectrogram doesn't really tell you the quality, but it makes me worried that it will cut something that should not be cut.
I also tested other encoders. For example, Apple's AAC (via qaac) doesn't show this behaviour even down to quality q64, which corresponds to 128 kbps. Apple just doesn't cut.
(http://s22.postimg.org/fp4lob29t/2016_01_31_185057.png)
LAME's spectrogram starts to show gradation in the high frequencies at V5, but it carefully preserves the peaks.
(http://s12.postimg.org/68d1ni3xp/2016_01_31_184817.png)
The spectrogram tells you nothing about how well compressed audio is. If you're worried, don't be.
That piano recording from 1951 has nothing above 12 kHz (or below 50 Hz) except noise (tape hiss). The audio is also mono and is mostly tonal and simple (not too many simultaneous or quickly changing tones), so it very comfortably fits in a lower bitrate than what you are using. The encoder has a lot of extra room to encode the quieter parts, i.e. the tape hiss, but it is a tradeoff... when the lower, louder components get more complex, the encoder needs to concentrate on those parts, especially if the psychoacoustic model predicts that the higher frequencies are masked.
If the hiss weren't so strong, the encoder may not even be tempted to preserve it at all. Also, different encoders will make different decisions as to how high of frequencies they can preserve... you can't know whether the tradeoff is acceptable except by listening, though I would trust that if it is cutting off quiet high frequencies, it's because it needs the bits for the loud lows & mids. In any case, if you can't hear a difference between the original and the lossy, then the lossy is, in effect, lossless and perfect, no matter what the spectrogram looks like.
Also realize that your spectrogram view is linear, but the pitches and your hearing are logarithmic... so in effect, your spectrogram is devoting half its space (the entire top half, and then some) to inaudible noise. That, combined with its use of a bright palette of colors even for the quietest audio, means you're getting an intensely exaggerated view that overemphasizes the least audible parts. The blue/indigo/purple parts of your graph are at or barely above the threshold of hearing, and that's if you're in a deathly quiet room with the volume cranked as loud as you can play the piano parts without hurting your ears.
To get a better visual sense of what you're hearing, and how much of a non-issue the highest frequencies are, attached is a fully logarithmic view in Adobe Audition, devoting very little screen real estate to the top end. It also wisely uses much dimmer colors for the quieter sounds. The analyzed audio is encoded with Nero AAC with q=0.54. The top half of your spectrogram is the top one-sixth of this one.
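To put a rough number on the linear-versus-logarithmic point, here is a minimal sketch. It assumes the linear view runs from 0 Hz up to some maximum frequency, so its top half is always exactly one octave. The function name and the example frequency ranges are made up for illustration; the actual range Audition displays depends on its settings, and the share changes with the lowest frequency shown.

```python
import math

def top_half_share_on_log_axis(f_min_hz, f_max_hz):
    """Share of a log-frequency axis (f_min_hz..f_max_hz) taken up by the
    top half of a linear axis that runs from 0 Hz to f_max_hz."""
    if not 0 < f_min_hz < f_max_hz / 2:
        raise ValueError("need 0 < f_min_hz < f_max_hz / 2")
    # Width of the whole log axis, in octaves.
    total_octaves = math.log2(f_max_hz / f_min_hz)
    # f_max/2 .. f_max is one doubling of frequency, i.e. one octave.
    top_half_octaves = math.log2(f_max_hz / (f_max_hz / 2))
    return top_half_octaves / total_octaves

# Assumed display ranges, for illustration only.
for f_min in (20.0, 22050 / 64):
    share = top_half_share_on_log_axis(f_min, 22050.0)
    print(f"{f_min:8.2f} Hz .. 22050 Hz: top half = {share:.3f} of the log axis")
```

With a log axis starting at 20 Hz, the top half of the linear view shrinks to roughly a tenth of the display; it comes out at one-sixth when the log axis starts around 345 Hz (six octaves below 22.05 kHz).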
you can't know whether the tradeoff is acceptable except by listening
This.
People seem to want so desperately to have some easy way to choose their format, codec and settings without bothering to do it the right (https://hydrogenaud.io/index.php/topic,16295.0.html) way.
Thank you so much for your patient and detailed answer, especially for letting me know "the pitches and your hearing are logarithmic." I didn't know this before. By the way, I also tried Adobe Audition. Is there a color bar that shows the exact correspondence between brightness and intensity values? I cannot find such an option via Google.
Unfortunately, no, there's no way to customize the colors in Audition's spectrum view, or to know what dB each color represents. You can only adjust the scales (https://helpx.adobe.com/audition/using/displaying-audio-waveform-editor.html#customize_the_spectral_display) and certain analysis parameters. The view is more for showing relative/contextual frequency info so you can visually spot transient noise and target certain frequency bands (e.g. you can paint with the auto-healing tool right in that window, and/or click-drag to draw a box to confine all actions, including playback, to a specific frequency range).
but it makes me worried that it will cut something that should not be cut.
AAC is lossy compression. It's going to throw away some data.
If you're not happy about that, you'll have to use a lossless format.
Unless you can hear a difference in an ABX test, I'd assume it's doing its job and throwing away the right data.
If you can hear a difference in an ABX test (and if that difference is not acceptable to you), you can try a higher bitrate, try a different encoder, or go lossless.
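For judging whether an ABX result means you really heard a difference, a minimal sketch follows. It assumes each trial is an independent 50/50 guess when no difference is audible, and computes the chance of getting at least that many trials right by guessing alone. The function name is hypothetical, and the commonly used 0.05 cutoff is a convention, not something established in this thread.

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of scoring at least `correct` out of `trials`
    by pure guessing (one-sided binomial test with p = 0.5)."""
    if trials <= 0 or not 0 <= correct <= trials:
        raise ValueError("need trials > 0 and 0 <= correct <= trials")
    # Count the outcomes with `correct` or more hits, out of 2**trials
    # equally likely guess sequences.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# Example: a hypothetical session of 16 trials with 12 correct answers.
p = abx_p_value(12, 16)
print(f"p = {p:.4f}")
```

A small p-value suggests the score is unlikely to be luck; a large one means the result is consistent with guessing, which is the "can't hear a difference" case.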
Someone could make a lossy encoder that produces better looking waveforms by throwing away stuff you can't see instead of stuff you can't hear... But, the audio performance would likely be worse.
Thanks for all the kind replies. I learned a lot.