Converting lossless to lossy
Reply #22 – 2010-08-18 21:17:19
You can't necessarily say that lossyWAV would be robust at similar bitrates to where Vorbis would be robust. That's why we wouldn't be keen to recommend -q1 or -q0 for your main computer audio files.
• We're very confident that encoding from lossyWAV -q5 to MP3 or similar is remarkably robust compared to lossless-to-MP3 conversion, so it's a good alternative to lossless for transcoding (e.g. for your iPod or phone). For some of us, FLAC is viable on external devices like the Sansa Clip or Fuze.
• We're pretty confident that lossyWAV -q2 or -q2.5 is transparent in itself, and that it has at worst fairly subtle issues when used as a transcoding source.
• We're confident that at -q1 or -q0 a little extra audible noise, or sometimes noise-modulation, can start to be noticed in careful listening for at least some samples, though the quality is fine for listening in noisy environments or at quiet levels.
LossyWAV doesn't (yet) attempt the highly advanced methods that transform codecs such as MP3, Vorbis or AAC can use to hide the noise added when reducing the audio bitrate. Transform codecs use special mathematical transforms to represent the data in another form (effectively a frequency representation). In that form, knowledge about how sounds mask the audibility of noise to the human ear can be used to permit more noise, either by storing certain frequency ranges with less accuracy or by omitting (filtering out) them altogether.
LossyWAV currently does relatively simple analysis. It never filters out frequencies from the audio signal. It never alters the timing of impulses or smears impulses. It currently doesn't aim to exploit temporal masking either (it could implement this, but that would potentially open the door to pre-echo artifacts if the psymodel is incorrect). It generally attempts to add a fairly broad-spectrum noise below the level of audibility by reducing bit depth in a way that lossless encoders can exploit.
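To make "reducing bit depth in a way that lossless encoders can exploit" a bit more concrete, here is a minimal Python sketch. It is not lossyWAV's actual code: the function names, the fixed `bits_to_remove` value and the sample block are all made up for illustration, and the real analysis that decides how many bits can safely go in each block is not shown. The idea is only that if every sample in a block is rounded to a multiple of 2^n, the lowest n bits are zero throughout the block, and a lossless encoder (FLAC's "wasted bits" handling, for instance) need not store them.

```python
# Illustrative sketch only; NOT lossyWAV's real algorithm.
# Assumes signed 16-bit integer samples and a hypothetical, fixed bits_to_remove.

def reduce_bit_depth(block, bits_to_remove):
    """Round each sample to the nearest multiple of 2**bits_to_remove."""
    half = (1 << bits_to_remove) >> 1
    # Largest multiple of the step that still fits in a signed 16-bit sample.
    max_value = (32767 >> bits_to_remove) << bits_to_remove
    reduced = []
    for sample in block:
        rounded = ((sample + half) >> bits_to_remove) << bits_to_remove
        reduced.append(min(rounded, max_value))  # avoid overflow past 32767
    return reduced

def shared_zero_lsbs(block):
    """Count the low-order bits that are zero in every sample of the block."""
    combined = 0
    for sample in block:
        combined |= sample
    if combined == 0:
        return 0  # all-silent block: nothing meaningful to count
    count = 0
    while combined & 1 == 0:
        combined >>= 1
        count += 1
    return count

block = [1003, -2517, 12, 32767, -32768]   # made-up samples
reduced = reduce_bit_depth(block, 4)       # drop the 4 lowest bits
print(reduced)
print(shared_zero_lsbs(block), shared_zero_lsbs(reduced))
```

The rounding error this introduces is the added noise. Nothing is filtered and nothing is shifted in time, which is the property that matters for the transcoding argument.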
Some problems that arise in lossy-to-lossy transcoding come from an accumulation of similar effects, such as time-smearing. Applied by one lossy encode, such an effect is benign because it falls within the temporal masking curve, for example. Applied again in cumulative fashion by the second lossy encode, it can become audibly different from the original because it is no longer masked. For simplicity, imagine that you could hear a 14 dB change to feature X (which is perhaps 25 milliseconds before a sudden drum-hit) but not an 8 dB change. Assume that your first lossy encoder can save bits by rounding up its representation of feature X by +7.5 dB. This would not be noticeably different to your ear. Encoder 2, when supplied with the original lossless audio, might decide to represent feature X +7.0 dB louder. Again, this would not be noticeably different to your ear. Now suppose that, instead of supplying Encoder 2 with the original lossless audio, you transcode by supplying it with the decoded output of Encoder 1, and Encoder 2 decides it can save bits by rounding up feature X by +6.8 dB from what it was fed. After transcoding, feature X wouldn't necessarily be audibly different from the output of Encoder 1, but it would be 14.3 dB different from the original lossless audio, making it audibly different from the original. This is a crude representation of how an effect like temporal smearing or pre-echo can become a problem when transcoding. (I'm not claiming that any of the figures used are especially representative of real transcoding artifact generation.)
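The arithmetic in that example can be written out as a short Python sketch. The dB figures are the same made-up ones used above, and the `audibility` helper is a hypothetical toy that knows only the two stated facts (14 dB is audible, 8 dB is not); it is not a psychoacoustic model.

```python
# Toy restatement of the example above; the figures are illustrative, not measured.
AUDIBLE_DB = 14.0     # a change this large is assumed to be audible
INAUDIBLE_DB = 8.0    # a change this small is assumed to be inaudible

def audibility(change_db):
    """Classify a change using only the two thresholds assumed in the example."""
    magnitude = abs(change_db)
    if magnitude >= AUDIBLE_DB:
        return "audible"
    if magnitude <= INAUDIBLE_DB:
        return "inaudible"
    return "unknown"  # the example says nothing about the range in between

enc1_vs_original = 7.5        # Encoder 1 fed with the lossless original
enc2_vs_original = 7.0        # Encoder 2 fed with the lossless original
enc2_vs_enc1_output = 6.8     # Encoder 2 fed with Encoder 1's decoded output

# Both changes go in the same direction, so they add up.
transcode_vs_original = enc1_vs_original + enc2_vs_enc1_output

print("Encoder 1 alone:      ", audibility(enc1_vs_original))
print("Encoder 2 alone:      ", audibility(enc2_vs_original))
print("Transcode vs Enc 1:   ", audibility(enc2_vs_enc1_output))
print("Transcode vs original:", f"{transcode_vs_original:.1f} dB,",
      audibility(transcode_vs_original))
```

Each step looks harmless when judged against its own input; only the comparison with the lossless original shows the problem, and that is a comparison the second encoder can never make.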
When encoding to MP3, Vorbis, AAC, WMA or Musepack, most masking effects such as these tend to be exploited fairly fully. Transcoding with two such encoding steps may therefore produce occasions when two changes in the same direction add up to something audibly different from the original. Because the second encoder doesn't know what the original was, it can't base its decisions on the qualities of the original audio. LossyWAV doesn't currently cause temporal smearing or exploit temporal masking, so it tends to be more benign in these areas when used as a source for transcoding.