End of the MP3? Quality still good, but far from great
2007-09-12 09:36:34
Greetings HA. Long-time lurker checking in. I'll start with some kudos to HA for years of invaluable info and countless tips, tricks, and tweaks. Helluva resource for those of us stuck in that no man's land between the lunatic fringe of audiophilia and the bleeding edge of techno-wizardry.

I don't know how actively LAME is being developed these days (I've been out of the game for a couple of years), but it's been my encoder of choice since the turn of the century. I've seen rivals come and go, and after years of doomsday prophets predicting the death of mp3, I think it's fair to say that the King will retain his throne for a few years yet (sorry Mr Jobs, cute phone though). Alas, the time has come for me to leave the courtyard, as my inner ear has finally won the war with my inner geek.

Several years back, one of my audiophile "friends" (the snotty, technophobic variety) was offering his usual scoffs at my then "hi-tech" setup (Slack Linux-based digital media server paired with mid-fi audio components) when I dared him to A/B a 256kbps mp3 (encoded by the trusty old v3.93) against the original source material (standard Red Book CD). He proudly agreed, and 15 frustrating (for him) yet oh so enjoyable (for yours truly) minutes later, he proudly stuck his foot in his mouth.

Comparisons from years prior were what had sold me on lossy compression in the first place. Even some lower bitrate (192kb/s) material proved challenging, if not impossible, to identify. Snobby audio purism notwithstanding, if *my* ears couldn't tell the difference, then why should I care? Besides, it was such good fun to trash-talk those holier-than-thou "Audiocrats," and lord knows they made it easy every time they uttered that Audiophile gibberish ("This driver has a timbrel airiness that is simply unmatched!").

Over the years, however, my own taste for high-end audio grew considerably, as did my pursuit of that "perfect" sound.
With each upgrade I would force myself to sample a few tracks and compare the mp3 against the waveform, as I couldn't stomach the thought of spending thousands on components only to feed them crappy source material. And while countless auditions, purchases, and obnoxious pursuits gradually elevated my set-up to respectable reference levels, my mp3s rarely failed to stand up to empirical scrutiny.

With the purchase of my current speaker line (Krix) a couple of years ago, I did begin to notice some minute yet discernible differences on a few "problem" tracks, so I opted to brute-force the issue by re-doing my entire collection at the best variable bit rate settings available, which proved indistinguishable from source on each of the offending tracks. Convinced that my overhauling days were over, I resigned myself to happily encoding future discs at HQ VBR settings and living out my days in audio bliss.

Fast forward two years, and several K in further upgrades. I've known for a while that another "quality check" was long overdue, and with the price of storage space at historical lows, I found myself in a decent position to accept bad news if and when it came. So I dusted off a few CDs, ripped a few tracks, and sat down for some listening sessions.

-Reference Gear-
Signal Processing/Conversion: Xenon Chip, 16/48 Sis DAC (Xbox 360); Crystal Semiconductor CS-49400 DSP, 24/192 Wolfson DAC (Arcam AVR-250)
Amplification: Arcam AVR-250; Bottlehead S.E.X. integrated tube amplifier
Drivers: Krix Lyrix, Equinox, and KDX-M/C; Klipsch Forte II; Grado SR60s

First informal test: Sample from existing mp3 collection vs. original waveform

I chose Tina's "What's Love Got to Do with It," an all-time favorite. I started with the original mp3 (encoded using an older version of LAME). Pairing the Xbox (transport only) with my AVR-250 (DSP and conversion), I listened for about 30 seconds, and couldn't imagine the waveform sounding much better. Ten seconds into the wav, I was about ready to call it a day.
Then the snare drum hit. Damn, that sounded crisp. Don't remember it sounding that crisp in mp3 format. Switch back. I'll be damned. On the wav file I hear a tight, precise "pshhht" when the snare drum snaps. On the mp3, it sounds almost identical, but it doesn't snap. Instead there's this drawn-out, wavy echo that shadows the impact, making it sound something like "pshhhuwwwwww." Not good.

2nd informal test

No worries though, LAME's probably got a new version by now. I'll just download the latest (3.97) and try the highest VBR settings again. Crap, my old scripts in EAC are outdated. What happened to --alt-preset? Oh I see, -V 0 it is then! Rinse and repeat. No change. Getting a little worried. Time to break out the big guns. Let's see you try that crap with -b 320, Tina. Yes! Wait, no. Shit. The artifact is much reduced at this bitrate, but still detectable.

I've got to let the HA folks know. Time to register. What's this, a 5-day probation?! But I haven't even... Doh! There's got to be a way around this... let's check the FAQs... Hot damn, those HA folks have gotten anal about empirics. A/B this! Show me proof, noob! [Remembering that I'm a scientist and that evidence is the coin of the realm.] Sweet. It's about time we got an audio forum that puts proof ahead of posturing. Besides, I've got work to do.

No way I'm going to drag my computer into the living room, and my laptop has no digital out. I'll have to identify the problems on my reference gear and conduct formal testing on my PC. Bear in mind this ups the difficulty quotient considerably, as the DACs on my bargain-basement HP's onboard audio are probably leagues below the hardware in my Xbox and certainly the Arcam, not to mention that pre/post amplification duties now fall on my lowly Klipsch ProMedia 4.1 (I know it's a solid multimedia set-up, but that's not saying much). Still, I'll be listening through a set of Grado SR60s, the lowest entry in the Prestige Line, but some damn fine cans just the same.
Assuming the cheap DAC and the ProMedias don't smooth over the imperfections, I should still be able to detect the differences between the mp3 and the wav.

Formal Test 1: VBR MP3 (-V 0) vs. Wav

After downloading a clever little program that automates the A/B process (WinABX), I had to extract all samples to wav using Audacity, since WinABX simply refuses to load mp3s natively. I fire up the program, click ABX mode, and have at it. The differences are beyond pronounced. 1/1, 2/2, 3/3, 4/4...11/11. Too easy. Let's pull the Grados and see if the differences are detectable using the mass-market (I got them at Best Buy) ProMedia speakers themselves. A little tougher, but still quite easy. 10/10 and I'd had enough. Regardless of which hardware I chose, the difference in the impact of the snare drum between the mp3 and the wav file was unmistakable.

-------------------------------------
WinABX v0.42 test report
09/08/2007 12:11:41

A file: I:\t\MP3 testing\What's Love -V 0clip.wav
B file: I:\t\MP3 testing\What's Loveclip.wav

Start position 00:00.0, end position 00:05.0
12:12:32 1/1 p=50.0%
12:12:56 2/2 p=25.0%
12:13:20 3/3 p=12.5%
12:13:39 4/4 p=6.2%
12:13:57 5/5 p=3.1%
12:14:19 6/6 p=1.6%
12:14:40 7/7 p=0.8%
12:15:00 8/8 p=0.4%
12:15:10 9/9 p=0.2%
12:15:31 10/10 p< 0.1%
12:16:01 11/11 p< 0.1%
12:16:50 reset
12:17:18 1/1 p=50.0%
12:17:22 2/2 p=25.0%
12:17:25 3/3 p=12.5%
12:17:30 4/4 p=6.2%
12:17:35 5/5 p=3.1%
12:17:39 6/6 p=1.6%
12:17:41 7/7 p=0.8%
12:17:50 8/8 p=0.4%
12:17:56 9/9 p=0.2%
12:18:07 10/10 p< 0.1%
12:18:38 test finished

Formal Test 2: CBR MP3 (-b 320) vs. Wav

Here's where things get interesting. Though the artifacts were barely (but consistently) detectable on my reference system, even at 320 CBR, I simply could not reproduce the distinction using my PC hardware, despite knowing exactly what to listen for.
See for yourself:

-------------------------------------
WinABX v0.42 test report
09/08/2007 12:23:25

A file: I:\t\MP3 testing\What's Love - b 320clip.wav
B file: I:\t\MP3 testing\What's Loveclip.wav

Start position 00:00.0, end position 00:05.0
12:26:02 0/1 p=100.0%
12:26:31 0/2 p=100.0%
12:26:38 reset
12:28:01 0/1 p=100.0%
12:28:05 reset
12:31:13 0/1 p=100.0%
12:31:23 0/2 p=100.0%
12:31:30 0/3 p=100.0%
12:31:34 0/4 p=100.0%
12:31:37 0/5 p=100.0%
12:31:40 0/6 p=100.0%
12:31:43 1/7 p=99.2%
12:31:45 2/8 p=96.5%
12:31:46 2/9 p=98.0%
12:31:48 3/10 p=94.5%
12:31:49 4/11 p=88.7%
12:31:55 reset
12:32:31 test finished

Formal Test 3: CBR MP3 (-b 320) vs. Wav, alternate sample

At that point I began wondering if any imperfections at CBR 320 were simply too minute to ascertain using anything but high-end components. So I decided to test another track. This time I chose one that had been particularly difficult years before, even prompting me to switch to higher quality encoding: Smooth Criminal by Michael Jackson. About 30 seconds in, things get pretty busy. Quick strumming bass, synthesizers, guitar, drums, and Michael's almost lispy voice. What?! Why is his voice that lispy? Bloody hell.

She Ran Underneath The Table
He Could See She Was Unable
So She Ran Into The Bedroom
She Was Struck Down, It Was Her Doom

Like the snare issue in Tina's track, there's a slight loss of coherence to the lyrics on the mp3 version. The dead giveaway, however, is that same wavy echo, this time in between the words. What's particularly troubling is that in some cases the echo seems to *precede* the lyric itself (e.g., "So she" in line 3 above), as if the sampling algorithm gets ahead of itself. To be sure, this distinction was by no means a cinch to identify, particularly on my PC hardware. Nonetheless, when I listened through the Grados, it became fairly easy to distinguish between the wav and the mp3.
Observe:

-------------------------------------
WinABX v0.42 test report
09/08/2007 12:32:34

A file: I:\t\MP3 testing\Michael Jackson - Smooth Criminalmp3clip.wav
B file: I:\t\MP3 testing\Michael Jackson - Smooth Criminalclip.wav

Start position 00:00.0, end position 00:08.0
12:34:42 1/1 p=50.0%
12:34:56 2/2 p=25.0%
12:35:30 3/3 p=12.5%
12:35:41 4/4 p=6.2%
12:36:36 5/5 p=3.1%
12:37:11 6/6 p=1.6%
12:37:20 7/7 p=0.8%
12:39:03 7/8 p=3.5%
12:39:09 reset
12:39:49 1/1 p=50.0%
12:40:08 2/2 p=25.0%
12:40:17 3/3 p=12.5%
12:40:25 4/4 p=6.2%
12:40:33 5/5 p=3.1%
12:40:41 6/6 p=1.6%
12:41:10 6/7 p=6.2%
12:41:27 7/8 p=3.5%
12:41:55 8/9 p=2.0%
12:42:04 9/10 p=1.1%
12:42:13 10/11 p=0.6%
12:42:43 11/12 p=0.3%
12:42:58 12/13 p=0.2%
12:43:15 13/14 p< 0.1%
12:43:27 14/15 p< 0.1%
12:43:32 test finished

While these results are disappointing to say the least, I have to balance mp3's imperfections against several factors:

-I only tried a few tracks
-320 CBR performed quite well
-Differences are far less pronounced using mass-market audio components

Given these findings, it's clear that I'll have to re-rip/encode my collection using some higher bitrate codec, perhaps even lossless. No intention of canning my mp3 collection, though, since it will serve more than admirably in my DAP and car audio setup.

Hopefully the LAME gurus can put these findings to good use. Who knows, there may even be a way to tweak the sampling algorithms to eliminate these flaws. I'm happy to email the tracks that I used for testing (the sample clips are quite small), and I've even done a little non-parametric (read: eyeball) analysis of the waveform pairs (mp3 vs. wav) that shows some minute, though pervasive, differences in the waveform.

Cheers,
TP
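
P.S. For anyone who wants to sanity-check the p-values in the WinABX logs above: as far as I can tell, they match a one-sided binomial test against pure coin-flip guessing, i.e. the chance of getting at least that many trials right by luck alone. Here's a minimal Python sketch under that assumption. The function name abx_p_value is my own invention for illustration and is not part of WinABX.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of scoring at least `correct` out of `trials` by guessing.

    Assumes each ABX trial is an independent 50/50 guess (one-sided
    binomial tail). This is an assumption about how the log's p-values
    are derived, not something taken from the WinABX source.
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count the outcomes with `correct` or more hits, out of 2**trials total.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    # Scores taken from the logs: 7/8 (logged as p=3.5%),
    # 4/11 (logged as p=88.7%), and 14/15 (logged as p< 0.1%).
    for correct, trials in [(7, 8), (4, 11), (14, 15)]:
        print(f"{correct}/{trials}  p={abx_p_value(correct, trials):.2%}")
```

Keep in mind this treats each run on its own. It makes no correction for resets or repeated attempts, so a run that was restarted a few times deserves a bit more skepticism than its final p-value suggests.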