Topic: How to find out if a WAV has been generated from a MP3 (Read 18180 times)

How to find out if a WAV has been generated from a MP3

I hope this is the right section.

Is there some more or less hidden tag that lets one tell whether a WAV was ripped from an audio CD or decoded from an MP3? (Please don't ask why I need to know.)


Reply #2
Thanks, exactly THE SAME issue.
However, that technical discussion is a bit too much for me.
From what I understood, one should analyze the parts with percussion.
This song (Celtic-medieval genre) consists entirely of percussion.
Here are the GoldWave spectrum analyzer views; they look very similar:

5ms steps

The WAV that was sitting in a remote corner of my hard disk; it is supposed to come from the original CD, which is at home at the moment:


mp3 128kbps:


----

Another closer comparison:

Reply #3
That's called a waveform, not a spectrum.

Reply #4
Sorry, I didn't know.
Could you please suggest how to extract something meaningful from GoldWave?

Reply #5
I found SPEAR and...

Roughly 0-15 kHz; frequencies above 15 kHz are almost absent.

wav


mp3

Reply #6
CD audio at a 44.1 kHz sample rate can carry frequencies up to 22.05 kHz. A spectral display of a waveform shows which frequencies are actually present.
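The 22.05 kHz figure is simply half the sample rate. A minimal sketch of that arithmetic follows; the function names are made up for illustration, and `wav_nyquist_hz` only reads the sample rate from the WAV header, so it says nothing about what the audio actually contains.

```python
import wave


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency that a given sample rate can represent."""
    return sample_rate_hz / 2.0


def wav_nyquist_hz(path: str) -> float:
    """Read the sample rate from a WAV header and return its upper frequency limit."""
    with wave.open(path, "rb") as wav_file:
        return nyquist_hz(wav_file.getframerate())


print(nyquist_hz(44100))  # 22050.0
print(nyquist_hz(32000))  # 16000.0
```

The second line is worth keeping in mind for later in the thread: audio that has passed through a 32 kHz sample rate cannot contain anything above 16 kHz either.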
Here is a CD-ripped clip at full quality. Notice the peaks where the frequencies go all the way up to 22.05 kHz.


For comparison, here is the same audio clip, encoded with Lame ABR 256 (I don't remember which Lame version). Note that most of the audio is cut off at around 16 kHz, with the peaks going up to 18 or 19 kHz.



I sometimes check mp3s that I purchase from Amazon or eMusic to see if they were ripped from an already-lossy source (so, functionally transcoded).
Amazon and eMusic use high-bitrate mp3 encoding, usually Lame V2 or V0 (and sometimes Amazon uses 256 CBR or Lame ABR at around 245 kbps).
These will keep some frequencies above 16 kHz.
So if I see a Lame -V0 file with a lowpass at 16 kHz, I can conclude that such a low lowpass is not a result of the -V0 encode (unless some funky switches were added to the Lame encode), but rather of the source material: it is likely that the Lame -V0 encode was itself transcoded from a lossy source that had a 16 kHz cutoff.
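The same check can be roughed out numerically instead of by eye. The sketch below averages the spectrum over Hann-windowed frames and reports the highest frequency still above a chosen floor. Everything in it is an assumption for illustration: the function names, the 1024-sample frame, the -60 dB floor, and the synthetic sine tones standing in for decoded audio. Real music needs a floor chosen with more care, and a low result only says "something lowpassed this", not what did.

```python
import cmath
import math


def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def average_spectrum_db(samples, frame_size=1024):
    """Average power per bin over Hann-windowed frames, in dB below the strongest bin."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_size) for i in range(frame_size)]
    power = [0.0] * (frame_size // 2 + 1)
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        spectrum = fft([samples[start + i] * window[i] for i in range(frame_size)])
        for k in range(len(power)):
            power[k] += abs(spectrum[k]) ** 2
    peak = max(power)
    if peak == 0.0:
        raise ValueError("need at least one non-silent frame")
    return [10 * math.log10(max(p / peak, 1e-30)) for p in power]


def estimate_cutoff_hz(samples, sample_rate, frame_size=1024, floor_db=-60.0):
    """Frequency of the highest bin whose average level is above floor_db."""
    levels = average_spectrum_db(samples, frame_size)
    for k in range(len(levels) - 1, -1, -1):
        if levels[k] > floor_db:
            return k * sample_rate / frame_size
    return 0.0


def tones(freqs_hz, sample_rate=44100, count=8192):
    """Synthetic test signal: a sum of equal-level sine tones."""
    return [sum(math.sin(2 * math.pi * f * i / sample_rate) for f in freqs_hz)
            for i in range(count)]


full_band = tones([1000, 15000, 20000])   # content up to 20 kHz
lowpassed = tones([1000, 15000])          # nothing above 15 kHz
print(round(estimate_cutoff_hz(full_band, 44100)))
print(round(estimate_cutoff_hz(lowpassed, 44100)))
```

The estimate lands a little above the highest tone because of window leakage, so it should be read as "roughly here", not as an exact lowpass frequency.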

For example, here is a song by Dragonforce that I recently purchased from eMusic. This is power metal, and ~240 kbps mp3 files won't have a lowpass at 16 kHz. So the 16 kHz cutoff probably means that the eMusic encode was itself encoded from an already-lossy source.


To be sure, I purchased the same song from Amazon, encoded at 256 kbps CBR. There are lots of places where frequencies exceed 16 kHz, indicating that the legitimate audio CD source material doesn't have a 16 kHz cutoff.
God kills a kitten every time you encode with CBR 320

Reply #7
Tim, that's really, really interesting and it's quite easy to understand, many thanks.
Some time ago I read that mp3s usually get a frequency cutoff (especially when they are transcoded, I'd say), and that the critical point is usually at 16 kHz.
My graphs look similar to your third one.
That said, I'd say that my fake lossless song was probably an mp3 that was converted to WAV at some point in the past.

Reply #8
Glad that's useful. If your graphs look like my 3rd one (with a complete cutoff of frequencies above 16 kHz) then they were definitely sourced from low-mid quality lossy audio.

Note that the spectral views I presented show only one channel, because the R and L channels are almost always equivalent in terms of frequency cutoff (unless the encoded music has major differences between the R and L channels), and an image of a single channel makes the point while taking up less space.

a few more comments:

1) the common cutoff around 16 kHz is there for a reason: most people can't hear above that frequency, especially when there's other music going on at lower frequencies (i.e., they may be able to hear a 16 kHz tone, but not hear the difference in actual music when it is lowpassed at 16 kHz). Also, high-frequency content is disproportionately difficult to encode (it uses a lot of bits), so there's more reason for wanting to lowpass.

2) fake lossless or fake high-quality mp3's (sourced from lower-quality compressed-audio files) are somewhat common. there are a number of different reasons/sources for this:
  • people trying to sell something that's not as high-quality as it is claimed to be
  • accidental error on the part of the original purveyor, whether individual or a record company in service of iTunes or Amazon or eMusic, ripping from a cd that itself was created from mp3 files
  • users who mistakenly think that transcoding mp3's to lossless or a higher bitrate improves quality
  • some may be record companies intentionally injecting lower-quality stuff into file-sharing networks as a disincentive for people to use such networks

Reply #9
Quote
Glad that's useful. If your graphs look like my 3rd one (with a complete cutoff of frequencies above 16 kHz) then they were definitely sourced from low-mid quality lossy audio.

This is simply not true!  I could have sampled something at 32kHz and then resampled before mastering to CD.  I could have sampled something from FM radio.  I could have simply put something through a low-pass filter.
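To illustrate the last of these alternatives, here is a minimal sketch in which a plain low-pass filter, with no lossy codec involved, removes everything above 16 kHz. The names `design_lowpass`, `fir_filter` and `tone_level_db`, the 255-tap Blackman-windowed sinc design, and the two-tone test signal are all assumptions chosen for the example; they are not taken from any tool discussed in this thread.

```python
import math


def design_lowpass(cutoff_hz, sample_rate, num_taps=255):
    """Blackman-windowed sinc low-pass FIR taps, normalised to unity gain at DC."""
    fc = cutoff_hz / sample_rate  # cutoff as a fraction of the sample rate
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        phase = 2 * math.pi * n / (num_taps - 1)
        window = 0.42 - 0.5 * math.cos(phase) + 0.08 * math.cos(2 * phase)
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]


def fir_filter(samples, taps):
    """Convolve, keeping only outputs where the filter fully overlaps the input."""
    span = len(taps)
    return [sum(taps[j] * samples[i + span - 1 - j] for j in range(span))
            for i in range(len(samples) - span + 1)]


def tone_level_db(samples, freq_hz, sample_rate):
    """Level of one frequency (single-bin DFT), in dB relative to a full-scale sine."""
    step = 2 * math.pi * freq_hz / sample_rate
    re = sum(s * math.cos(step * i) for i, s in enumerate(samples))
    im = sum(s * math.sin(step * i) for i, s in enumerate(samples))
    amplitude = 2 * math.hypot(re, im) / len(samples)
    return 20 * math.log10(max(amplitude, 1e-15))


RATE = 44100
taps = design_lowpass(16000, RATE)
length = 4410 + len(taps) - 1  # so the filtered signal is exactly 4410 samples (100 ms)
source = [0.5 * math.sin(2 * math.pi * 1000 * i / RATE)
          + 0.5 * math.sin(2 * math.pi * 19000 * i / RATE) for i in range(length)]
filtered = fir_filter(source, taps)

for freq in (1000, 19000):
    before = tone_level_db(source[:4410], freq, RATE)
    after = tone_level_db(filtered, freq, RATE)
    print(f"{freq} Hz: {before:.1f} dB -> {after:.1f} dB")
```

The 1 kHz tone passes essentially unchanged while the 19 kHz tone is pushed far down, which is the same "nothing above 16 kHz" picture that a frequency plot of a lowpassed lossy encode would give.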

Reply #10
Good point - sorry for the blanket statement. I should have said "they were likely sourced from low-mid quality lossy audio."
There are, as you say, other ways to get that lowpass signature. But I would wager that the vast majority of ostensibly-lossless files out there that show a complete cutoff of frequencies above 16 kHz are sourced from lossy audio.

Reply #11
There's a reason why I suggest a spectral view (color-coded frequency amplitude vs. time) instead of simple frequency analysis (amplitude vs. frequency).  A spectral view allows one to see holes in the spectrum (through zooming) which says a lot more than just a low-pass cutoff frequency.

It is your second plot which is the most damning, but even if the codec is doing nothing above 16kHz, you will still likely see holes in the spectrum below this frequency around transient events.
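As a very crude stand-in for "zooming in and looking for holes", one can count how many time-frequency cells below the cutoff are nearly empty. This is only a sketch under stated assumptions: `spectrogram_db` and `hole_fraction` are hypothetical names, the frame size and -60 dB floor are arbitrary, and the synthetic noise-then-tone signal merely exercises the counting. It does not model what a real codec leaves behind.

```python
import cmath
import math
import random


def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def spectrogram_db(samples, frame_size=1024):
    """Hann-windowed, non-overlapping frames; levels in dB below the loudest cell."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_size) for i in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        spectrum = fft([samples[start + i] * window[i] for i in range(frame_size)])
        frames.append([abs(c) ** 2 for c in spectrum[:frame_size // 2 + 1]])
    peak = max(max(frame) for frame in frames)
    return [[10 * math.log10(max(p / peak, 1e-30)) for p in frame] for frame in frames]


def hole_fraction(samples, sample_rate, cutoff_hz=16000, floor_db=-60.0, frame_size=1024):
    """Share of time-frequency cells below cutoff_hz that are quieter than floor_db."""
    top_bin = int(cutoff_hz * frame_size / sample_rate)
    cells = [level
             for frame in spectrogram_db(samples, frame_size)
             for level in frame[:top_bin]]
    return sum(level < floor_db for level in cells) / len(cells)


rng = random.Random(0)
noise = [rng.uniform(-0.5, 0.5) for _ in range(4096)]   # dense spectrum, no holes expected
tone = [0.5 * math.sin(2 * math.pi * 1000 * i / 44100) for i in range(4096)]  # mostly empty

print(hole_fraction(noise, 44100))
print(hole_fraction(noise + tone, 44100))
```

A plain low-pass filter leaves the region below its cutoff dense, so a noticeably higher share of empty cells below the cutoff is the kind of difference a zoomed-in spectral view makes visible. Interpreting it still takes a look at where the holes actually sit, for example around transients.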

It looks like you're using EAC to show these plots.  I never liked it much because it still shows blue in areas where there really is nothing (should be black).  I much prefer Adobe Audition.  I saw some images recently that appeared to be created with Sox that looked more like Audition.

Reply #12
Unfortunately I don't have Audition. I'll take a look at the SoX CLI in the next few weeks; it seems it can be integrated with gnuplot, which sounds interesting.
I tried with EAC, and the difference between the fake lossless file that I thought had been ripped from CD and a real lossless file (Aerosmith) is pretty clear:



Reply #13
I could produce the same-looking graphs by doing any of the things I suggested in my first reply to timcupery.  None of those things would be accurately characterized as "fake lossless".

Reply #14
Quote
It is your second plot which is the most damning, but even if the codec is doing nothing above 16kHz, you will still likely see holes in the spectrum below this frequency around transient events.

The second plot is more clearly damning because it is zoomed in on a short section of the song, rather than over some minutes.

Quote
It looks like you're using EAC to show these plots.  I never liked it much because it still shows blue in areas where there really is nothing (should be black).  I much prefer Adobe Audition.  I saw some images recently that appeared to be created with Sox that looked more like Audition.

I don't interpret the blue as "real" frequencies.
I'm not sure what the blue is showing, but it's something. It may be showing the frequencies of dither noise. When an mp3 with a 16 kHz cutoff has dither or noise shaping applied on decode, EAC will show entirely blue above 16 kHz. I think this is a strategy for disguising fake "lossless" files so that they appear to show the entire spectrum.

Reply #15
Of course, I have read what you said, greynol, there will always be a doubt because somebody could have applied a lowpass filter, but at least now I have a starting point to work with.
Until a few days ago I didn't even know that lossy files in general had a cut at a certain frequency.

Reply #16
Quote
I could produce the same-looking graphs by doing any of the things I suggested in my first reply to timcupery.  None of those things would be accurately characterized as "fake lossless".

I think all flapane was saying in his most recent post was that the lossless-format file wasn't ripped from the original CD source (and he shows the original CD source in his second graph). So the audio in the first graph (from the lossless-format file) had gone through some sort of processing prior to being encoded to lossless.

flapane didn't say what sort of processing this was; I was the one who'd made a blanket statement earlier along these lines.
However, note that I am still willing to say it is likely that such 16 kHz-cutoff "lossless" files originally came from mp3 or another lossy source.

Reply #17
Quote
I'm not sure what the blue is showing, but it's something.

I just took a track from a CD, resampled to 32kHz and then back to 44.1kHz.  EAC's spectral view showed lots of stuff in blue above 16kHz where there is literally nothing.  So whatever EAC is showing is not real; not in any way, shape or form.

Don't take my word, try it yourself.


Quote
Of course, I have read what you said, greynol, there will always be a doubt because somebody could have applied a lowpass filter, but at least now I have a starting point to work with.

That's the thing, spectral views show so much more; all you have to do is zoom in.  You'll usually be able to distinguish lossy encoding from a simple low-pass this way.

Quote
Until a few days ago I didn't even know that lossy files in general had a cut at a certain frequency.

They don't have to.  320kbit won't necessarily cut at 16kHz; it can go just as high as plenty of "lossless" discs (which do not have to go all the way up to 22.05kHz and often don't).

Reply #18
greynol, thanks for taking the time to clarify things.
especially that there are other ways beyond frequency distribution by which to distinguish lossy compression.

with regard to EAC's audio editor and what the blue is:
I converted a full-spectrum wav file to 32 kHz, and then resampled back to 44.1 kHz, first without dither, then with dither.
The dithered version fills in the entire upper spectrum with blue, while the non-dithered one shows only a little blue. (to be clear, I dithered not on the conversion from 44.1 to 32 kHz, but only on the way back from 32 kHz to 44.1)
So I'll go back to: the blue is showing something. I'm just less sure what it is than I was before.
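For what it's worth, dither really does put a small amount of energy above 16 kHz, because it is broadband noise added at the 16-bit level. The sketch below compares a float 1 kHz tone (nothing at all up there) with the same tone rounded to 16 bits, with and without TPDF dither. The names `quantize_16bit` and `high_band_level_db`, the probe frequencies, and the fixed random seeds are assumptions for illustration, and none of this is a claim about how EAC maps levels to colors.

```python
import math
import random

FULL_SCALE = 32768  # 16-bit PCM steps per unit amplitude


def quantize_16bit(samples, dither=False, rng=None):
    """Round to 16-bit steps; optionally add TPDF dither (+/-1 LSB peak) first."""
    rng = rng or random.Random(0)
    out = []
    for s in samples:
        # Difference of two uniform values has a triangular distribution, in LSBs.
        noise = (rng.random() - rng.random()) if dither else 0.0
        out.append(round(s * FULL_SCALE + noise) / FULL_SCALE)
    return out


def high_band_level_db(samples, sample_rate,
                       probes_hz=(17000, 18000, 19000, 20000, 21000)):
    """Average level at a few probe frequencies above 16 kHz (single-bin DFTs)."""
    total = 0.0
    for freq in probes_hz:
        step = 2 * math.pi * freq / sample_rate
        re = sum(s * math.cos(step * i) for i, s in enumerate(samples))
        im = sum(s * math.sin(step * i) for i, s in enumerate(samples))
        total += (2 * math.hypot(re, im) / len(samples)) ** 2
    return 10 * math.log10(max(total / len(probes_hz), 1e-40))


RATE = 44100
# 100 ms of a 1 kHz tone: a whole number of cycles, so it does not leak into the probes.
clean = [0.5 * math.sin(2 * math.pi * 1000 * i / RATE) for i in range(4410)]
rounded = quantize_16bit(clean)
dithered = quantize_16bit(clean, dither=True, rng=random.Random(1))

for name, signal in (("float", clean), ("rounded", rounded), ("dithered", dithered)):
    print(name, round(high_band_level_db(signal, RATE), 1))
```

Both 16-bit versions show a tiny but non-zero level above 16 kHz, while the float version shows essentially none. One possible reading, and it is only a guess, is that a display with a very low threshold paints that noise floor blue, with dither spreading it evenly over the whole band.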

original wav:


32 khz resample:


32 khz resample with dither



Any ideas?

EAC's documentation says very little about their wav editor. The only info I can find about the color in the spectral display is in a documentation PDF on EAC's website:
Quote
Further the color of a point describe the amplitude of each frequency band.

Reply #19
Another data point to add to the puzzle of EAC's blue.
Often when I decode an mp3 of an "oldies" recording, where actual audio data doesn't go nearly to 16 kHz, the spectral analysis in EAC shows blue all the way up to 16 kHz, and (relatively) nothing above. Example from a Glenn Miller song (from a Lame 3.97 V0 track purchased from Amazon):



Reply #20
Quote
Some time ago I read that mp3s usually get a frequency cutoff (especially when they are transcoded, I'd say), and that the critical point is usually at 16 kHz.

In my experiments, the cutoff was at 20 kHz for mp3 CBR 320 kbit/s, and at 19.0-19.2 kHz for mp3 CBR 256 kbit/s.


Reply #22
I don't know if it's relevant, or whether it's an idiosyncrasy of EAC or a general trait of lossy/lowpassed audio, but in those plots much of the spectrum above the cutoff seems to be an attenuated 'mirror image' of the lower frequencies.

Reply #23
Quote
You do realize that your results are specific to a particular codec, its version, and settings used, correct?

Certainly I do. Certainly they ARE. As ANY setting used of ANY particular codec of a format, ANY version chosen etc., ARE SPECIFIC for an encoding done.

Reply #24
Quote
You do realize that your results are specific to a particular codec, its version, and settings used, correct?

yeah. and also to the source material.