
True FLAC vs. Fake FLAC

Reply #51
3 things:

1. hi I am new

2. I recently stumbled across a paper which details an algorithm that, with a very high success rate, guesses the bit rate of an audio file using only data from the file's high-frequency spectrum. If developed further, it could remove the need to visually inspect a spectrogram and would be much faster.

http://www.fileden.com/files/2009/2/14/232...20Frequency.pdf


Quote
In order to obtain the feature data, the source MP3 files were each
decompressed into a 1411 kbps WAV file using the Fraunhofer
IIS MP3 Surround Commandline Decoder V1.4 [2]. This was
done because audio files in this format can easily be read into
MATLAB, and as we have demonstrated, transcoding to a higher
bit rate does not affect the frequency characteristics of the audio
which we are observing.


It is, in my opinion, not a good sign when the author of a paper does not understand that WAV is a lossless format and so resorts to arguing that "transcoding" to PCM probably doesn't change the audio.  Regardless, all that paper demonstrates is that if you know that LAME 3.97 was used with default lowpass for each bitrate, you can figure out the source bitrate by looking at the lowpass setting. 
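
To make the "looking at the lowpass setting" point concrete, here is a minimal sketch of how a cutoff frequency could be read off decoded PCM. It is not the paper's method; the function names, the frame size, and the -60 dB threshold are all assumptions chosen for illustration, and real music would likely need a different threshold.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def estimate_cutoff_hz(samples, sample_rate, frame=1024, floor_db=-60.0):
    """Highest frequency whose accumulated power is within floor_db of the
    strongest bin. Both defaults are illustrative assumptions."""
    bins = frame // 2
    power = [0.0] * bins
    # Hann window to limit spectral leakage between bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame) for i in range(frame)]
    for start in range(0, len(samples) - frame + 1, frame):
        spec = fft([samples[start + i] * window[i] for i in range(frame)])
        for k in range(bins):
            power[k] += abs(spec[k]) ** 2
    peak = max(power)
    if peak == 0.0:
        return 0.0  # silence: no cutoff to report
    threshold = peak * 10 ** (floor_db / 10)
    # Scan down from the top of the spectrum to the first bin above threshold.
    for k in range(bins - 1, -1, -1):
        if power[k] >= threshold:
            return k * sample_rate / frame
    return 0.0


def synth_tones(freqs_hz, sample_rate, n):
    """Synthetic test signal: a sum of equal-level sine tones."""
    return [sum(math.sin(2 * math.pi * f * i / sample_rate) for f in freqs_hz)
            for i in range(n)]


if __name__ == "__main__":
    # Nothing above 16 kHz, as if a 16 kHz lowpass had been applied.
    signal = synth_tones([1000, 5000, 10000, 15000, 16000], 44100, 8192)
    print(estimate_cutoff_hz(signal, 44100))
```

On the synthetic signal the estimate should land close to 16 kHz, slightly above it because of window leakage. Note that this only measures where the spectrum stops; it says nothing about why it stops there.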


True FLAC vs. Fake FLAC

Reply #52
It is, in my opinion, not a good sign when the author of a paper does not understand that WAV is a lossless format and so resorts to arguing that "transcoding" to PCM probably doesn't change the audio.  Regardless, all that paper demonstrates is that if you know that LAME 3.97 was used with default lowpass for each bitrate, you can figure out the source bitrate by looking at the lowpass setting.

So does the paper actually not do what it claims it does? I don't see any reliance on prior knowledge concerning the "history" of the file in question.

True FLAC vs. Fake FLAC

Reply #53
So does the paper actually not do what it claims it does?


It does what they claim: take a known encoder and version, then determine what bitrate was used.  It doesn't do what people in this thread are interested in, though.

I don't see any reliance on prior knowledge concerning the "history" of the file in question.


Suggest reading section 2, "procedure".  They train their model using the same encoder and settings they will then attempt to detect.  Without this the system is useless.

True FLAC vs. Fake FLAC

Reply #54
Quote
It doesn't do what people in this thread are interested in though.


Once you have an algorithm which can estimate original bit rates of a transcoded lossless format, getting a yes/no answer to the question "are my flacs 'real'?" seems trivial.


Quote
Suggest reading section 2, "procedure". They train their model using the same encoder and settings they will then attempt to detect. Without this the system is useless.

Correct.

I still don't see any reliance on prior knowledge concerning the "history" of the file in question. Of course any algorithm of this nature needs information about what it is looking for!

True FLAC vs. Fake FLAC

Reply #55
Quote
It doesn't do what people in this thread are interested in though.


Once you have an algorithm which can estimate original bit rates of a transcoded lossless format, getting a yes/no answer to the question "are my flacs 'real'?" seems trivial.


If you know that the file was encoded with a given lame version, then you already know the answer to the question  "are my flacs that I've created from my LAME mp3s 'real'" is "No". 

That said, you are correct that determining if the output of a given LAME version will be lossy is quite trivial.

Quote
Suggest reading section 2, "procedure". They train their model using the same encoder and settings they will then attempt to detect. Without this the system is useless.

Correct.

I still don't see any reliance on prior knowledge concerning the "history" of the file in question. Of course any algorithm of this nature needs information about what it is looking for!


As I said above, the prior knowledge is the encoder and settings (aside from bitrate) used to create the file in question.

True FLAC vs. Fake FLAC

Reply #56
Quote
If you know that the file was encoded with a given lame version, then you already know the answer to the question "are my flacs that I've created from my LAME mp3s 'real'" is "No".


Yes but surely the OP was referring to a situation where the file history is unknown. (Suggest reading OP.) Even in this case the algorithm should be able to take arbitrary WAVs and, if they are indeed transcodes, guess their original bitrate with a great deal of accuracy.


Quote
As I said above, the prior knowledge is the encoder and settings (aside from bitrate) used to create the file in question.

No. The prior knowledge is the information about the frequency characteristics of lame-encoded mp3s. What I am saying is that once you have trained the algorithm, you can then take arbitrary WAVs (or flacs, or whatever) and use the algorithm on them. This is pretty standard. Train an algorithm with a set of inputs, then give it arbitrary inputs and see how it does. 100% accuracy cannot be expected.

True FLAC vs. Fake FLAC

Reply #57
Yes but surely the OP was referring to a situation where the file history is unknown.


In this case you cannot use this software.  The authors have demonstrated identification of files that are known a priori to be transcoded by incorporating that knowledge into their algorithm.  Hence my point above that it's not useful for what people in this thread want to do.

(Suggest reading OP.)


No need to get angry at me.  I'm not attacking you, I'm just trying to lead you towards an understanding of why what you are proposing does not work.

Even in this case the algorithm should be able to take arbitrary WAVs


It should?  The authors certainly haven't demonstrated that.  In fact they are quite clear that they have not chosen arbitrary wav files. 

Quote
As I said above, the prior knowledge is the encoder and settings (aside from bitrate) used to create the file in question.

No. The prior knowledge is the information about the frequency characteristics of lame-encoded mp3s.


I'm not being condescending, but read more carefully: there is a LOT more prior information being used here.  The procedure explains that the training set and the unknown set were encoded with identical settings and encoder.  This is not by chance. The authors have not accidentally made their problem extremely easy compared to the one you want to solve.

What I am saying is that once you have trained the algorithm, you can then take arbitrary WAVs (or flacs, or whatever) and use the algorithm on them.


Ignoring for a moment what is actually going on, if this actually worked, why do you think the authors decided not to show that this was possible?  Perhaps they were concerned about making their paper too exciting.
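
The dependence on the training set argued in this reply can be sketched as a toy lookup. This is not the paper's classifier; the table name, the function, and every cutoff value below are hypothetical placeholders (they are not measured LAME defaults). The point is only that a model trained on one encoder configuration has no "never lossy" answer available.

```python
# Hypothetical lowpass cutoffs (Hz) per bitrate (kbps) for ONE encoder
# configuration. Illustrative placeholders only, not real LAME defaults.
TRAINED_CUTOFFS = {128: 16000.0, 192: 18500.0, 256: 19500.0, 320: 20000.0}


def guess_bitrate(cutoff_hz, table=TRAINED_CUTOFFS):
    """Nearest-neighbour lookup: return the bitrate whose trained cutoff
    is closest to the measured cutoff."""
    return min(table, key=lambda kbps: abs(table[kbps] - cutoff_hz))


if __name__ == "__main__":
    # Case 1: file really came from the trained configuration at 128 kbps.
    print(guess_bitrate(16000.0))   # matches the training data

    # Case 2: a different (hypothetical) encoder that lowpasses at 16 kHz
    # even at 192 kbps. The lookup still answers 128.
    print(guess_bitrate(16000.0))

    # Case 3: a genuine lossless master that an engineer lowpassed at
    # about 16 kHz. The lookup still names a bitrate, because "not a
    # transcode" is simply not among its possible outputs.
    print(guess_bitrate(16100.0))
```

All three cases return 128, and only the first is right. Telling them apart needs information beyond the cutoff frequency, which is exactly what an unknown file history does not provide.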

True FLAC vs. Fake FLAC

Reply #58
I've seen examples of both false positives and false negatives.

I guess when a full CD is identified as CDDA it's safe to consider it an original and not an MP3 reconstruct. This might be the reason why Tau Analyser only works for a full CD and not an individual file.

OK, I'll go a bit off topic for the last time (I hope). I have received the answer from Qobuz. They have checked the file and think it has been through some MPA compression. They will ask the producer for a true original and have offered me a free album to compensate. They were pretty quick to react, too. Credit to them.

True FLAC vs. Fake FLAC

Reply #59
I have received the answer from Qobuz. They have checked the file and think it has been through some MPA compression. They will ask the producer for a true original and have offered me a free album to compensate. They were pretty quick to react, too. Credit to them.

Many people, even many musicians, simply can't hear the difference between a lossy version and the original, which shouldn't be surprising, given the robustness of the lossy formats and all the listening tests we're familiar with. Some/many are also just not very technologically savvy. It really would not surprise me to find out, then, that artists or their representatives wouldn't necessarily even know that MP3/AAC/whatever is lossy at all, or recognize that once something is lossily encoded, there's no going back, even if they have a converter that turns their MP3s back into the WAVs or AIFFs needed by their labels and distributors. That's one of the reasons transcodes happen in general; people think "if higher bitrates mean higher quality, then I'll just convert this 128 kbps MP3 to a 320 kbps one! or maybe I'll just convert it to WAV and it'll be perfect!"

True FLAC vs. Fake FLAC

Reply #60
Quote
Some/many are also just not very technologically savvy.

Unfortunately, this is very true.
I have contacted several artists that I follow on Soundcloud about this.
Knowledge of lossy/lossless encoding is not a prerequisite to creating music, and is sometimes not learned.

True FLAC vs. Fake FLAC

Reply #61
And then, there are people who are not even musicians or technicians involved at some point.
Hopefully, it will change now that companies selling FLAC are starting to appear across the web. Unfortunately, there are still too few people who are aware enough of the issue to be careful about what they buy.
One year ago I was still ripping to MP3 and burning those back to CDs when I wanted to copy a disc!

It's up to us to spread the word now.

True FLAC vs. Fake FLAC

Reply #62
I have noticed that mp3s commonly add a large error to the input signal - when the error is estimated visually by looking at waveforms, not if you are listening to the decoded file (what the format is made for really).

In my simplified understanding, this can be interpreted as signal-dependent narrow-band noise insertion (psy-model guided quantization of subbands), and perhaps phase error? Are there no known mechanisms to estimate whether such an error was inserted at some stage? I believe that natural music commonly contains spectrally sparse content (pure harmonic waveforms) or temporally coherent impulses (at least when rising in level). Can one not search for such things in a file, and find traces commonly attributed to mp3 encoding, with little chance of their being generated by other means?

-k

True FLAC vs. Fake FLAC

Reply #63
On exceptionally pure tone-like signals you can see the shape of the codec's noise skirting it.

I can't think of many recordings that contain spectrally sparse content. I went looking for some once when trying to assess the audibility of distortion. Even a solo flute or violin or piano is too rich to spot masking noise at high bitrates, and with most pop, rock, jazz etc you can forget it completely.

Pure impulses are a great test signal for ID-ing codecs, but only synthetic signals are known to be clean. With anything else, the pre-echo is usually lost in the other instruments. You could find it in isolation in some recordings, and maybe make a judgement that nothing else had caused it, but it doesn't sound like something you could automate.

If the recording you want to check contains no ultra-pure tone and no isolated impulse-like sounds, then this task is impossible IMO. You can't see the coding noise in the coded version when looking in the waveform view or the spectral view. Apart from the low pass, some of the common lossy distortions can be easier to hear than see.

Cheers,
David.
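
The impulse test described in this reply can be sketched on synthetic data. This is a minimal illustration, not a detector: the function name, the guard and span sizes, and the constant "smear" standing in for codec pre-echo are all assumptions, and no real codec is involved. As the reply says, it is unlikely to carry over to ordinary music, where other instruments mask the region before the transient.

```python
import math


def pre_echo_db(samples, guard=8, span=256):
    """Energy in the `span` samples just before the largest peak (skipping
    `guard` samples next to it), relative to the energy in the `span`
    samples starting at the peak, in dB. Returns -inf if the region
    before the peak is digitally silent."""
    peak = max(range(len(samples)), key=lambda i: abs(samples[i]))
    before = samples[max(0, peak - guard - span):max(0, peak - guard)]
    after = samples[peak:peak + span]
    e_before = sum(s * s for s in before)
    e_after = sum(s * s for s in after)
    if e_after == 0.0:
        raise ValueError("signal is all zeros")
    if e_before == 0.0:
        return float("-inf")
    return 10 * math.log10(e_before / e_after)


if __name__ == "__main__":
    # A clean synthetic impulse: silence, one full-scale sample, silence.
    clean = [0.0] * 1000 + [1.0] + [0.0] * 500

    # The same impulse with low-level energy smeared in front of it, a
    # crude stand-in for the pre-echo a transform codec can introduce.
    smeared = [0.0] * 700 + [0.01] * 300 + [1.0] + [0.0] * 500

    print(pre_echo_db(clean))
    print(pre_echo_db(smeared))
```

The clean impulse has no energy ahead of the peak, while the smeared one does. The comparison is only meaningful because the clean version is known to be clean, which is why it works for synthetic signals and not for arbitrary recordings.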

True FLAC vs. Fake FLAC

Reply #64
There used to be a freeware DOS command line/console program that could examine a WAV and tell if it came from an MP3.

It's called AuCDtect and does a spectrum analysis looking for patterns introduced by lossy compression. It works very well; you'll hardly ever get a false negative. There's a Windows frontend called Tau Analyzer and even a foobar plugin, all available here: http://en.true-audio.com


OK thanks very much for that.  I haven't been able to find that program for years!

True FLAC vs. Fake FLAC

Reply #65
By false positive I mean that the file is fine, but detected as lossy.

Tau Analyser uses the same engine as auCDtect, so you can decode to wav and use auCDtect directly or automate it with fooCDtect or another frontend.


Where can I download fooCDtect?


True FLAC vs. Fake FLAC

Reply #67
I know the topic title sounds absolutely absurd but hear me out. I've tested my FLAC collection by encoding them into V0 MP3. I then decoded the MP3 back to WAV and then compressed it in FLAC. My question is: unless you've ripped the files yourself, how would you know the FLAC file you have is ACTUALLY lossless instead of an MP3 converted into FLAC? I've used the TEST option in FLAC frontend and it doesn't give a result. I have used Audiotester and it does say the file failed because it's TRUNCATED.

Bottom-line: Is there a sure-fire way of knowing that a FLAC file is truly lossless and not a derivative of a lossy file?


If the flacs you have correspond to a complete album, you can test it with CUETools. If you have the flacs and a ".cue" file, CUETools will tell you whether or not the whole rip corresponds to a rip in the CTDB or AccurateRip databases. If it does, then I think you can be sure it's all genuine. If you don't have the ".cue" file, sometimes CUETools will still be able to check, and sometimes it won't.

True FLAC vs. Fake FLAC

Reply #68
I have two J.S. Bach CDs (Das Wohltemperierte Klavier I and II, Leonhardt, Harmonia Mundi/BMG Classics) which I believe to be legitimate, and which sound perfect.

Yet Audiochecker finds them to be 99% MPEG. It may have something to do with being a single instrument (harpsichord), so they may have used a lowpass filter to reduce noise in the higher frequencies where a cembalo is not supposed to be.

So we can't trust checking software completely.

True FLAC vs. Fake FLAC

Reply #69
I have two J.S. Bach CDs (Das Wohltemperierte Klavier I and II, Leonhardt, Harmonia Mundi/BMG Classics) which I believe to be legitimate, and which sound perfect.

Yet Audiochecker finds them to be 99% MPEG. It may have something to do with being a single instrument (harpsichord), so they may have used a lowpass filter to reduce noise in the higher frequencies where a cembalo is not supposed to be.

So we can't trust checking software completely.

Nobody around here has ever said to trust this kind of software completely in the first place.
Moreover, the harpsichord is generally one of the most challenging instruments for lossy codecs and very revealing for humans in ABX tests, but those are analog recordings from the late sixties or early seventies, so I think they can hardly be considered a valid reference.

True FLAC vs. Fake FLAC

Reply #70
Can anybody tell if these are both fake:


but "aucdtect" reports mpeg 95% on the 2nd one too

I used "spectro" to make the screenshots


 

True FLAC vs. Fake FLAC

Reply #72
And as folks pointed out earlier, a lowpass does not equal lossy encoding. I tend to filter out any audio I do not find valuable to the work I'm creating; in that case, the master could actually be flagged as being "lossily encoded".

True FLAC vs. Fake FLAC

Reply #73
One other reason I like engineers who use the PM2 A-D: you can't have an MP3 HDCD file.

But there is a program that was good at detecting lossy gens, called Audio Checker v1.2 by dester.