
A proposal/suggestion on transcode detection

Hey guys. As I've been going over my lossless collection, I've been pondering the various tools out there that can maintain it and ensure its integrity. In my case, using FLAC, that would include things like FLACTester/AudioTester, metaflac, etc. However, there seems to be, from what I've seen, a pretty gaping hole out there, which would be a tool for the detection of lossy-sourced files (AKA transcodes).

There are some tools out there for this purpose, but at the moment, the only ones I've seen (and I've looked, though maybe there is something out there that just isn't easy to come across) analyze CD audio. An example would be something like TauAnalyzer. So, my question would be, why is such a tool lacking for lossless audio in file form? Decoding on any modern machine isn't an especially intensive process, and it seems to me that an intelligent analyzer would only need to examine a few frames or seconds' worth of data and then return an assessment for you (as TauAnalyzer does, for example).

Hey, I know this will probably fall on deaf ears. For one, I'm not quite at the technical level to make a full-on honest "proposal" as I'd like to. But perhaps I could get a few cogs turning in a few heads, I don't know. It just seems like something that's rather essential to ensuring the integrity and entire point of lossless, and something that shouldn't be too difficult to code, either.

Ok, hopefully we get at least a few replies, eh?

Reply #1
For lossyWAV processed files there is, in lossyWAV 1.1.0, a summary of input and output data on a codec-block basis (--blockdist) and on a sample basis (--sampledist). This is displayed as a distribution of LSBs and MSBs for each. If the distribution of codec-block LSBs is not split between NUL (all zeros) and 0 (bit 0), then the file is likely not to be "natural". Equally, if the distribution of sample LSBs is not approximately 0=50%; 1=25%; 2=12.5%; 3=6.25%, etc. (taking into account zero samples in NUL), then the file is also likely not to be "natural".

lossyWAV processed files jump out by having, depending on processing settings, a wide distribution of codec-block lsb's and a corresponding distribution of sample lsb's.
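As a rough illustration of the sample-level check described above, here is a minimal Python sketch. It is not lossyWAV's code and does not reproduce the --sampledist output; the function names (`lowest_set_bit`, `lsb_distribution`, `max_deviation`) are made up for this example, samples are assumed to be plain signed integers that have already been decoded, and zero samples are assumed to be counted separately (the "NUL" bucket) and left out of the percentages.

```python
from collections import Counter


def lowest_set_bit(sample):
    """Index of the lowest set bit, or None for a zero sample (the NUL bucket)."""
    if sample == 0:
        return None
    # sample & -sample isolates the lowest set bit, also for negative values.
    return (sample & -sample).bit_length() - 1


def lsb_distribution(samples):
    """Return ({bit index: fraction of non-zero samples}, number of zero samples)."""
    counts = Counter(lowest_set_bit(s) for s in samples)
    zeros = counts.pop(None, 0)
    nonzero = sum(counts.values())
    if nonzero == 0:
        return {}, zeros
    return {bit: n / nonzero for bit, n in counts.items()}, zeros


def max_deviation(dist, bits=4):
    """Largest gap between observed fractions and the expected 50%, 25%, 12.5%, ..."""
    return max(abs(dist.get(b, 0.0) - 0.5 ** (b + 1)) for b in range(bits))


if __name__ == "__main__":
    # Every 16-bit value once: the distribution follows the halving pattern.
    natural, _ = lsb_distribution(range(-32768, 32768))
    # Samples with the three lowest bits removed: bits 0-2 never appear.
    stripped, _ = lsb_distribution(s * 8 for s in range(-1000, 1000))
    print(round(max_deviation(natural), 4), round(max_deviation(stripped), 4))
```

A large deviation only says the low bits look unusual; what threshold counts as "not natural" is a judgment call this sketch does not try to settle.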
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #2
Quote: "So, my question would be, why is such a tool lacking for lossless audio in file form?"

You could try Audiochecker, which is a graphical frontend for auCDtect. It's not the most elegant or aesthetically pleasing app in the world, but I've tested it on a variety of material, and it seems to do a pretty good job of spotting transcodes. In my admittedly limited testing it correctly identified every lossy>lossless transcode, using a variety of codecs including LAME at every setting up to 320 kbps CBR, and even Musepack --braindead was no problem. I haven't yet tried it with lossyWAV or WavPack lossy though.

It does sometimes flag false positives (i.e. it will occasionally identify a genuine lossless file as being "probably" MPEG-sourced), but I guess you'd use common sense and take such results in the context of, say, an entire album, and if only one file were suspicious it would likely be an error.
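The "judge it in the context of the album" idea could be sketched like this in Python. This is only an illustration: `album_verdict` is a hypothetical helper, the per-track flags are assumed to come from whatever checker you ran (nothing here calls Audiochecker or auCDtect), and treating two or more flagged tracks as "suspect" is an assumption of this sketch rather than something stated in the post.

```python
def album_verdict(flags):
    """Summarize per-track results for one album.

    flags maps a track name to True if the checker called it lossy-sourced.
    Returns (verdict, sorted list of flagged track names).
    """
    flagged = sorted(name for name, hit in flags.items() if hit)
    if not flagged:
        return "clean", flagged
    # A single flagged track on a multi-track album is treated as a likely error.
    if len(flagged) == 1 and len(flags) > 1:
        return "probable false positive", flagged
    return "suspect", flagged


if __name__ == "__main__":
    results = {"01": False, "02": True, "03": False}
    print(album_verdict(results))
```

The cutoff between "probable false positive" and "suspect" is arbitrary; in practice you would still look at the flagged tracks yourself.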

Quote: "It just seems to be like something that's rather essential to ensuring the integrity and entire point of lossless, and something that shouldn't be too difficult to code, either."

I suppose if you own the original source material there'd be no real need to worry about "integrity," if by that you mean the possibility of lossy>lossless transcodes... not that I'm suggesting the primary use of such a tool would be for illegal filesharing of course. 

Reply #3
Quote: "... if you own the original source material ..."

You may need something like auCDtect to tell that: a CD you buy somewhere may be a pirate copy pressed from a lossy source. Have a look at the True Audio homepage (they report about 25% fake audio CDs in a collection bought from different Russian stores). This seems to be the application they designed these tools for.

Reply #4
Ok, thanks for the replies. I'll look over this Audiochecker tool.

(By the way, though, not everything you don't own a physical copy of is under copyright. You can purchase lossless files online, and there are also live bootlegs, albums released for free with sharing rights, etc. Personally, I don't fully trust anything I didn't rip myself. Besides, an application for detecting transcodes is neutral as to the source!)

Ack, editing this post. I thought the program looked familiar - I've used it before. It's next to useless because of its incredible slowness. It even takes a few minutes, sometimes, simply to cancel its analysis of a file! Unbelievable. I currently have over 7,000 FLACs, and it would probably take over a week to check them all (and I'm running a fairly modern machine).

So, it seems I'm back to my original assertion; there's a gaping hole here for a tool that can detect transcodes for lossless files. Unless you're running a supercomputer, it seems.