Audio identification

2007-01-10 01:02:43

I recently heard on the radio about a technology being developed to prevent piracy on video distribution sites: it attempts to identify audio and match it against copyrighted works. I was wondering what effects differing mastering and lossy codecs could have on these systems. Also, do they work using a psychoacoustic model?

Reply #1 – 2007-01-10 02:48:43

I think they use Morse code in a very high pitched whistle to spell out the name of the work plus a serial number.

Reply #2 – 2007-01-10 03:14:33

http://en.wikipedia.org/wiki/Acoustic_fingerprint

Reply #3 – 2007-01-10 03:36:30

A friend summarized an article in [H]ardOCP about audio fingerprinting. The RIAA envisions a market where each CD has a unique fingerprint so that it can be traced to specific stores and purchase dates. All they need then is InterPol's research baby--an advanced video storage system that has built-in facial recognition.
The RIAA gives media pirates a bad rep'--they don't even have eye patches.

Reply #4 – 2007-01-10 17:29:07

Quote from: SamHain86 on 2007-01-10 03:36:30

A friend summarized an article in [H]ardOCP about audio fingerprinting. The RIAA envisions a market where each CD has a unique fingerprint so that it can be traced to specific stores and purchase dates. All they need then is InterPol's research baby--an advanced video storage system that has built-in facial recognition.

How? Steganography, with watermarking noise injected into the audio tracks? With a mere encode to a lossy format, those watermarks will disappear.

Reply #5 – 2007-01-12 21:05:55

Quote from: pepoluan on 2007-01-10 17:29:07

How? Steganography, with watermarking noise injected into the audio tracks? With a mere encode to a lossy format, those watermarks will disappear.

How about googling a bit first instead of assuming such things?
If you had just followed the wikipedia link before answering, you might
have learned that "Audio compression techniques (MP3, WMA, Vorbis, etc.)
are also generally based on perceptual characteristics, and a robust acoustic
fingerprint will allow a recording to be identified after it has gone through
such compression, even if the audio quality has been reduced somewhat.
Robust acoustic fingerprints are also immune to analog transmission
artifacts, enabling radio broadcasts to be identified."

Reply #6 – 2007-01-12 21:47:02

How about these requirements for digital cinema video and audio watermarking (from the DCI Digital Cinema System Specification v1):

Quote

9.4.6.1.2. Image/Picture Survivability Requirements

• Image Forensic Marking is required to be visually transparent to the critical
viewer in butterfly tests for motion image content.

• Is required to survive video processing attacks, such as digital-to-analog-digital
conversions (including multiple D-A/A-D conversions), re-sampling and
re-quantization (including dithering and recompression) and common signal
enhancements to image contrast and color.

• Is required to survive attacks, including resizing, letterboxing, aperture
control, low-pass filtering and anti-aliasing, brick wall filtering, digital video
noise reduction filtering, frame-swapping, compression, scaling, cropping,
overwriting, the addition of noise and other transformations.

• Is required to survive collusion, the combining of multiple videos in the
attempt to make a different fingerprint or to remove it.

• Is required to survive format conversion, the changing of frequencies and
spatial resolution among, for example, NTSC, PAL and SECAM, into another
and vice versa.

• Is required to survive horizontal and vertical shifting.

• Is required to survive arbitrary scaling (aspect ratio is not necessarily
constant).

• Is required to survive camcorder capture and low bit rate compression (e.g.
500 Kbps H264, 1.1 Mbps MPEG-1).

9.4.6.1.3. Audio Survivability Requirements

• Audio Forensic Mark is required to be inaudible in critical listening A/B tests.

• The embedded signal is required to survive multiple Digital/Analog and
Analog/Digital conversions.

• Is required to survive radio frequency or infrared transmissions within the
theater.

• Is required to survive any combination of captured channels.

• Is required to survive resampling and down conversion of channels.

• Is required to survive time compression/expansion with pitch shift and pitch
preserved.

• Is required to survive linear speed changes within 10% and pitch-invariant
time scaling within 4%.

• Is required to survive data reduction coding.

• Is required to survive nonlinear amplitude compression.

• Is required to survive additive or multiplicative noise.

• Is required to survive frequency response distortion such as equalization.

• Is required to survive addition of echo.

• Is required to survive band-pass filtering.

• Is required to survive flutter and wow.

• Is required to survive overdubbing.

Reply #7 – 2007-01-12 22:22:30

Yeah, that's a good idea. Treating (soon to be ex-)customers as criminals MUST be the solution. The money for developing this stuff would be better spent as price reductions for the consumer. But no, on top of that, we even slap DRM on online purchased files and charge nearly as much as for the CD (but we save the cost for printing, storing etc). Mm, this goes slightly off-topic...

However, as a techie, I would be interested in how exactly such watermarking would work. But to no avail, I guess - security through obscurity at its best.

Reply #8 – 2007-01-12 23:53:09

Quote from: ImAlive on 2007-01-12 22:22:30

The money for developing this stuff would be better spent as price reductions for the consumer. But no, on top of that, we even slap DRM on online purchased files and charge nearly as much as for the CD (but we save the cost for printing, storing etc).

Goes to show how record companies are creating the atmosphere for people to steal their product. If an artist is good: I will buy their CD. It is more likely for me to see them in concert. I will support an artist that way---not support the people that produce their label's products.

I find this research into anti-piracy techniques a waste. We humans are smarter than the technology we create. We are pattern finders, machines are number crunchers. Someone will break this system, and make the machine remove it. All new users have the ability to pop questions into Google to find out why they can't listen to their CDs, and find an answer within 20 minutes of searching.

Also, what about the research that suggests that piracy doesn't affect sales? Harvard for one has researched this for years. A simple Google search will give you results, I'll provide one: http://www.news.harvard.edu/gazette/2004/0...ilesharing.html.

Reply #9 – 2007-01-13 17:37:19

Quote from: SamHain86 on 2007-01-12 23:53:09

I find this research into anti-piracy techniques a waste. We humans are smarter than the technology we create. We are pattern finders, machines are number crunchers. Someone will break this system, and make the machine remove it.

That's a very naive and uninformed point of view about security technologies.
Not only can we build security systems which cannot be broken with the current
knowledge, but we can even build provably unbreakable algorithms, see for
instance http://en.wikipedia.org/wiki/One-time_pad

Reply #10 – 2007-01-14 03:28:38

Quote

How? Steganography, with watermarking noise injected into the audio tracks? With a mere encode to a lossy format, those watermarks will disappear.

Steganography with watermarking noise isn't as common with audio as it is with text data and still images. The reason is that the quantization noise is extremely difficult to control. It's been discussed before. The acoustic fingerprint is more common though.

Reply #11 – 2007-01-14 04:48:56

Quote from: mixminus1 on 2007-01-12 21:47:02

How about these requirements for digital cinema video and audio watermarking (from the DCI Digital Cinema System Specification v1):

Quote
9.4.6.1.2. Image/Picture Survivability Requirements...

Those are some pretty impressive requirements.

Edit: Grammatical error.

Reply #12 – 2007-01-14 05:28:12

Indeed very interesting.
The one-time pad is secure, but only if it's used carefully. For example, notes should be disposed of properly and the pad must be totally random. Mistakes have been made, and as a result it has been cracked. The exploits are described on the same wiki link.
I recall a printer manufacturer selling printers that watermark the documents printed.

Watermarking is being used to identify DVD screener and (yet to be released) CD audio leaks.

Reply #13 – 2007-01-14 05:36:27

Quote from: spath on 2007-01-13 17:37:19

Quote from: SamHain86 on 2007-01-12 23:53:09

I find this research into anti-piracy techniques a waste. We humans are smarter than the technology we create. We are pattern finders, machines are number crunchers. Someone will break this system, and make the machine remove it.
That's a very naive and uninformed point of view about security technologies.
Not only can we build security systems which cannot be broken with the current
knowledge, but we can even build provably unbreakable algorithms, see for
instance http://en.wikipedia.org/wiki/One-time_pad

NOTHING is ever safe in the digital (or real) world. You can make the safest and most secure system out there (compare that to a home security system). If hackers want in, they will get in (if a burglar wants in, they will get in). The digital age has helped secure some things, but everything in the digital era is based on number crunching and solving one complex algorithm after another. Since these codes are human made, they can also be destroyed by humans. Engineers (whether Civil, Electrical, Mechanical, Chemical, etc.) never make the assumption that something is sound. Just when you start to think that something will never be broken into or taken down, someone steps in and ruins everything.

Reply #14 – 2007-01-14 14:05:05

Quote from: TREX6662k6 on 2007-01-14 05:28:12

The one-time pad is secure, but only if it's used carefully. For example, notes should be disposed of properly and the pad must be totally random. Mistakes have been made, and as a result it has been cracked. The exploits are described on the same wiki link.

Flawed implementations of the OTP have been broken,
but the algorithm itself (and its correct implementations)
provide 100% provable security.
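
For readers who have not met the one-time pad, here is a minimal Python sketch of it. The message contents and function names are made up for illustration and do not come from any product discussed in this thread. It also shows the kind of implementation mistake mentioned above: if a key is ever reused, XORing the two ciphertexts cancels the key entirely.

```python
import secrets

def otp_encrypt(message: bytes) -> tuple[bytes, bytes]:
    """Return (key, ciphertext). The key is random and as long as the message."""
    key = secrets.token_bytes(len(message))
    ciphertext = bytes(m ^ k for m, k in zip(message, key))
    return key, ciphertext

def otp_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR with the same key undoes the encryption."""
    return bytes(c ^ k for c, k in zip(ciphertext, key))

message = b"track 07, store 1234"          # hypothetical payload
key, ciphertext = otp_encrypt(message)
recovered = otp_decrypt(key, ciphertext)

# The classic flaw: encrypting a second message with the SAME key.
other = b"track 08, store 5678"            # same length as `message`
reused = bytes(m ^ k for m, k in zip(other, key))

# XOR of the two ciphertexts equals XOR of the two plaintexts; the key is gone.
leak = bytes(a ^ b for a, b in zip(ciphertext, reused))
```

The proof of security only covers the case where the key is truly random, at least as long as the message, kept secret, and used once.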

Watermarking has not been studied long enough to have
reached the same results, but some watermarking schemes
are already provably robust against certain classes of attacks,
in the sense that you cannot remove the watermark without
degrading the content beyond acceptable quality. See for
instance http://www.rle.mit.edu/dspg/documents/ProvablyRobust.pdf

Reply #15 – 2007-01-15 12:09:44

Watermarking by definition manipulates noise of the data. Make it more robust, and noise increases. Make it store more data, and likewise, noise increases.

Mere fingerprinting... the post prior to my previous post says that the Evil Organization (*cough* RIAA *cough*) wants to track sales of audio down to the selling store. Unless you master a track differently for different stores... no way. Watermarking it is, then.

Remember that lossy encoders all operate with different assumptions; a watermark technology that survives MP3 might not survive Vorbis. A watermark that survives MP3 and Vorbis might not survive AAC. A watermark that survives all lossy encoding... might be too noisy.

I vaguely remember a watermarking technology for images... it survives cropping, editing, rotating... but apply a slight gaussian blur and the watermark disappears completely.

Hmm... maybe Vorbis' noise normalization can also be optimized to kill watermarks?

Reply #16 – 2007-01-15 12:26:18

Not being an expert, there are some points that come to mind:
1. In order to be acceptable to customers, water-marking should be invisible/inaudible or only slightly visible/audible.

2. An audio/video transmission system may be expected to do any transformation on the signal that is invisible/inaudible, and some that are slightly visible/audible.

I cannot see how 1. and 2. can be combined. Either watermarking must make serious perceptible changes to the content, or it must be non-robust to signal chains that are built around human perception (such as mp3)?

Watermarking of lossless image files, for instance, can exploit the redundancy in SNR, knowing that high-frequency "noise" won't be smoothed out. If the user does a simple scaling/compression though... The whole point of image compression is to encode only the bits that matter to the user, and then there is no room for encoding further information without affecting... bits that matter to the user.
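
To make the fragile case concrete, here is a small Python sketch of the simplest possible scheme, hiding payload bits in the least significant bit of each pixel. The pixel values and the `requantize` step are assumptions for illustration: snapping values to a multiple of 4 is only a crude stand-in for what a lossy coder does, not a model of any real codec.

```python
def embed_lsb(pixels, bits):
    """Overwrite the least significant bit of the leading pixels with payload bits."""
    marked = list(pixels)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & ~1) | bit
    return marked

def extract_lsb(pixels, count):
    """Read the payload back from the least significant bits."""
    return [p & 1 for p in pixels[:count]]

def requantize(pixels, step=4):
    """Crude stand-in for lossy coding: snap every value to a multiple of `step`."""
    return [(p // step) * step for p in pixels]

pixels = [52, 55, 61, 66, 70, 61, 64, 73]    # made-up 8-bit grey values
payload = [1, 0, 1, 1, 0, 0, 1, 0]

marked = embed_lsb(pixels, payload)
max_change = max(abs(a - b) for a, b in zip(pixels, marked))   # at most 1 level

survived_lossless = extract_lsb(marked, len(payload))              # payload intact
survived_lossy = extract_lsb(requantize(marked), len(payload))     # all zeros
```

The mark changes each pixel by at most one level, so it is invisible, but it lives exactly in the bits a lossy coder is designed to throw away. Robust schemes have to put the mark somewhere else.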

-k

Reply #17 – 2007-01-16 15:40:14

Quote from: pepoluan on 2007-01-15 12:09:44

Watermarking by definition manipulates noise of the data. Make it more robust, and noise increases. Make it store more data, and likewise, noise increases.

One could say: "watermark datarate * robustness = power of induced noise"
The thing is: You don't need to transport big data chunks. A simple 128 bit number (or less) is all that's needed to mark and identify audio streams. Put in a lot of redundancy & FECC for robustness and the noise level is still pretty low due to the extremely low datarate of the embedded "watermark stream" like 3 kbits/sec or something. Although I've no idea how these techniques actually work it sure is believable that it's possible to add such watermarks transparently and robustly so they survive a couple of transcodings.
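
As a rough illustration of the redundancy argument, the Python sketch below protects a 128-bit identifier with a plain repetition code and majority voting. The repeat count, the 10% bit-flip rate, and the seed are all assumed values, and a real system would more likely use a proper error-correcting code; this only shows that a tiny payload tolerates a very noisy channel once it is repeated enough.

```python
import random

ID_BITS = 128
REPEATS = 31     # assumed value; odd, so a majority vote can never tie

def encode(identifier: int) -> list[int]:
    """Repeat the identifier's bits; copy k of bit i sits at index k*ID_BITS + i."""
    bits = [(identifier >> i) & 1 for i in range(ID_BITS)]
    return bits * REPEATS

def decode(channel_bits: list[int]) -> int:
    """Rebuild each bit by majority vote over its REPEATS copies."""
    identifier = 0
    for i in range(ID_BITS):
        ones = sum(channel_bits[k * ID_BITS + i] for k in range(REPEATS))
        if ones * 2 > REPEATS:
            identifier |= 1 << i
    return identifier

rng = random.Random(2007)                  # fixed seed so the run is repeatable
identifier = rng.getrandbits(ID_BITS)
sent = encode(identifier)

# Simulated damage: every embedded bit is flipped with 10% probability.
received = [b ^ (rng.random() < 0.10) for b in sent]
errors = sum(s != r for s, r in zip(sent, received))

decoded = decode(received)
```

The cost of the redundancy is time rather than loudness: the 128 bits become 3968 embedded bits, which simply take longer to transmit at a fixed, low embedding rate.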

SG

Reply #18 – 2007-01-19 05:37:29

Quote from: SebastianG on 2007-01-16 15:40:14

One could say: "watermark datarate * robustness = power of induced noise"
The thing is: You don't need to transport big data chunks. A simple 128 bit number (or less) is all that's needed to mark and identify audio streams. Put in a lot of redundancy & FECC for robustness and the noise level is still pretty low due to the extremely low datarate of the embedded "watermark stream" like 3 kbits/sec or something. Although I've no idea how these techniques actually work it sure is believable that it's possible to add such watermarks transparently and robustly so they survive a couple of transcodings.

Well, yes you have a point. What I was trying to convey actually is that to create a watermark that survives MP3, Vorbis, WMA, AAC, WavPack Lossy, OptimFROG DualStream, MPC, Atrac, and <put in any unmentioned lossy codecs here> at the same time will undoubtedly drive the noise to unacceptable levels.

Reply #19 – 2007-01-19 09:11:17

Looking at the thread-starter, it seems to me that to successfully identify a musical work, it would be a lot simpler to:

A) Exploit the "humanly perceptually significant cues" that are embedded into each song, and correlate those to a database - like Sony Ericsson is doing in their mobile phones where you can record 5 seconds of a song played on radio and it will identify it, giving you the option to purchase it.

than

B) Embedding information into the music that is not significant to humans, while being robust to compression/transmission systems that try to remove all information that is insignificant to humans

The reason:
For finding copyrighted work on e.g. YouTube, one does not have to distinguish between individual purchases of that song; identifying the song is enough to demand that the video be removed. It seems to me that everyone (myself included) instantly jumped to the conclusion that steganography was needed for this activity...
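
Approach A can be sketched in a few lines of Python. This is a toy under loud assumptions, not how any commercial system works: the "songs" are synthetic noise with a random loudness contour, the fingerprint is just one bit per frame boundary (does the energy rise or fall?), and the probe clip is assumed to be time-aligned with the database entry. Real fingerprints are built from spectral features and handle arbitrary offsets.

```python
import random

FRAME = 256     # samples per frame (assumed)
FRAMES = 64     # frames per clip (assumed)

def make_clip(rng):
    """Synthetic 'song': noise whose loudness changes from frame to frame."""
    clip = []
    for _ in range(FRAMES):
        level = rng.uniform(0.1, 1.0)
        clip.extend(rng.gauss(0.0, level) for _ in range(FRAME))
    return clip

def fingerprint(clip):
    """One bit per frame boundary: 1 if the frame energy rises, 0 if it falls."""
    energy = [sum(s * s for s in clip[i:i + FRAME])
              for i in range(0, len(clip), FRAME)]
    return [int(b > a) for a, b in zip(energy, energy[1:])]

def hamming(a, b):
    """Number of positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

def identify(clip, database):
    """Return (name, distance) of the closest stored fingerprint."""
    probe = fingerprint(clip)
    return min(((name, hamming(probe, fp)) for name, fp in database.items()),
               key=lambda item: item[1])

rng = random.Random(1)
songs = {f"song{i}": make_clip(rng) for i in range(5)}
database = {name: fingerprint(clip) for name, clip in songs.items()}

# Degrade one song: halve the volume and add broadband noise.
degraded = [0.5 * s + rng.gauss(0.0, 0.05) for s in songs["song3"]]
name, distance = identify(degraded, database)
```

Nothing is embedded in the audio at all. The fingerprint is derived from features that must survive for the song to remain recognisable, which is why a gain change and added noise barely move it while unrelated songs disagree on roughly half the bits.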

:-)

Reply #20 – 2007-01-19 09:22:53

Quote from: pepoluan on 2007-01-19 05:37:29

What I was trying to convey actually is that to create a watermark that survives MP3, Vorbis, WMA, AAC, WavPack Lossy, OptimFROG DualStream, MPC, Atrac, and <put in any unmentioned lossy codecs here> at the same time will undoubtedly drive the noise to unacceptable levels.

Depends on who you ask. I find it surprising how there hasn't been much mention of psychoacoustic models for comparing two pieces of audio. Perhaps this method is not used at all for algorithmic generation of audio signatures? I'm almost starting to get confused with the terminology I'm using. Sorry about that.

Reply #21 – 2007-01-19 10:12:30

Watermarking is often proposed to be used as an 'alternative' for DRM, which sounds not too bad in the beginning...

... but the problem here is the 'what if'. What if I have a CDR made of bought, watermarked tracks lying around in my car and a burglar takes the stereo & the bunch of CDs with him - and puts the nicely reripped (quality loss, you moron! hehe) tracks on P2P. I will be traced and sued - for not paying attention to where I put my CDs? Same story for a nephew who visits your place and quickly swipes (bad boy! *spank*) a part of your watermarked collection on his USB stick, gives it to his friends, of whom one will put it on P2P...

In this scenario, it is likely that you will have to guard your music like some companies' secrets. Put 'em in a safe... however, since nothing like that has ever happened, it is as of yet unknown what the consequences could be.

Probably, the best thing to do from the industry side would be to treat customers as customers again and go without DRM AND watermarking.

So far, any news on how this ultra-robust watermarking is gonna work?

Reply #22 – 2007-01-19 15:52:29

Quote from: pepoluan on 2007-01-19 05:37:29

Well, yes you have a point. What I was trying to convey actually is that to create a watermark that survives MP3, Vorbis, WMA, AAC, WavPack Lossy, OptimFROG DualStream, MPC, Atrac, and <put in any unmentioned lossy codecs here> at the same time will undoubtedly drive the noise to unacceptable levels.

Why? The audio survives! You can still tell what the song is. (If you can't, the quality is probably low enough that it's not worth protecting!).

OK, so that's fingerprinting - but think of all the audio domains which perceptual encoding makes no attempt to exploit.

I'll give you an example: there's an algorithm out there which hides "echoes". (That's a simple way of putting it, but it'll do for now). The Human Auditory system is almost deaf to certain echo amplitudes and delays, but it's not due to conventional spectral or temporal masking (it's way above these thresholds), and no psychoacoustic codecs do anything (intentional) to these echoes.

That's a good watermark. Before you say "but a recording might contain a specific echo" - obviously known patterns are used and detected. Even if at some moment one comes up at random, that's just noise on the input to the detector - the system can be designed to cope with that.
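
A heavily simplified Python sketch of the echo idea follows. Everything specific in it is an assumption for illustration: the segment length, the two delays, the echo amplitude, and above all the host signal, which is white noise so that a plain autocorrelation is enough to spot the echo. Published echo-hiding schemes work on real music and typically detect the echo in the cepstrum instead.

```python
import random

SEGMENT = 4096            # samples carrying one bit (assumed)
DELAY = {0: 50, 1: 80}    # echo delay in samples for each bit value (assumed)
ALPHA = 0.3               # echo amplitude relative to the host (assumed)

def embed(host, bits):
    """Add an echo to each segment; the echo delay encodes the bit."""
    out = []
    for k, bit in enumerate(bits):
        seg = host[k * SEGMENT:(k + 1) * SEGMENT]
        d = DELAY[bit]
        out.extend(s + (ALPHA * seg[n - d] if n >= d else 0.0)
                   for n, s in enumerate(seg))
    return out

def autocorr(seg, lag):
    """Unnormalised autocorrelation of one segment at a single lag."""
    return sum(seg[n] * seg[n - lag] for n in range(lag, len(seg)))

def detect(signal, count):
    """Decide each bit by comparing correlation at the two known delays."""
    bits = []
    for k in range(count):
        seg = signal[k * SEGMENT:(k + 1) * SEGMENT]
        bits.append(int(autocorr(seg, DELAY[1]) > autocorr(seg, DELAY[0])))
    return bits

rng = random.Random(22)
payload = [1, 0, 1, 1, 0, 0, 1, 0]
host = [rng.gauss(0.0, 1.0) for _ in range(SEGMENT * len(payload))]

marked = embed(host, payload)
noisy = [s + rng.gauss(0.0, 0.5) for s in marked]   # stand-in for channel damage
recovered = detect(noisy, len(payload))
```

The detector only looks at the two known delays, which matches the point about known patterns: an accidental echo elsewhere in the recording is just more noise on the detector input.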

Whether all this is a good idea or not is a different debate, but the technology certainly works as far as embedding, psychoacoustic coding, and subsequent detection.

Cheers,
David.

Reply #23 – 2007-01-19 19:27:21

Heh, I don't disagree with you, David. I surely can still identify what song is being played when I dial into an AM radio station. But the noise is horrendous.

That said, mucking with noise for the majority of the song may be passable, but for some songs may cause interesting artifacts. I draw a (hopefully not too far-fetched) analogy: Most tracks encode transparently with Vorbis -q 3, but some tracks seem to have an affinity with noise normalization of -q 3, and require -q 4, even -q 5, to become transparent.

Reply #24 – 2007-02-02 01:01:53

This is a very interesting problem. It has more to do with the nature of perception. There would certainly be a use for such a class of algorithms.

The problem is nobody has yet determined a way to do this. Same thing for pattern/facial/image recognition. People can do this very quickly, as yet computers cannot. For example, often times when joining a discussion forum or site you'll be asked to identify some scrambled text. This is done to weed out spammers and bots.

Recognition is a very interesting thing. Speech and text recognition are still in their infancy, though they have improved somewhat. Same thing of facial recognition. The algorithms to do this are still very brittle, and have nothing of the flexibility and fluidity of human perception/recognition.

Ideally a 'fingerprinting' algorithm could recognize a tune as surely as a person can, no matter how it's encoded. We aren't quite there yet, so I would not worry about this too much as yet.

An interesting 'AI' problem.
