
Topic: Synchrotron

  • rpp3po
  • Developer
Synchrotron
Synchrotron 1.2

Both audio encoders and decoders often add over a thousand samples of delay at the beginning of a file. This prevents both gapless playback and proper, sample-synchronized ABX testing.

Several different schemes for storing gapless metadata have evolved over time for different lossy encoders. In practice this can work pretty well if you have full control over both encoding and playback.

If you want to compare samples from different encoders, however, experience has shown that you cannot be sure the delay has really been removed from every file. Some decoders add their own delay and some don't; some remove the encoder delay (by reading the metadata) and some don't. For example, converting QuickTime-encoded AAC files through the QuickTime framework adds no decoding delay and removes the encoder delay, whereas converting the same file to WAV with VLC leaves 1088 samples of overall delay. I also got different overall delays from LAME-encoded files at 48 kb/s and 128 kb/s.

Synchrotron can remove the delay introduced by any lossy codec without relying on metadata. It uses a mathematical process called cross-correlation to synchronize two files with sample accuracy and then cuts off any leading delay from the second file (or just displays it).
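A minimal sketch of the idea, assuming both files have already been decoded to mono integer sample arrays: try every lag from 0 up to a maximum, compute the normalized correlation between the primary signal and the shifted secondary signal, and keep the lag with the highest value. The names (LagSearch, bestLag, Result) and the normalization are illustrative assumptions, not Synchrotron's actual Correlator code.

```java
/** Hypothetical brute-force delay search; not Synchrotron's actual implementation. */
final class LagSearch {

    /** Best lag found and its normalized correlation (1.0 = identical shapes). */
    static final class Result {
        final int lag;
        final double correlation;

        Result(int lag, double correlation) {
            this.lag = lag;
            this.correlation = correlation;
        }
    }

    /**
     * Finds the lag d in [0, maxLag] that maximizes the normalized correlation
     * between primary[i] and secondary[i + d] over the first `window` samples.
     * A positive lag means the secondary signal starts `lag` samples late.
     * Cost is roughly window * (maxLag + 1) multiply-adds.
     */
    static Result bestLag(int[] primary, int[] secondary, int window, int maxLag) {
        int n = Math.min(window, primary.length);
        int bestLag = 0;
        double best = Double.NEGATIVE_INFINITY;
        for (int d = 0; d <= maxLag; d++) {
            int len = Math.min(n, secondary.length - d);
            if (len <= 0) break; // secondary too short for this shift
            double dot = 0, energyP = 0, energyS = 0;
            for (int i = 0; i < len; i++) {
                double p = primary[i];
                double s = secondary[i + d];
                dot += p * s;
                energyP += p * p;
                energyS += s * s;
            }
            double denom = Math.sqrt(energyP * energyS);
            double c = denom == 0 ? 0 : dot / denom;
            if (c > best) {
                best = c;
                bestLag = d;
            }
        }
        return new Result(bestLag, best);
    }
}
```

For example, if the secondary array is the primary with 37 zeros prepended, bestLag(primary, secondary, 1000, 100) reports a lag of 37 with a correlation of (nearly) 1.0.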

It serves the following purposes:
  • Prepare files prior to ABX testing.
  • Verify the accuracy of your encoder's, decoder's, or disk writer's metadata-based delay handling.
  • Display the cross-correlation of any two files.
  • Provide well-structured, easy-to-read sample code, so that other developers can implement the same mechanism in their programs (e.g. foobar2000's ABX component).

This is the cross-platform Java binary:

Synchrotron-1.2.zip

There is only a command-line interface; there is no GUI yet. It is easy to integrate into scripts or other applications. I'm not planning to write a GUI myself, but feel free to try it if you are interested.

Code:
Usage: java -jar Synchrotron.jar [--cut] primary_wav secondary_wav

  primary_wav: original PCM WAVE file (up to 24 bit integer supported)
secondary_wav: PCM WAVE file with similar content and possible delay
        --cut: remove delay from the beginning of secondary_wav
   --fullscan: scan entire file (very slow)


Sample output:
Code:
java -jar Synchrotron.jar --cut tmp1.wav tmp2.wav 

PCM_SIGNED 44100.0 Hz, 16 bit, stereo, 4 bytes/frame, little-endian
Skipped 24598 leading sample(s) to improve accuracy.
Delay: 1088 - Cross Correlation: 0.99712723
Delay removal successful!


Make sure that you have a current Java Runtime Environment installed, either from your distribution's package repository (Linux) or from here (Windows). Mac OS X has it pre-installed.

Put the jar file in the same directory as your WAV files or, better, in a directory listed in your PATH environment variable, for example /usr/local/bin or c:\windows, ...

Testing and comments welcome!

For developers:

The code is plain object-oriented Java. All cross-correlation logic is encapsulated in the Correlator class. Java-specific code, for example anything related to reading audio files, is located in the AudioFileCorrelator class. You are probably only interested in the former.

Testing has shown that it is entirely sufficient to cross-correlate about 40000 samples instead of the whole file. AudioFormatCorrelator forwards the audio streams to a significant position, so that cross-correlation isn't applied only to leading silence or noise.
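The exact "significant position" criterion isn't spelled out here. One simple possibility, shown purely as an assumption, is to skip to the first sample whose magnitude reaches a threshold and take a fixed-size window (e.g. 40000 samples) from there. The helper names and the threshold rule are hypothetical.

```java
/** Hypothetical helpers for choosing where to start correlating. */
final class LeadingSkip {

    /**
     * Index of the first sample whose magnitude reaches `threshold`,
     * or 0 if no sample does (then the file is correlated from the start).
     * The cast to long avoids Math.abs(Integer.MIN_VALUE) staying negative.
     */
    static int firstSignificant(int[] samples, int threshold) {
        for (int i = 0; i < samples.length; i++) {
            if (Math.abs((long) samples[i]) >= threshold) return i;
        }
        return 0;
    }

    /** Copies up to `length` samples starting at `start` (fewer near the end of the file). */
    static int[] window(int[] samples, int start, int length) {
        int from = Math.min(Math.max(0, start), samples.length);
        int to = Math.min(samples.length, from + length);
        return java.util.Arrays.copyOfRange(samples, from, to);
    }
}
```

Since the secondary file only lags behind the primary, the same start offset can be used for both files; taking the secondary window longer by the maximum lag keeps every shifted comparison inside real data.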

This is the source including Netbeans project files:

Synchrotron-1.2-Source.zip
  • Last Edit: 24 June, 2009, 09:12:42 PM by rpp3po

  • Axon
  • Members (Donating)
Reply #1
OK rpp3po, this is good sh*t, but this reminds me of a feature I desperately want for vinyl craziness and I wanted to bounce it off of you. But you are most likely going to shoot me for asking for it.

How hard would it be to dynamically cross-correlate the signal? That is, do the cross-correlation at the start of the file, do the time shift internally, and then every T seconds, cut out a 2T-sized chunk of the file, window it with a Gaussian pulse, and then redo the cross-correlation.

Canar, if this is too oddball of a request could you split this off to a separate thread?

Also, can this do subsample delays? Is that even a concern with lossy encoders?
  • Last Edit: 06 June, 2009, 12:45:47 PM by Axon

  • rpp3po
  • Developer
Reply #2
I'm not sure I have fully understood what you want, but basically I don't see anything that would prevent you from doing this. The Correlator class already encapsulates the time shifting internally. You could just create a new Correlator object every T seconds, feed it two 2T-sized integer (sample) arrays, and ask it for the result (getCrossCorrelation()). The class is even thread-safe. There would probably still be about a second of lag until the result is available, depending on the number of samples within 2T; cross-correlation is quite heavy on processor and memory bandwidth. But in any case you should get a result every T seconds on average.
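A rough sketch of the scheme Axon describes, assuming mono integer samples and a hop of T samples: every T samples, take a 2T-long chunk from both signals, weight it with a Gaussian window, and search for the local lag in both directions (drift can go either way). This does not use Synchrotron's Correlator class; gaussian, bestLag, track, the sigma of a quarter of the chunk length, and the unnormalized correlation are all illustrative assumptions.

```java
/** Hypothetical drift tracker: windowed re-correlation every `hop` samples. */
final class DriftTracker {

    /** Gaussian window of length n; sigma is sigmaFraction * n (assumed value). */
    static double[] gaussian(int n, double sigmaFraction) {
        double[] w = new double[n];
        double center = (n - 1) / 2.0;
        double sigma = sigmaFraction * n;
        for (int i = 0; i < n; i++) {
            double x = (i - center) / sigma;
            w[i] = Math.exp(-0.5 * x * x);
        }
        return w;
    }

    /** Lag in [-maxLag, maxLag] maximizing sum_i a[i] * b[i + lag] (unnormalized). */
    static int bestLag(double[] a, double[] b, int maxLag) {
        int best = 0;
        double bestVal = Double.NEGATIVE_INFINITY;
        for (int lag = -maxLag; lag <= maxLag; lag++) {
            double sum = 0;
            for (int i = 0; i < a.length; i++) {
                int j = i + lag;
                if (j >= 0 && j < b.length) sum += a[i] * b[j];
            }
            if (sum > bestVal) {
                bestVal = sum;
                best = lag;
            }
        }
        return best;
    }

    /**
     * Every `hop` samples (T), windows a 2T-long chunk of both signals with the
     * same Gaussian and re-estimates the local lag. Returns one lag per chunk.
     */
    static int[] track(int[] primary, int[] secondary, int hop, int maxLag) {
        int len = 2 * hop;
        double[] w = gaussian(len, 0.25);
        int usable = Math.min(primary.length, secondary.length);
        int chunks = usable < len ? 0 : (usable - len) / hop + 1;
        int[] lags = new int[chunks];
        for (int c = 0; c < chunks; c++) {
            int start = c * hop;
            double[] a = new double[len];
            double[] b = new double[len];
            for (int i = 0; i < len; i++) {
                a[i] = primary[start + i] * w[i];
                b[i] = secondary[start + i] * w[i];
            }
            lags[c] = bestLag(a, b, maxLag);
        }
        return lags;
    }
}
```

Each chunk costs about (2 * maxLag + 1) * 2T multiply-adds, which is where the processing lag mentioned above comes from.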

> Also, can this do subsample delays? Is that even a concern with lossy encoders?


No, Synchrotron does not oversample and works at exactly the same precision (sample rate) as the input. For its intended main purpose (WAV file correction), subsample precision would not make a difference, since you can only apply corrections in integer steps. I would guess that it doesn't make a difference for lossy encoding either: the decoded signal is converted to a series of PCM samples, and you can't apply a delay correction finer than +/- one sample, even if a value at subsample precision were available.
  • Last Edit: 08 June, 2009, 08:51:15 PM by rpp3po

  • C.R.Helmrich
  • Developer
Reply #3
Nice work, rpp3po!

>> Also, can this do subsample delays? Is that even a concern with lossy encoders?
>
> No, Synchrotron does not oversample and works at exactly the same precision (sample rate) as the input. For its intended main purpose (WAV file correction) subsample precision would not make a difference, since you can only apply correction in integer steps.

Correct. Plus, at normal sampling rates of 32 kHz or more, sub-sample delays are inaudible. Actually, delays of one or two samples are probably also inaudible, but for blind listening tests, it is always better to restrict inter-stimulus delay to the microsecond range.

This, however, does not mean that lossy encoders do not create sub-sample delays. They in fact do at low bit rates because they downsample before encoding (from 44.1 to 32 kHz, for example). If you then upsample after decoding (or your sound card does so) to obtain the same sampling rate as the original file, this is likely to lead to a non-integer sample delay due to the anti-aliasing filter.
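Synchrotron itself works at integer-sample precision. If you do want an estimate of such a non-integer delay, one common technique (not something Synchrotron implements) is to fit a parabola through the correlation values at the best integer lag and its two neighbours; the vertex gives a fractional lag. A minimal sketch with hypothetical names:

```java
/** Hypothetical sub-sample peak refinement by parabolic interpolation. */
final class SubsamplePeak {

    /**
     * Refines an integer correlation peak at index k by fitting a parabola
     * through corr[k - 1], corr[k], corr[k + 1] and returning its vertex.
     * For a true local maximum the correction lies within +/- 0.5 samples.
     * Falls back to k at the array edges or when the three points are collinear.
     */
    static double refine(double[] corr, int k) {
        if (k <= 0 || k >= corr.length - 1) return k;
        double ym = corr[k - 1];
        double y0 = corr[k];
        double yp = corr[k + 1];
        double denom = ym - 2 * y0 + yp;
        if (denom == 0) return k;
        return k + 0.5 * (ym - yp) / denom;
    }
}
```

The result is exact when the correlation peak really is a parabola; for real signals it is an approximation, and a cut can still only be applied in whole samples.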

Chris
If I don't reply to your reply, it means I agree with you.

  • rpp3po
  • Developer
Reply #4


*** please delete ***
  • Last Edit: 08 June, 2009, 05:50:08 PM by rpp3po

  • krabapple
Reply #5
Looks to be intended for lossy vs lossless, but could Synchrotron be used to 'align' two versions of the same lossless track (e.g., original and remastered version) for subsequent 'nulling' tests?

(I'm going to try it, just thought I'd ask too. ;>)


  • rpp3po
  • [*][*][*][*][*]
  • Developer
Synchrotron
Reply #6
As long as it is just a different mastering of the same recording, this should work perfectly. If parts of the track have been replaced with material from other recording sessions, your results may vary. Also, the content's timing and length should not have been changed. Usual mastering steps such as normalization, compression, stereo processing, and even slight reverberation and equalization should not do much harm. The average correlation values will be lower, but Synchrotron should still be able to identify the point of maximum correlation.

The current version can only detect delays of up to 4096 samples. This is enough for common encoder delays. If about 1/10th of a second of possible delay is not enough for your purpose, let me know. Also, the main program cross-correlates only 40000 samples. If your second mastering is very different, changing that number could help, too.

It's no problem to increase these values, but doing so would considerably hurt performance; that's why they are preset moderately.
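The numbers above are easy to check. At 44.1 kHz, 4096 samples are about 93 ms (roughly 1/10 s), and a brute-force time-domain search costs about window × (number of lags) multiply-adds per channel, so raising either limit slows things down proportionally. The sketch assumes every lag from 0 to maxLag is tested; the actual search range inside Synchrotron isn't stated here, and the class name is hypothetical.

```java
/** Hypothetical back-of-the-envelope helpers for delay limits and search cost. */
final class DelayMath {

    /** Converts a delay in samples to milliseconds at the given sample rate. */
    static double toMillis(int samples, double sampleRate) {
        return samples * 1000.0 / sampleRate;
    }

    /**
     * Multiply-adds for a brute-force search over lags 0..maxLag with
     * `window` samples compared per lag (one channel). Assumes a direct
     * time-domain search; FFT-based methods scale differently.
     */
    static long bruteForceOps(int window, int maxLag) {
        return (long) window * (maxLag + 1);
    }
}
```

For example, toMillis(4096, 44100) is about 92.9 ms, toMillis(1088, 44100) about 24.7 ms, and bruteForceOps(40000, 4096) is 163,880,000, so doubling the window or the maximum lag doubles the work.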
  • Last Edit: 08 June, 2009, 12:57:09 PM by rpp3po

  • Martel
Reply #7
> It's no problem to increase these values, but it would considerably hurt performance, that's why they are preset moderately.
This is just a brainstorming attempt based on some knowledge that I once possessed... 
Isn't it possible to do something like an FFT of the two signals (you may have to reverse one of them, I don't remember exactly), do a dot product in the spectral domain, and then an IFFT to obtain the cross-correlation? The complexity goes down from N^2 to roughly N log N, but you need power-of-two sample lengths.
I apologize if I'm talking nonsense, but this should be basically the same as applying an FIR filter (convolution in the time domain ~ dot product in the spectral domain); only one of the signals is reversed in correlation compared to convolution.
IE4 Rockbox Clip+ AAC@192; HD 668B/HD 518 Xonar DX FB2k FLAC;

  • rpp3po
  • Developer
Reply #8
The cross-correlation calculation itself could indeed work at O(n log n) complexity with your proposal, I guess. Right now I'm using a purely integer-based approach without any FFT; a by-product of it is the exact sample offset at which the two signals correlate most strongly. Would your FFT method also output this offset, or just the two signals' overall cross-correlation value?
  • Last Edit: 08 June, 2009, 06:20:53 PM by rpp3po

  • Martel
Reply #9
Oops, I mixed up the English terms (I'm not a native English speaker, sorry). I did not mean a dot product of the spectra but rather the products of the corresponding spectral components, which should yield an N-wide vector (series) on which you can do an IFFT (you can't do that on a scalar, which is what a dot product gives you).

I guess the method yields a series of N cross-correlation values corresponding to different shift offsets. However, I'm not sure which part of the full cross-correlation series (2N - 1 values) that is.

This topic was like 5 minutes during a university class and it was 6 years ago. It wasn't particularly memorable, sorry.
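To illustrate the question above in code: the IFFT of conj(X)·Y is the whole series of correlation values, one per lag, so the offset is simply the index of its maximum. The sketch zero-pads both signals to a power of two of at least len(x) + len(y) - 1 samples so the circular correlation doesn't wrap around. Java's standard library has no FFT, so a textbook iterative radix-2 FFT is included; the class and method names are hypothetical, and this is not how Synchrotron 1.x works (it uses a direct, integer-only approach).

```java
/** Hypothetical FFT-based cross-correlation returning the best lag. */
final class FftXcorr {

    /** In-place iterative radix-2 FFT; length must be a power of two. Inverse is unscaled. */
    static void fft(double[] re, double[] im, boolean invert) {
        int n = re.length;
        for (int i = 1, j = 0; i < n; i++) { // bit-reversal permutation
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        for (int len = 2; len <= n; len <<= 1) { // butterflies
            double ang = 2 * Math.PI / len * (invert ? 1 : -1);
            double wr = Math.cos(ang), wi = Math.sin(ang);
            for (int i = 0; i < n; i += len) {
                double cr = 1, ci = 0; // current twiddle factor
                for (int k = 0; k < len / 2; k++) {
                    int a = i + k, b = i + k + len / 2;
                    double vr = re[b] * cr - im[b] * ci;
                    double vi = re[b] * ci + im[b] * cr;
                    re[b] = re[a] - vr;
                    im[b] = im[a] - vi;
                    re[a] += vr;
                    im[a] += vi;
                    double t = cr * wr - ci * wi;
                    ci = cr * wi + ci * wr;
                    cr = t;
                }
            }
        }
    }

    /**
     * Returns the lag d maximizing sum_i x[i] * y[i + d], with d in
     * [-(x.length - 1), y.length - 1], computed as IFFT(conj(X) * Y).
     */
    static int bestLag(double[] x, double[] y) {
        int n = 1;
        while (n < x.length + y.length - 1) n <<= 1; // room for all lags, no wrap-around
        double[] xr = java.util.Arrays.copyOf(x, n), xi = new double[n];
        double[] yr = java.util.Arrays.copyOf(y, n), yi = new double[n];
        fft(xr, xi, false);
        fft(yr, yi, false);
        for (int k = 0; k < n; k++) { // conj(X[k]) * Y[k]
            double pr = xr[k] * yr[k] + xi[k] * yi[k];
            double pim = xr[k] * yi[k] - xi[k] * yr[k];
            xr[k] = pr;
            xi[k] = pim;
        }
        fft(xr, xi, true); // unscaled inverse: values are n * r[d], argmax unchanged
        int best = 0;
        for (int k = 1; k < n; k++) if (xr[k] > xr[best]) best = k;
        return best < y.length ? best : best - n; // high indices hold negative lags
    }
}
```

With y being x delayed by 7 samples, bestLag(x, y) returns 7 and bestLag(y, x) returns -7.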

  • rpp3po
  • Developer
Reply #10
Updated to 1.1
  • Added a safety check: files are not cut if their cross-correlation is lower than 0.88.

I had accidentally swapped the original and the lossy version on the command line and got strange results (like a correlation of 0.44). If this had happened with the --cut option present, the program could have cut the original instead of the lossy file, at an arbitrary position.
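The 1.1 safety check can be sketched like this, assuming interleaved PCM samples and a delay measured in frames (one sample per channel). The class and method names are hypothetical; only the 0.88 threshold comes from the changelog above.

```java
/** Hypothetical guarded cut: refuse to trim when the match is implausible. */
final class SafeCut {

    /** Minimum cross-correlation required before cutting (value from the 1.1 changelog). */
    static final double MIN_CORRELATION = 0.88;

    /**
     * Drops the first `delayFrames` frames from interleaved PCM samples,
     * but only if the measured correlation reaches the threshold.
     * Throws IllegalStateException otherwise, leaving the input untouched.
     */
    static int[] cutLeading(int[] interleaved, int channels, int delayFrames, double correlation) {
        if (correlation < MIN_CORRELATION) {
            throw new IllegalStateException("Correlation " + correlation
                    + " is below " + MIN_CORRELATION + "; refusing to cut");
        }
        long from = Math.max(0L, (long) delayFrames * channels);
        int start = (int) Math.min(interleaved.length, from);
        return java.util.Arrays.copyOfRange(interleaved, start, interleaved.length);
    }
}
```

With swapped arguments like in the case above, a correlation of 0.44 would raise an exception instead of trimming the original file.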
  • Last Edit: 24 June, 2009, 10:44:46 PM by rpp3po

Reply #11
> Plus, at normal sampling rates of 32 kHz or more, sub-sample delays are inaudible. Actually, delays of one or two samples are probably also inaudible, but for blind listening tests, it is always better to restrict inter-stimulus delay to the microsecond range.


Are you seriously claiming that you can reliably hear a difference between two files that are misaligned by 3 or more samples?

My experience-based rule of thumb says that a difference of up to 1 ms is innocuous.

  • C.R.Helmrich
  • Developer
Reply #12
> Are you seriously claiming that you can reliably hear a difference between two files that are misaligned by 3 or more samples?
>
> My experience-based rule of thumb says that a difference of up to 1 ms is innocuous.

Yes, I am. Not for stationary passages, of course. But when you cut (for example, when defining a loop in a blind test) right within a sharp attack, say, a castanet or bass drum hit, and then loop that part, you can hear a difference. Example here. Both castanet excerpts are exactly 2 seconds long but offset by 3 samples. Hear for yourself. And this is not even the most obvious example I can come up with. You can always create something which has a clear instationarity between the cut boundaries in one stimulus, but not in the other.

Chris

Update: I just ABXed that successfully using foobar even without looping those two files. The first attack "plops" more in one item, and the "plop" seems to come from a different location in the stereo image.
  • Last Edit: 24 June, 2009, 04:57:59 PM by C.R.Helmrich

  • C.R.Helmrich
  • Developer
Reply #13
Sorry, I meant discontinuity instead of instationarity. Also, I didn't realize that this thread is in the upload section. So here are my two demo files from above.

Chris


  • rpp3po
  • Developer
Reply #14
Updated to 1.2
  • Added --fullscan switch for complete file scans. That's slow and usually not needed for delay computation, but can be employed to get two files' overall cross correlation.
  • Slight refactoring. Increased precision.
  • jUnit test cases and test samples removed from source package.
  • Last Edit: 24 June, 2009, 08:50:16 PM by rpp3po