
Understanding accurate rip: Different AR sums from different tools

Hi,

I am trying to understand how Accurate Rip works and in particular why I get mismatches from time to time.

There are tools which calculate the checksums, and there are other tools which can also determine an offset. My guess is that they calculate the checksum for various offsets and check which one gets the highest confidence against the AR database. Due to the construction of the checksum, trying different offsets can probably be done more cheaply than recalculating the checksum from scratch every time.
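For reference, the AccurateRip v1 track checksum is commonly described as a position-weighted sum of the samples, which is exactly why offset scanning is cheap: shifting the audio changes each term's weight rather than requiring a fresh pass over the data. A minimal Python sketch of that commonly documented scheme (the edge handling at the disc boundaries is simplified here and may differ in real tools):

```python
def ar_v1_checksum(samples, first_track=False, last_track=False):
    """AccurateRip v1 track checksum (sketch of the commonly
    documented scheme; boundary handling is simplified).

    Each entry of `samples` packs one stereo frame as a 32-bit int:
    (right << 16) | left, both as unsigned 16-bit values.
    The first 5 sectors of the first track and the last 5 sectors of
    the last track (5 * 588 samples each, about 1/15 s) are skipped.
    """
    skip = 5 * 588
    start = skip if first_track else 0
    end = len(samples) - skip if last_track else len(samples)
    total = 0
    for i in range(start, end):
        total += (i + 1) * samples[i]   # position-weighted sum
    return total & 0xFFFFFFFF
```

Because every term is index times sample, a tool that keeps running sums of the samples can derive the checksum for a shifted stream with a few additions instead of re-reading everything.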

Question 1: Am I correct that offset means that there are n bytes of zeros (0x00) prepended or missing at the beginning of the track?
Does the track still have the same length, i.e. is the same number of bytes missing or appended at the end? Or is the track then shorter or longer?

Question 2: Zeros in the beginning shift the whole thing. Do zeros at the end influence the AR checksum or is that like adding zeros (no effect)?


This works consistently for many of my CDs. My pipeline uses abcde v2.9.3 on Linux, the offset of my drive seems to be 6, and the cdparanoia options are "-v -O 6 -z -S 8". I produce FLAC files. I hope that is correct so far.

When the checksums don't match, I see these possible reasons:

1. if a single track is different, let's say track 7 out of 12, there are probably real errors in the FLAC file. There is probably not much I can do apart from trying to read it again. Correct?
2. if it is a rare CD and it's not in the database, there is nothing to compare against. No chance.
3. Many CDs show a mismatch for the first track and all others are fine with checkCD.pl or arflac.py. This seems to be an offset, as trackverify shows a consistent offset for all tracks and calculates different AR sums.

Example for this last case:

checkCD.pl

Code: [Select]
Track	Ripping Status		[Disc ID: 0015c36c-b40ab30d 00d9b3e4]
 1 ** Rip not accurate **  (confidence 200) [bc1bc698] [721e79dc]
 2 ** Rip not accurate **  (confidence 200) [36d1055c] [d6e7cb16]
 3 ** Rip not accurate **  (confidence 200) [a7b593fb] [749b790a]
 4 ** Rip not accurate **  (confidence 200) [9f31ee0b] [d6de6492]
 5 ** Rip not accurate **  (confidence 200) [5418b599] [4a96322b]
 6 ** Rip not accurate **  (confidence 200) [072bf062] [6b360d52]
 7 ** Rip not accurate **  (confidence 200) [e9af6adf] [4c760c78]
 8 ** Rip not accurate **  (confidence 200) [f1664ad7] [0b6ac440]
 9 ** Rip not accurate **  (confidence 200) [afd628a2] [c791b0d9]
 10 ** Rip not accurate **  (confidence 200) [364ef5f9] [ef476b69]
 11 ** Rip not accurate **  (confidence 200) [ee4edcab] [ec9b083b]
 12 ** Rip not accurate **  (confidence 200) [3f346a8a] [9235d60f]
 13 ** Rip not accurate **  (confidence 200) [127a479d] [85397434]
Your CD disc is possibly a different pressing to the one(s) stored in AccurateRip.
Track(s) Accurately Ripped: 0
**** Track(s) Not Ripped Accurately: 13 ****
Track(s) Not in Database: 0

trackverify -R

Code: [Select]
             AccurateRip V1             AccurateRip V2
Track Confidence Offset Checksum Confidence Offset Checksum
───── ────────── ────── ──────── ────────── ────── ────────
01-S…         17     12 A0CFC979         32      0 5C0DD1A9
02-S…         17     12 AAD5801F         32      0 62E8B697
03-S…         17     12 BCCA651D         31      0 5EFB197D
04-S…         17     12 2B403D76         32      0 4A25F84C
05-S…         17     12 DFDCF85F         32      0 BE45327A
06-S…         17     12 3CC666CD         32      0 3629CFD5
07-S…         17     12 4403CC29         32      0 F2217619
08-S…         17     12 0A434936         32      0 F28D81C7
09-S…         17     12 DBDE7E0E         32      0 9C276863
10-S…         17     12 B8F7A524         32      0 6D14A186
11-S…         17     12 6770D37E         31      0 749A1765
12-S…         17     12 177D6712         32      0 B9E6DF1A
13-S…         17     12 3F73ADAE         32      0 583E7508

Question 3: Is it a good sign that the offsets of all tracks are the same?

Question 4: How can AR version 1 have a different offset than AR v2?

Question 5: Can I correct this offset after the fact such that other tools can verify it, too?

Re: Understanding accurate rip: Different AR sums from different tools

Reply #1
Different CD-ROM drives have different read offsets; this means that when drives with different read offsets read the same audio sector, they may return relatively shifted results. The shift will be a multiple of 4 bytes (one 16-bit sample for each of L and R) and can be positive or negative. The bytes that are missing at the beginning/end are not necessarily zeroes. The number of sectors should not change, as this is encoded in the TOC/subchannel, and a track should always be a multiple of 588 samples AFAIK, since the minimum unit of time in a CD timestamp is a sector. As the tools need to be calibrated for the hardware used, it's possible that bad configurations or badly submitted results account for multiple possible checksum matches. I assume AR normalises everything to a +0 read offset but TBH it's been a decade or more since I've used it.
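To make the shift concrete, here is a minimal sketch of how a ripper might apply a known read-offset correction to a raw 16-bit stereo PCM buffer. The sign convention (positive offset = drop samples at the start, pad at the end) is an assumption for illustration; actual tools may define it the other way around:

```python
def apply_offset(pcm, offset_samples):
    """Shift raw 16-bit stereo PCM by a drive read offset (sketch).

    One stereo sample = 4 bytes (2 bytes L + 2 bytes R).
    Positive offset: drop bytes at the start, pad zeros at the end.
    Negative offset: pad zeros at the start, drop bytes at the end.
    The buffer length is preserved either way.
    """
    shift = offset_samples * 4
    if shift >= 0:
        return pcm[shift:] + b"\x00" * shift
    return b"\x00" * (-shift) + pcm[:shift]
```

Note that the zero padding is exactly the "not necessarily zeroes" data mentioned above: the real bytes lie outside what the drive returned, so the corrector can only substitute silence.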

Different pressings can also have different write offsets, most notably I believe when the same CD has been manufactured in multiple factories. Unless there are multiple runs with different offsets I don't know if there's a way to reliably determine and account for write offset (unless there's a data track, data tracks have to be read aligned so when they exist write offset can be determined). IMO this is likely the more common source of multiple potential matches to compare against.

Single track mismatch (unless it's the first or last track which may be due to missing probably non-zero samples) is probably a scratch and all you can do is re-read many times and hope there's a consensus. Dust will also be the bane of your existence. I think (but can't recall fully, pinch of salt) that even audio sectors have checksums on the disc, however those checksums are normally handled internally by the cdrom and being audio it'll try to correct instead of error. If you really want to go deep there are certain cdroms that can have custom firmware loaded that may allow all the raw data to be dumped that a tool can work with to try and smartly match with AR. My knowledge comes mostly from dumping console games, it may not apply to pure audio dumping but if it can then someone else should be able to fill in the gaps.

Re: Understanding accurate rip: Different AR sums from different tools

Reply #2
Very interesting topic - unfortunately I cannot contribute, but I'm all ears... ;-)

Re: Understanding accurate rip: Different AR sums from different tools

Reply #3
Question 1: Am I correct that offset means that there are n bytes of zeros (0x00) prepended or missing at the beginning of the track?
Does the track still have the same length, i.e. is the same number of bytes missing or appended at the end? Or is the track then shorter or longer?
The offset means the audio data for the entire disc is shifted by that many samples (not bytes). Padding is inserted at the beginning of the first track or the end of the last track to maintain the correct length. The padding is usually zeroes, but some drives are capable of reading the actual data beyond the ends of the disc, which... is usually still zeroes.

Question 2: Zeros in the beginning shift the whole thing. Do zeros at the end influence the AR checksum or is that like adding zeros (no effect)?
Zeroes at the end also shift the whole thing, just in the opposite direction. An offset in either direction changes the AccurateRip checksums for all tracks. The checksum ignores the first and last 1/15 of a second of audio on the disc to account for different amounts of zero padding that may be inserted to fix the drive offset.

Question 3: Is it a good sign that the offsets of all tracks are the same?
Yes.

Question 4: How can AR version 1 have a different offset than AR v2?
A disc may have multiple pressings with different offsets. The tools you're using may only be able to report one pressing at a time, and may not select the same pressing for both V1 and V2.

Question 5: Can I correct this offset after the fact such that other tools can verify it, too?
Yes.


I assume AR normalises everything to a +0 read offset but TBH it's been a decade or more since I've used it.
It does, but AccurateRip's 0 offset may not actually be 0. I've considered pulling out a microscope and a copy of the Red Book spec to figure out the real offset using one of my CDs...

Unless there are multiple runs with different offsets I don't know if there's a way to reliably determine and account for write offset (unless there's a data track, data tracks have to be read aligned so when they exist write offset can be determined).
One of my CDs with gapless audio actually has small (inaudible) gaps caused by the tracks being split at millisecond boundaries instead of sector boundaries. Since each track starts at a sector boundary, I can determine the exact write offset for this particular CD. (Of course, this method doesn't work for all CDs, so it's ultimately useless.)

Single track mismatch (unless it's the first or last track which may be due to missing probably non-zero samples) is probably a scratch and all you can do is re-read many times and hope there's a consensus.
AccurateRip specifically avoids the first and last 1/15 of a second of audio on the disc to avoid this problem, and no CD drives are known to have an offset anywhere near that big, so a difference in any track is a mismatch. Occasionally different pressings will have the same offset but slightly different data, which can cause a single-track mismatch if you're the first person to rip that pressing.

I think (but can't recall fully, pinch of salt) that even audio sectors have checksums on the disc, however those checksums are normally handled internally by the cdrom and being audio it'll try to correct instead of error.
All sectors have a small amount of error-correcting codes built in, and the drive uses them to repair small errors with 100% confidence. When the drive encounters a larger error that can't be repaired with 100% confidence, it's up to the drive whether you receive an error instead of (or in addition to) the drive's repair attempt.

If you really want to go deep there are certain cdroms that can have custom firmware loaded that may allow all the raw data to be dumped that a tool can work with to try and smartly match with AR. My knowledge comes mostly from dumping console games, it may not apply to pure audio dumping but if it can then someone else should be able to fill in the gaps.
I'm not aware of any CD-based (infrared laser) disc formats that require such low-level access; as far as I know that sort of thing didn't start to show up in games until DVD-based (red laser) disc formats. I could be wrong though!

 

Re: Understanding accurate rip: Different AR sums from different tools

Reply #4
I think (but can't recall fully, pinch of salt) that even audio sectors have checksums on the disc, however those checksums are normally handled internally by the cdrom and being audio it'll try to correct instead of error.
All sectors have a small amount of error-correcting codes built in, and the drive uses them to repair small errors with 100% confidence. When the drive encounters a larger error that can't be repaired with 100% confidence, it's up to the drive whether you receive an error instead of (or in addition to) the drive's repair attempt.
That's what I'm thinking of: the CIRC stuff that adds 1 byte of parity for every 3 bytes of data. A drive with custom firmware would ideally not give an error or a repair attempt for large errors, but instead return the raw data it read (plus any available debug data) for some third-party software to analyse, which would have the benefit of lots of compute power, possibly external information like AR checksums, and possibly multiple reads, but now with better data to work with. If the per-sector error detection/correction is very small it's unlikely to be of much benefit; then again, if it can pinpoint where an error occurred more accurately than the info a standard drive error would output (?), it might allow some "medium" errors to be repaired.

ECMA-130 sections 16/17/18/19 detail how sectors are actually encoded. Outputting F2 frames to be decoded in software is what I think might theoretically be useful (section 19 is super low level, definitely out of scope for a drive to be able to output): https://www.ecma-international.org/publications-and-standards/standards/ecma-130/

If you really want to go deep there are certain cdroms that can have custom firmware loaded that may allow all the raw data to be dumped that a tool can work with to try and smartly match with AR. My knowledge comes mostly from dumping console games, it may not apply to pure audio dumping but if it can then someone else should be able to fill in the gaps.
I'm not aware of any CD-based (infrared laser) disc formats that require such low-level access; as far as I know that sort of thing didn't start to show up in games until DVD-based (red laser) disc formats. I could be wrong though!
Nothing normally requires that level of access AFAIK (it probably would have come in handy to perfectly defeat copy protection on PC games which was advanced even on CD-based games, but imperfect methods and workarounds were good enough).

Re: Understanding accurate rip: Different AR sums from different tools

Reply #5
Excuse me for asking, but isn't the error correction irrelevant to reading a disc (of whatever type) via its SATA port?

I admit I only have a limited understanding of what goes on under the hood, but (logically) it seems to me the error correction would be in the process of reconstructing the audio for the analogue output from a drive, and the SATA data would be "raw".

Re: Understanding accurate rip: Different AR sums from different tools

Reply #6
The audio data for a sector is 2352 bytes; it's what we understand as the raw samples, but it's not directly written to disc like that (there are many layers of "more raw"). Paraphrasing ECMA-130 sections 16-19: those 2352 bytes are scattered into 98 F1 frames of 24 bytes each. The F1 frames are turned into 106 F2 frames (the extra frames presumably for addressing/sync?) of 32 bytes each by adding 8 bytes of parity (correction data) to each frame. The F2 frames are turned into F3 frames of 33 bytes by prepending a control byte; the control byte carries what we know as subchannel data P-W, which is where the TOC and most extensions like text and graphics are stored. Finally, the F3 frames are written to the disc by converting each 8-bit byte to a 14-bit encoding that limits the number of consecutive identical bits to a small range, because the physical constraints of the medium demand that sort of encoding.
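The byte counts in the paragraph above can be sanity-checked with a little arithmetic (the per-frame figures follow the ECMA-130 description; the CIRC interleaving details are glossed over):

```python
SECTOR_BYTES = 2352               # one audio sector: 588 stereo samples * 4 bytes
F1_BYTES, F1_PER_SECTOR = 24, 98  # 98 F1 frames of 24 bytes carry one sector
F2_BYTES = F1_BYTES + 8           # + 8 bytes of CIRC parity -> 32
F3_BYTES = F2_BYTES + 1           # + 1 control/subchannel byte -> 33
EFM_BITS_PER_BYTE = 14            # EFM maps each 8-bit byte to 14 channel bits

assert F1_PER_SECTOR * F1_BYTES == SECTOR_BYTES
# CIRC parity overhead: 8 parity bytes per 24 data bytes,
# i.e. 1 byte of parity for every 3 bytes of data (as noted earlier).
assert F2_BYTES - F1_BYTES == F1_BYTES // 3
```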

Normally, I believe the "raw" 2352-byte sectors and the processed subchannel data are the lowest level a drive outputs to the user. To get a drive that outputs earlier in the process, you'd theoretically need custom firmware on a flexible drive (like some Plextor models) that probably uses a debug command to output the raw frames. I don't know enough about custom firmware to know how early in the process it's possible to dump from; custom firmware used for game dumping may not go that far.

Re: Understanding accurate rip: Different AR sums from different tools

Reply #7
The offset means the audio data for the entire disc is shifted by that many samples (not bytes). ...
Thanks a lot for your comprehensive answer. The exact meaning of offset is now clearer. It also means that the length of a track stays the same with offsets, which makes sense, because the identifier of the CD is derived from the table of contents and different track lengths lead to different identifiers.

A disc may have multiple pressings with different offsets. The tools you're using may only be able to report one pressing at a time, and may not select the same pressing for both V1 and V2.

Ah, so 31 reported ARv2 like me and perhaps only 10 might have reported AR1 like me, so ARv1 with offset 12 is more frequent (17) and thus shown by trackverify? ARv1 and v2 values are probably not from the same reporter. So these might be different pressings, can it also be that others have reported with a wrong (uncalibrated) offset?

I am quite confident that my offset setting is correct because the device is listed on the AR website and most of the well known CDs get results with 200 confidence. I am now only chasing the reasons for the other CDs.

Thanks again, for the answer. I will take a look at my collection again with this information and see if it explains the differences. I currently have those categories:

- good CD (all confidences > 5)
- good rare CD (confidence 1 .. 5, but not a well known CD)
- offset (as above but with offset)
- problem in track (single track or 2 tracks have a mismatch while the others match)
- not found (rare CD not in AR database)
- famous artist and CD but not in AR database -> probably the TOC and CD identifier is wrong? or just a rare pressing?
- different offsets per track

The last two, I still don't understand.

An example for the last case is this: Brothers in Arms by Dire Straits. This should be a popular album, but I get no match for ARv2 and very low confidence for ARv1 – and with varying (and sometimes high) offsets.

Code: [Select]
                                 AccurateRip V1             AccurateRip V2
Track                      Confidence Offset Checksum Confidence Offset Checksum
────────────────────────── ────────── ────── ──────── ────────── ────── ────────
01-Dire_Straits-So_Far_Aw…          4  -1979 D097E393                 0 2C5BA93C
02-Dire_Straits-Money_for…          4   -193 1ED89B7C                 0 B5FC95D3
03-Dire_Straits-Walk_of_L…          4  -1979 6E02AA4E                 0 BD199401
04-Dire_Straits-Your_Late…          4   -193 CAF3B9E3                 0 BA156925
05-Dire_Straits-Why_Worry…          4   -193 3DC30A26                 0 00F7AC29
06-Dire_Straits-Ride_Acro…          4  -1979 DE6E8135                 0 C059EA3C
07-Dire_Straits-The_Mans_…          4   -193 186BA23F                 0 C854F5F5
08-Dire_Straits-One_World…          4   -193 3CB75ABB                 0 66E225D1
09-Dire_Straits-Brothers_…          5  -1979 11D7B534                 0 DF802592
What happens here?

Re: Understanding accurate rip: Different AR sums from different tools

Reply #8
AccurateRip's "zero" was taken from EAC, which appears to be 30 samples off. https://hydrogenaud.io/index.php/topic,50301.0.html

As for getting lower-level output from the drive ... how low did https://web.archive.org/web/20180113163201/perfectrip.cdfreaks.com/ go? The initial beta didn't make for anything deeper than what was logged at https://hydrogenaud.io/index.php/topic,52237.msg562650.html#msg562650 , and the tool was abandoned before it got to being really useful. I never even got it to work, but I probably wasn't stubborn enough either.

Re: Understanding accurate rip: Different AR sums from different tools

Reply #9
Looks like it didn't go deeper, but instead monitored the CIRC correction (exposing whether the drive's F2 decoding was successful) so you'd know when and where things went awry. That looks like what C2 error pointers do, and exposing that data increases confidence in the rip. Zero errors seem to be required for audio according to this (data tracks can fall back to the error correction within the 2352 bytes): https://en.wikipedia.org/wiki/C2_error

The following link makes a distinction between C2 and CU but it's probably just a difference in terminology, the former might use "C2" as error whereas this uses "C2" as a successful correction and "CU" as an error: https://qpxtool.sourceforge.io/glossar.html#cx

That state-of-the-art tools use C2 instead of dumping frame data and decoding it in software suggests that the frame data can't be dumped by custom firmware, or that the benefit isn't there. Also, FWIW, this is the main tool that dumps games, and it can be used to dump audio CDs, not that you should (unless maybe the disc has copy protection), as it doesn't use AR: https://github.com/saramibreak/DiscImageCreator

Re: Understanding accurate rip: Different AR sums from different tools

Reply #10
AccurateRip's "zero" was taken from EAC, which appears to be 30 samples off. https://hydrogenaud.io/index.php/topic,50301.0.html

As for getting lower-level output from the drive ... how low did https://web.archive.org/web/20180113163201/perfectrip.cdfreaks.com/ go? The initial beta didn't make for anything deeper than what was logged at https://hydrogenaud.io/index.php/topic,52237.msg562650.html#msg562650 , and the tool was abandoned before it got to being really useful. I never even got it to work, but I probably wasn't stubborn enough either.

That is, users of EAC populate the AR database with checksums that have -30 offset? But if that was the problem, then a verified track with offset -30 should be very frequent, shouldn't it?

Re: Understanding accurate rip: Different AR sums from different tools

Reply #11
Ah, so 31 reported ARv2 like me and perhaps only 10 might have reported AR1 like me, so ARv1 with offset 12 is more frequent (17) and thus shown by trackverify?
That's my best guess, based on what the program appears to be doing.

So these might be different pressings, can it also be that others have reported with a wrong (uncalibrated) offset?
It's possible, but most reports come from programs like EAC that require calibrating the offset before any reports can be uploaded.

- famous artist and CD but not in AR database -> probably the TOC and CD identifier is wrong? or just a rare pressing?
The most likely explanation is that you're doing something that sometimes causes wrong CD identifiers - maybe incorrect pregap handling?

- different offsets per track
Do they usually have the same confidence? Maybe you're seeing two pressings with the same confidence, and the displayed pressing is randomly selected per track instead of per disc.

That is, users of EAC populate the AR database with checksums that have -30 offset? But if that was the problem, then a verified track with offset -30 should be very frequent, shouldn't it?
All tools use the same offset. It just happens that the offset was a guess based on incomplete data, and the guess turned out to (probably) be 30 samples away from the correct number.

Re: Understanding accurate rip: Different AR sums from different tools

Reply #12
- famous artist and CD but not in AR database -> probably the TOC and CD identifier is wrong? or just a rare pressing?
The most likely explanation is that you're doing something that sometimes causes wrong CD identifiers - maybe incorrect pregap handling?

That's a very interesting route! I see this problem more often with albums that contain 2 or more CDs, and more often with non-standard pregaps (I encountered 180/75 and 183/75 seconds), but not consistently. So there seems to be a correlation, but it's not the only factor.
In addition, I am not sure how tools (like trackverify or checkCD.pl) which operate on FLAC files can know about the pregap at all.
I copied the CD with abcde which uses cdparanoia and derived the TOC with abcde-musicbrainz-tool. Any ideas where I could look or which tools I could try to find the correct identifier?

- different offsets per track
Do they usually have the same confidence? Maybe you're seeing two pressings with the same confidence, and the displayed pressing is randomly selected per track instead of per disc.

Yes, see the Dire Straits example above.


Re: Understanding accurate rip: Different AR sums from different tools

Reply #14
No, I am not sure. I never thought of it because I bought it in a regular store.

Re: Understanding accurate rip: Different AR sums from different tools

Reply #15
After some time, I'm coming back to this topic. With some more investigation, I found out that most of my CDs are accurately ripped. The issue was indeed an incorrect pregap handling as suggested by @Octocontrabass:

Quote
The most likely explanation is that you're doing something that sometimes causes wrong CD identifiers - maybe incorrect pregap handling?

The tools I used assume a pregap of 150 sectors (2.0 s), but some discs have a pregap of 182 or 183 sectors (or, rarely, other values). I knew this before, but now I realized that
Code: [Select]
cdrip-tools/arverify.py
can calculate the correct AR-Disc-ID with the -a option (e.g. -a 33 for a pregap of 183). That way, all my tracks are accurately ripped with some understandable exceptions: a few very rare discs from unknown artists and some tracks which are damaged (cdparanoia has problems reading them).
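For anyone chasing the same problem: the AccurateRip disc ID is derived purely from the TOC, so a wrong pregap assumption shifts every track offset and yields a different ID. A Python sketch of the widely documented ID scheme (exact conventions may vary between tools; offsets are in frames, counted so that track 1 usually starts at 0):

```python
def accuraterip_disc_ids(track_offsets, leadout):
    """AccurateRip disc IDs from a TOC (sketch of the widely
    documented scheme; exact conventions may vary by tool).

    `track_offsets`: start of each track in frames, counted from the
    start of the program area (a non-standard pregap shifts them all).
    `leadout`: lead-out position in frames.
    Returns (id1, id2) as they appear in dBAR-...-id1-id2-cddbid names.
    """
    id1 = id2 = 0
    for n, off in enumerate(list(track_offsets) + [leadout], start=1):
        id1 += off
        id2 += max(off, 1) * n   # an offset of 0 counts as 1 here
    return id1 & 0xFFFFFFFF, id2 & 0xFFFFFFFF
```

With a pregap of 183 instead of 150 frames, every offset shifts by 33 and both IDs change, which is what the -a 33 correction compensates for.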

I wrote the TOC to the metadata. So now I can derive the disc ID from the metadata of any individual track and validate the AR checksum.

So problem solved for me. Thanks for the pregap hint, @Octocontrabass !