Understanding accurate rip: Different AR sums from different tools

Hi,

I am trying to understand how Accurate Rip works and in particular why I get mismatches from time to time.

There are tools which calculate the checksums and there are other tools which can also find out some offset. My guess is, they calculate the checksum for various offsets and compare with the AR database which one gets the highest confidence. Calculating different offsets can probably be done cheaper than actually calculating the checksum again and again due to the construction of the checksum.

Question 1: Am I correct that offset means that there are n bytes of zeros (0x00) prepended or missing at the beginning of the track?
Does the track still have the same length, i.e. is the same number of bytes missing or appended at the end? Or is the track then shorter or longer?

Question 2: Zeros in the beginning shift the whole thing. Do zeros at the end influence the AR checksum or is that like adding zeros (no effect)?


This works consistently for many of my CDs. Pipeline uses abcde v2.9.3 on Linux, the offset of my drive seems to be 6, cdparanoia options are "-v -O 6 -z -S 8". I produce FLAC files. I hope that is correct so far.

When the checksums don't match, I see those possible reasons:

1. if a single track is different, let's say track 7 out of 12, there are probably real errors in the FLAC file. There is probably not much I can do apart from trying to read it again. Correct?
2. if it is a rare CD and it's not in the database, there is nothing to compare against. No chance.
3. Many CDs show a mismatch for the first track and all others are fine with checkCD.pl or arflac.py. This seems to be an offset, as trackverify shows a consistent offset for all tracks and calculates different AR sums.

Example for this last case:

checkCD.pl

Code:
Track	Ripping Status		[Disc ID: 0015c36c-b40ab30d 00d9b3e4]
 1 ** Rip not accurate **  (confidence 200) [bc1bc698] [721e79dc]
 2 ** Rip not accurate **  (confidence 200) [36d1055c] [d6e7cb16]
 3 ** Rip not accurate **  (confidence 200) [a7b593fb] [749b790a]
 4 ** Rip not accurate **  (confidence 200) [9f31ee0b] [d6de6492]
 5 ** Rip not accurate **  (confidence 200) [5418b599] [4a96322b]
 6 ** Rip not accurate **  (confidence 200) [072bf062] [6b360d52]
 7 ** Rip not accurate **  (confidence 200) [e9af6adf] [4c760c78]
 8 ** Rip not accurate **  (confidence 200) [f1664ad7] [0b6ac440]
 9 ** Rip not accurate **  (confidence 200) [afd628a2] [c791b0d9]
 10 ** Rip not accurate **  (confidence 200) [364ef5f9] [ef476b69]
 11 ** Rip not accurate **  (confidence 200) [ee4edcab] [ec9b083b]
 12 ** Rip not accurate **  (confidence 200) [3f346a8a] [9235d60f]
 13 ** Rip not accurate **  (confidence 200) [127a479d] [85397434]
Your CD disc is possibly a different pressing to the one(s) stored in AccurateRip.
Track(s) Accurately Ripped: 0
**** Track(s) Not Ripped Accurately: 13 ****
Track(s) Not in Database: 0

trackverify -R

Code:
             AccurateRip V1             AccurateRip V2
Track Confidence Offset Checksum Confidence Offset Checksum
───── ────────── ────── ──────── ────────── ────── ────────
01-S…         17     12 A0CFC979         32      0 5C0DD1A9
02-S…         17     12 AAD5801F         32      0 62E8B697
03-S…         17     12 BCCA651D         31      0 5EFB197D
04-S…         17     12 2B403D76         32      0 4A25F84C
05-S…         17     12 DFDCF85F         32      0 BE45327A
06-S…         17     12 3CC666CD         32      0 3629CFD5
07-S…         17     12 4403CC29         32      0 F2217619
08-S…         17     12 0A434936         32      0 F28D81C7
09-S…         17     12 DBDE7E0E         32      0 9C276863
10-S…         17     12 B8F7A524         32      0 6D14A186
11-S…         17     12 6770D37E         31      0 749A1765
12-S…         17     12 177D6712         32      0 B9E6DF1A
13-S…         17     12 3F73ADAE         32      0 583E7508

Question 3: Is it a good sign that the offsets of all tracks are the same?

Question 4: How can AR version 1 have a different offset than AR v2?

Question 5: Can I correct this offset after the fact such that other tools can verify it, too?

Re: Understanding accurate rip: Different AR sums from different tools

Reply #1
Different CD-ROM drives have different read offsets: when two drives with different read offsets read the same audio sector, they may return data shifted relative to each other. The shift will be a multiple of 4 bytes (one 16-bit sample each for L and R) and can be positive or negative. The bytes that are missing at the beginning/end are not necessarily zeroes. The number of sectors should not change, as this is encoded in the TOC/subchannel, and AFAIK a track should always be a multiple of 588 samples, as the minimum unit of time in a CD timestamp is a sector. As the tools need to be calibrated for the hardware used, it's possible that bad config or bad submitted results account for multiple possible checksum matches. I assume AR normalises everything to a +0 read offset but TBH it's been a decade or more since I've used it.
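
To put numbers on the units mentioned in this reply, here is a minimal Python sketch. The constants are the usual CD audio figures (588 stereo samples per sector, 4 bytes per stereo sample, 75 sectors per second); the function names are made up for illustration.

```python
SAMPLES_PER_SECTOR = 588   # stereo samples in one CD audio sector
BYTES_PER_SAMPLE = 4       # 16-bit left + 16-bit right
SECTORS_PER_SECOND = 75    # smallest unit of a CD timestamp is one sector

def offset_in_bytes(offset_samples):
    """Convert a read offset given in samples to bytes (hypothetical helper)."""
    return offset_samples * BYTES_PER_SAMPLE

def sectors_to_seconds(sectors):
    """Convert a sector count to seconds of audio (hypothetical helper)."""
    return sectors / SECTORS_PER_SECOND

print(SAMPLES_PER_SECTOR * BYTES_PER_SAMPLE)  # bytes in one audio sector
print(offset_in_bytes(6))                     # a drive offset of 6 samples, in bytes
print(sectors_to_seconds(150))                # a 150-sector pregap, in seconds
```

So the "-O 6" from the first post corresponds to a shift of 24 bytes, not 6 bytes.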

Different pressings can also have different write offsets, most notably I believe when the same CD has been manufactured in multiple factories. Unless there are multiple runs with different offsets I don't know if there's a way to reliably determine and account for write offset (unless there's a data track, data tracks have to be read aligned so when they exist write offset can be determined). IMO this is likely the more common source of multiple potential matches to compare against.

Single track mismatch (unless it's the first or last track which may be due to missing probably non-zero samples) is probably a scratch and all you can do is re-read many times and hope there's a consensus. Dust will also be the bane of your existence. I think (but can't recall fully, pinch of salt) that even audio sectors have checksums on the disc, however those checksums are normally handled internally by the cdrom and being audio it'll try to correct instead of error. If you really want to go deep there are certain cdroms that can have custom firmware loaded that may allow all the raw data to be dumped that a tool can work with to try and smartly match with AR. My knowledge comes mostly from dumping console games, it may not apply to pure audio dumping but if it can then someone else should be able to fill in the gaps.

Re: Understanding accurate rip: Different AR sums from different tools

Reply #2
Very interesting topic - unfortunately I cannot contribute, but I'm all ears... ;-)

Re: Understanding accurate rip: Different AR sums from different tools

Reply #3
Question 1: Am I correct that offset means that there are n bytes of zeros (0x00) prepended or missing at the beginning of the track?
Does the track still have the same length, i.e. is the same number of bytes missing or appended at the end? Or is the track then shorter or longer?
The offset means the audio data for the entire disc is shifted by that many samples (not bytes). Padding is inserted at the beginning of the first track or the end of the last track to maintain the correct length. The padding is usually zeroes, but some drives are capable of reading the actual data beyond the ends of the disc, which... is usually still zeroes.
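
A rough Python sketch of what "shifted, with padding to keep the length" means. It treats the whole disc as one list of stereo samples. The sign convention (positive drops samples from the start) and the name apply_offset are assumptions for illustration only; real tools may define the direction the other way round.

```python
def apply_offset(samples, offset):
    """Shift a whole-disc sample list by `offset` samples, keeping its length.

    Assumed convention: a positive offset drops samples from the start and
    pads zeros at the end; a negative offset pads zeros at the start and
    drops samples from the end.
    """
    n = len(samples)
    if offset >= 0:
        kept = samples[offset:]
        return kept + [0] * (n - len(kept))
    kept = samples[:offset]
    return [0] * (n - len(kept)) + kept

disc = [1, 2, 3, 4, 5]
print(apply_offset(disc, 2))   # shifted one way, zeros appended
print(apply_offset(disc, -2))  # shifted the other way, zeros prepended
```

Because the list is cut back into tracks at the same positions afterwards, every track keeps its length; only the content slides across the track boundaries.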

Question 2: Zeros in the beginning shift the whole thing. Do zeros at the end influence the AR checksum or is that like adding zeros (no effect)?
Zeroes at the end also shift the whole thing, just in the opposite direction. An offset in either direction changes the AccurateRip checksums for all tracks. The checksum ignores the first and last 1/15 of a second of audio on the disc to account for different amounts of zero padding that may be inserted to fix the drive offset.
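
For illustration, a minimal Python sketch of a position-weighted checksum of this kind, modelled on how open-source AccurateRip tools are commonly described. Each stereo sample is taken as one unsigned 32-bit value and multiplied by its 1-based position in the track. The exact skip boundaries (5 sectors, i.e. 1/15 s) and the name ar_checksums are my assumptions, not a reference implementation.

```python
MASK32 = 0xFFFFFFFF
SKIP = 5 * 588  # 5 sectors of 588 samples = 1/15 s (assumed boundary)

def ar_checksums(samples, is_first=False, is_last=False):
    """Return (v1, v2) style checksums for one track.

    `samples` holds unsigned 32-bit values, one per stereo sample.
    The start of the first track and the end of the last track are skipped.
    """
    lo = SKIP if is_first else 1
    hi = len(samples) - SKIP if is_last else len(samples)
    v1 = v2 = 0
    for mult, sample in enumerate(samples, start=1):
        if lo <= mult <= hi:
            prod = sample * mult
            v1 = (v1 + prod) & MASK32                            # keeps only the low 32 bits
            v2 = (v2 + (prod & MASK32) + (prod >> 32)) & MASK32  # also folds in the high bits
    return v1, v2

# The same audio shifted by one sample gives a different sum,
# because every sample is now multiplied by a different position.
print(ar_checksums([0, 1, 2, 3]))
print(ar_checksums([1, 2, 3, 0]))
```

This also shows why trailing zeros inside a track do not change the sum by themselves (zero times anything is zero), while a shift changes the weight of every sample.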

Question 3: Is it a good sign that the offsets of all tracks are the same?
Yes.

Question 4: How can AR version 1 have a different offset than AR v2?
A disc may have multiple pressings with different offsets. The tools you're using may only be able to report one pressing at a time, and may not select the same pressing for both V1 and V2.

Question 5: Can I correct this offset after the fact such that other tools can verify it, too?
Yes.


I assume AR normalises everything to a +0 read offset but TBH it's been a decade or more since I've used it.
It does, but AccurateRip's 0 offset may not actually be 0. I've considered pulling out a microscope and a copy of the Red Book spec to figure out the real offset using one of my CDs...

Unless there are multiple runs with different offsets I don't know if there's a way to reliably determine and account for write offset (unless there's a data track, data tracks have to be read aligned so when they exist write offset can be determined).
One of my CDs with gapless audio actually has small (inaudible) gaps caused by the tracks being split at millisecond boundaries instead of sector boundaries. Since each track starts at a sector boundary, I can determine the exact write offset for this particular CD. (Of course, this method doesn't work for all CDs, so it's ultimately useless.)

Single track mismatch (unless it's the first or last track which may be due to missing probably non-zero samples) is probably a scratch and all you can do is re-read many times and hope there's a consensus.
AccurateRip specifically avoids the first and last 1/15 of a second of audio on the disc to avoid this problem, and no CD drives are known to have an offset anywhere near that big, so a difference in any track is a mismatch. Occasionally different pressings will have the same offset but slightly different data, which can cause a single-track mismatch if you're the first person to rip that pressing.

I think (but can't recall fully, pinch of salt) that even audio sectors have checksums on the disc, however those checksums are normally handled internally by the cdrom and being audio it'll try to correct instead of error.
All sectors have a small amount of error-correcting codes built in, and the drive uses them to repair small errors with 100% confidence. When the drive encounters a larger error that can't be repaired with 100% confidence, it's up to the drive whether you receive an error instead of (or in addition to) the drive's repair attempt.

If you really want to go deep there are certain cdroms that can have custom firmware loaded that may allow all the raw data to be dumped that a tool can work with to try and smartly match with AR. My knowledge comes mostly from dumping console games, it may not apply to pure audio dumping but if it can then someone else should be able to fill in the gaps.
I'm not aware of any CD-based (infrared laser) disc formats that require such low-level access; as far as I know that sort of thing didn't start to show up in games until DVD-based (red laser) disc formats. I could be wrong though!

Re: Understanding accurate rip: Different AR sums from different tools

Reply #4
I think (but can't recall fully, pinch of salt) that even audio sectors have checksums on the disc, however those checksums are normally handled internally by the cdrom and being audio it'll try to correct instead of error.
All sectors have a small amount of error-correcting codes built in, and the drive uses them to repair small errors with 100% confidence. When the drive encounters a larger error that can't be repaired with 100% confidence, it's up to the drive whether you receive an error instead of (or in addition to) the drive's repair attempt.
That's what I'm thinking of, the CIRC stuff that adds 1 byte of parity for every 3 of data. A drive with custom firmware would ideally not give an error or repair attempt for large errors but instead the raw data it read (and any debug data available) for some third party software to analyse (which would have the benefit of lots of compute power, possibly external information like AR checksums, possibly multiple reads but now with better data to work with). If the per-sector error detection/correction is very small it's unlikely to be of much benefit, then again if it can pinpoint where an error occurred more accurately than the info a standard drive error would output (?) it might allow some "medium" errors to be repairable.

ECMA-130 sections 16/17/18/19 detail how sectors are actually encoded. Outputting F2 frames to be decoded in software is what I think might theoretically be useful (section 19 is super low level, definitely out of scope for a drive to be able to output): https://www.ecma-international.org/publications-and-standards/standards/ecma-130/

If you really want to go deep there are certain cdroms that can have custom firmware loaded that may allow all the raw data to be dumped that a tool can work with to try and smartly match with AR. My knowledge comes mostly from dumping console games, it may not apply to pure audio dumping but if it can then someone else should be able to fill in the gaps.
I'm not aware of any CD-based (infrared laser) disc formats that require such low-level access; as far as I know that sort of thing didn't start to show up in games until DVD-based (red laser) disc formats. I could be wrong though!
Nothing normally requires that level of access AFAIK (it probably would have come in handy to perfectly defeat copy protection on PC games which was advanced even on CD-based games, but imperfect methods and workarounds were good enough).

Re: Understanding accurate rip: Different AR sums from different tools

Reply #5
Excuse me for asking, but isn't the error correction irrelevant to reading a disc (of whatever type) via its SATA port?

I admit I only have a limited understanding of what goes on under the hood, but (logically) it seems to me the error correction would be in the process of reconstructing the audio for the analogue output from a drive, and the SATA data would be "raw".

Re: Understanding accurate rip: Different AR sums from different tools

Reply #6
The audio data for a sector is 2352 bytes, it’s what we understand as the raw samples but it’s not directly written to disc like that (there's many layers of "more raw"). Paraphrasing ECMA-130 sections 16-19, those 2352 bytes are scattered into 98 F1 frames of 24 bytes each. Those F1 frames are turned into 106 (extra frames for addressing/sync?) F2 frames of 32 bytes each by adding 8 bytes of parity (correction data) to each frame. Those F2 frames are turned into F3 frames of 33 bytes by prepending a control byte, the control byte contains what we know as subchannel data P-W which is where the TOC and most extensions like text and graphics are stored. Finally the F3 frames are written to the disc by converting each 8 bit byte to a 14 bit encoding that limits the number of consecutive identical bits to a small range, because physical constraints demand that sort of encoding for the tech to work for whatever reason.
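
The byte counts in this paraphrase can be checked with a few lines of Python. This only restates the numbers given above (it does not model the interleaving or the 106-frame spread), and the variable names are made up.

```python
SECTOR_BYTES = 2352          # "raw" audio sector as delivered by the drive
F1_DATA_BYTES = 24           # audio bytes carried by one F1 frame
F2_PARITY_BYTES = 8          # parity bytes added to make an F2 frame
CONTROL_BYTES = 1            # subchannel byte prepended to make an F3 frame
CHANNEL_BITS_PER_BYTE = 14   # each 8-bit byte becomes a 14-bit symbol on disc

f1_frames_per_sector = SECTOR_BYTES // F1_DATA_BYTES
f2_frame_bytes = F1_DATA_BYTES + F2_PARITY_BYTES
f3_frame_bytes = f2_frame_bytes + CONTROL_BYTES
parity_per_data_byte = F2_PARITY_BYTES / F1_DATA_BYTES

print(f1_frames_per_sector)                    # F1 frames needed for one sector
print(f2_frame_bytes, f3_frame_bytes)          # F2 and F3 frame sizes in bytes
print(parity_per_data_byte)                    # 1 parity byte per 3 data bytes
print(f3_frame_bytes * CHANNEL_BITS_PER_BYTE)  # counts only the 14-bit symbols of one F3 frame
```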

Normally I believe that the "raw" 2352 byte sectors and processed subchannel data is the lowest level a drive outputs to the user, to theoretically get a drive that outputs earlier in the process you'd need custom firmware on a flexible drive (like some plextor models) that uses probably a debug command to output the raw frames. I don't know enough about custom firmware to know how early in the process it's possible to dump from, custom firmware used for game dumping may not go that far.

Re: Understanding accurate rip: Different AR sums from different tools

Reply #7
The offset means the audio data for the entire disc is shifted by that many samples (not bytes). ...
Thanks a lot for your comprehensive answer. The exact meaning of offset is now clearer. That also means that the length of a track stays the same with offsets, which makes sense because the identifier of the CD is derived from the table of contents, and different track lengths would lead to different identifiers.

A disc may have multiple pressings with different offsets. The tools you're using may only be able to report one pressing at a time, and may not select the same pressing for both V1 and V2.

Ah, so 31 reported ARv2 like me and perhaps only 10 might have reported AR1 like me, so ARv1 with offset 12 is more frequent (17) and thus shown by trackverify? ARv1 and v2 values are probably not from the same reporter. So these might be different pressings, can it also be that others have reported with a wrong (uncalibrated) offset?

I am quite confident that my offset setting is correct because the device is listed on the AR website and most of the well known CDs get results with 200 confidence. I am now only chasing the reasons for the other CDs.

Thanks again, for the answer. I will take a look at my collection again with this information and see if it explains the differences. I currently have those categories:

- good CD (all confidences > 5)
- good rare CD (confidence 1 .. 5, but not a well known CD)
- offset (as above but with offset)
- problem in track (single track or 2 tracks have a mismatch while the others match)
- not found (rare CD not in AR database)
- famous artist and CD but not in AR database -> probably the TOC and CD identifier is wrong? or just a rare pressing?
- different offsets per track

The last two, I still don't understand.

An example for the last case is this: Brothers in Arms by Dire Straits. This should be a popular album, but I get no match for ARv2 and very low confidence for ARv1 – and with varying (and sometimes high) offsets.

Code:
                                 AccurateRip V1             AccurateRip V2
Track                      Confidence Offset Checksum Confidence Offset Checksum
────────────────────────── ────────── ────── ──────── ────────── ────── ────────
01-Dire_Straits-So_Far_Aw…          4  -1979 D097E393                 0 2C5BA93C
02-Dire_Straits-Money_for…          4   -193 1ED89B7C                 0 B5FC95D3
03-Dire_Straits-Walk_of_L…          4  -1979 6E02AA4E                 0 BD199401
04-Dire_Straits-Your_Late…          4   -193 CAF3B9E3                 0 BA156925
05-Dire_Straits-Why_Worry…          4   -193 3DC30A26                 0 00F7AC29
06-Dire_Straits-Ride_Acro…          4  -1979 DE6E8135                 0 C059EA3C
07-Dire_Straits-The_Mans_…          4   -193 186BA23F                 0 C854F5F5
08-Dire_Straits-One_World…          4   -193 3CB75ABB                 0 66E225D1
09-Dire_Straits-Brothers_…          5  -1979 11D7B534                 0 DF802592
What happens here?

Re: Understanding accurate rip: Different AR sums from different tools

Reply #8
AccurateRip's "zero" was taken from EAC, which appears to be 30 samples off. https://hydrogenaud.io/index.php/topic,50301.0.html

As for getting lower-level output from the drive ... how low did https://web.archive.org/web/20180113163201/perfectrip.cdfreaks.com/ go?  Initial beta didn't make for anything deeper than logged at https://hydrogenaud.io/index.php/topic,52237.msg562650.html#msg562650 , and the tool was abandoned before it got to being really useful. I never even got it to work, but I probably wasn't stubborn enough either.

Re: Understanding accurate rip: Different AR sums from different tools

Reply #9
Looks like it didn't go deeper but instead monitored the CIRC correction (exposing if the drive decoding F2 was successful) so you'd know when and where things went awry, looks like that's what C2 error pointers do and exposing that data increases confidence in the rip. 0 errors seems to be required for audio according to this (data tracks can fall back to error correction within the 2352 bytes): https://en.wikipedia.org/wiki/C2_error

The following link makes a distinction between C2 and CU but it's probably just a difference in terminology, the former might use "C2" as error whereas this uses "C2" as a successful correction and "CU" as an error: https://qpxtool.sourceforge.io/glossar.html#cx

The fact that state-of-the-art tools use C2 instead of dumping frame data and decoding in software suggests that either the frame data can't be dumped by custom firmware, or there is no real benefit in doing so. Also FWIW this is the main tool that dumps games and it can be used to dump audio CDs, not that you should (unless maybe it has copy protection) as it doesn't use AR: https://github.com/saramibreak/DiscImageCreator

Re: Understanding accurate rip: Different AR sums from different tools

Reply #10
AccurateRip's "zero" was taken from EAC, which appears to be 30 samples off. https://hydrogenaud.io/index.php/topic,50301.0.html

As for getting lower-level output from the drive ... how low did https://web.archive.org/web/20180113163201/perfectrip.cdfreaks.com/ go?  Initial beta didn't make for anything deeper than logged at https://hydrogenaud.io/index.php/topic,52237.msg562650.html#msg562650 , and the tool was abandoned before it got to being really useful. I never even got it to work, but I probably wasn't stubborn enough either.

That is, users of EAC populate the AR database with checksums that have -30 offset? But if that was the problem, then a verified track with offset -30 should be very frequent, shouldn't it?

Re: Understanding accurate rip: Different AR sums from different tools

Reply #11
Ah, so 31 reported ARv2 like me and perhaps only 10 might have reported AR1 like me, so ARv1 with offset 12 is more frequent (17) and thus shown by trackverify?
That's my best guess, based on what the program appears to be doing.

So these might be different pressings, can it also be that others have reported with a wrong (uncalibrated) offset?
It's possible, but most reports come from programs like EAC that require calibrating the offset before any reports can be uploaded.

- famous artist and CD but not in AR database -> probably the TOC and CD identifier is wrong? or just a rare pressing?
The most likely explanation is that you're doing something that sometimes causes wrong CD identifiers - maybe incorrect pregap handling?

- different offsets per track
Do they usually have the same confidence? Maybe you're seeing two pressings with the same confidence, and the displayed pressing is randomly selected per track instead of per disc.
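
A small Python sketch of that guess. It is not trackverify's actual code, just an illustration of how "show the highest-confidence match, break ties arbitrarily" would produce the pattern in the Dire Straits table. The function name and the example match lists are hypothetical.

```python
import random

def pick_displayed_match(matches, rng=random):
    """Pick one (offset, confidence) pair to display for a track.

    Assumed behaviour: highest confidence wins; equal confidences are
    broken at random, independently for every track.
    """
    best = max(confidence for _, confidence in matches)
    tied = [m for m in matches if m[1] == best]
    return rng.choice(tied)

# One clear winner: the offset-12 pressing is always the one shown.
print(pick_displayed_match([(12, 17), (0, 10)]))

# Two pressings with equal confidence: the shown offset can differ per track.
for track in range(1, 4):
    print(track, pick_displayed_match([(-1979, 4), (-193, 4)]))
```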

That is, users of EAC populate the AR database with checksums that have -30 offset? But if that was the problem, then a verified track with offset -30 should be very frequent, shouldn't it?
All tools use the same offset. It just happens that the offset was a guess based on incomplete data, and the guess turned out to (probably) be 30 samples away from the correct number.

Re: Understanding accurate rip: Different AR sums from different tools

Reply #12
- famous artist and CD but not in AR database -> probably the TOC and CD identifier is wrong? or just a rare pressing?
The most likely explanation is that you're doing something that sometimes causes wrong CD identifiers - maybe incorrect pregap handling?

That's a very interesting route! I see this problem more frequently with albums that contain 2 or more CDs and more frequently with different pregaps (I encountered 180/75 and 183/75 seconds). But not consistently. So there seems to be a correlation but it's not the only factor.
In addition, I am not sure how tools (like trackverify or checkCD.pl) which operate on FLAC files can know about the pregap at all.
I copied the CD with abcde which uses cdparanoia and derived the TOC with abcde-musicbrainz-tool. Any ideas where I could look or which tools I could try to find the correct identifier?

- different offsets per track
Do they usually have the same confidence? Maybe you're seeing two pressings with the same confidence, and the displayed pressing is randomly selected per track instead of per disc.

Yes, see the Dire Straits example above.


Re: Understanding accurate rip: Different AR sums from different tools

Reply #14
No, I am not sure. I never thought of it because I bought it in a regular store.

 

Re: Understanding accurate rip: Different AR sums from different tools

Reply #15
After some time, I'm coming back to this topic. With some more investigation, I found out that most of my CDs are accurately ripped. The issue was indeed incorrect pregap handling, as suggested by @Octocontrabass:

Quote
The most likely explanation is that you're doing something that sometimes causes wrong CD identifiers - maybe incorrect pregap handling?

The tools I used assume a pregap of 150 sectors (2.0 s). But some discs have a pregap of 182 or 183 (or rarely other values). I knew this before but now I realized that
Code:
cdrip-tools/arverify.py
can calculate the correct AR-Disc-ID with the -a option (e.g. -a 33 for a pregap of 183). That way, all my tracks are accurately ripped with some understandable exceptions: a few very rare discs from unknown artists and some tracks which are damaged (cdparanoia has problems reading them).
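
To show why the pregap matters for the lookup, here is a hedged Python sketch of how the first two parts of an AccurateRip disc ID are commonly computed in open-source tools, as far as I understand it: sums over the track start sectors and the lead-out. The formula details, the function name and the track lengths are assumptions for illustration; this is not the code of arverify.py.

```python
MASK32 = 0xFFFFFFFF
STANDARD_PREGAP = 150  # sectors (2.0 s), what many tools assume

def ar_disc_ids(track_lengths, pregap=STANDARD_PREGAP):
    """Return (id1, id2) from track lengths in sectors and the first pregap.

    A longer pregap moves the start of every track and the lead-out by
    (pregap - 150) sectors, which changes both sums.
    """
    position = pregap - STANDARD_PREGAP  # start sector of track 1
    offsets = []
    for length in track_lengths:
        offsets.append(position)
        position += length
    leadout = position

    id1 = id2 = 0
    for number, offset in enumerate(offsets, start=1):
        id1 += offset
        id2 += max(offset, 1) * number
    id1 += leadout
    id2 += max(leadout, 1) * (len(offsets) + 1)
    return id1 & MASK32, id2 & MASK32

lengths = [100, 200]  # hypothetical two-track disc
print(ar_disc_ids(lengths))              # assuming the standard pregap
print(ar_disc_ids(lengths, pregap=183))  # same audio, pregap of 183 sectors
```

With a pregap of 183 the extra 33 sectors are the same 33 that go into the -a option above; the audio is identical, but the ID differs, so the lookup lands on a different (or non-existent) database entry.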

I wrote the TOC to the metadata. So now I can derive the disc ID from the metadata of any individual track and validate the AR checksum.

So problem solved for me. Thanks for the pregap hint, @Octocontrabass !