Different Checksums

Reply #25 – 2015-09-05 16:53:49

Maybe the list has expanded but submissions have historically been limited to rips* from dBpoweramp and EAC.

(*) This is not to be confused with verification of rips done after the fact. The information submitted to the database was generated at the time of extraction.

Reply #26 – 2015-09-06 19:06:13

Quote from: korndawg on 2015-09-05 16:37:35

Also, is there any way to detect a manufacturing defect?

If a pristine disc "behaves like a scratched one", no matter what drive - that is an indication. (Use software which reports so-called C2 errors, which is where the cross-interleaved error correction lies.)
(Some copy protection schemes work by intentionally tampering with C2 errors. That is, they give you a product so fragile that ripping software will find errors all the time.)

Now if you have two CDs of the same title, and both yield the same error, then you can be fairly sure it is from the press. Even more so if you rip them on two different drives: if disc I and disc II both yield result X on drive A and both yield result Y on drive B, you can be fairly sure.
Of course, if it is a popular title and the issue is at one spot only, then you should get a high AR score on all but that track, and possibly still an AR verification on that track, but with lower score. Here is a possible one:

Disc 2, track 5. I match 89 out of hundredandfortysomething. Obviously it isn't just my CD. Trying to verify it now, seven years later, CUETools yields something that starts out like this:

Code:

[CUETools log; Date: 06.09.2015 19:41:09; Version: 2.1.6]
[CTDB TOCID: s3Ytf3x_6L3t.odNauMuBZHcwtU-] found.
Track | CTDB Status
  1   | (1044/1048) Accurately ripped
  2   | (1044/1048) Accurately ripped
  3   | (1043/1048) Accurately ripped
  4   | (1045/1048) Accurately ripped
  5   | (798/1048) Accurately ripped, or (223/1048) differs in 6 samples @03:56:27, or (6/1048) differs in 6 samples @03:56:27
  6   | (1044/1048) Accurately ripped
  7   | (1035/1048) Accurately ripped, or (6/1048) differs in 130 samples @02:34:05,02:37:14,02:40:08,02:44:35,02:45:54,04:12:41,04:15:69,05:55:31,05:56:21,05:58:32,06:02:22,06:04:01,06:07:66,06:08:23-06:08:24,06:08:72,06:09:45-06:09:46,06:10:19,06:11:08,06:11:57,06:11:73,06:13:04,06:13:36,06:13:52,06:14:42,06:14:57-06:14:58,06:15:47,06:16:04-06:16:05,06:16:21,06:16:37,06:16:53,06:17:11,06:17:43,06:18:16,06:18:32,06:18:48-06:18:49,06:21:33-06:21:34,06:23:29,06:25:41
  8   | (1042/1048) Accurately ripped
  9   | (1043/1048) Accurately ripped
 10   | (1040/1048) Accurately ripped
 11   | (1041/1048) Accurately ripped
 12   | (1030/1048) Accurately ripped, or (6/1048) differs in 1 samples @01:31:72
 13   | (1021/1048) Accurately ripped
[AccurateRip ID: 001e606b-013a2eac-aa10d30d] found.
Track   [  CRC   |   V2   ] Status
 01     [8315cfaf|ac0b4c26] (200+200/1184) Accurately ripped
 02     [47867f94|f61a68cb] (200+200/1187) Accurately ripped
 03     [5226ca61|d4b65351] (200+200/1183) Accurately ripped
 04     [4719e9bc|494184a0] (200+200/1179) Accurately ripped
 05     [6151788d|9b46414f] (200+200/1511) Accurately ripped

(AR reports anything > 200 as 200.) I can only guess that one of the pressings got some bits wrong. Notice this track has been submitted about 28 percent more times. (Which by itself only proves that it has been ripped more times, but - ruling out people inserting this disc because they only want this track - shows that users find their rips suspicious.)
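The "about 28 percent" figure can be checked against the totals in the log above. This is a minimal sketch, assuming the denominator in each AccurateRip row (for example 1511 in `200+200/1511`) is the total number of submissions for that track, and using only the five rows shown. The function name is illustrative.

```python
# Totals taken from the AccurateRip rows in the log above (tracks 01-05).
totals = {1: 1184, 2: 1187, 3: 1183, 4: 1179, 5: 1511}


def excess_percent(totals, track):
    """Percent by which one track's total exceeds the mean of the other tracks."""
    others = [n for t, n in totals.items() if t != track]
    baseline = sum(others) / len(others)
    return 100.0 * (totals[track] / baseline - 1.0)


if __name__ == "__main__":
    print(f"track 5 excess: {excess_percent(totals, 5):.1f} %")
```

With only four other tracks as the baseline this is a rough check, not a statistic.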

Reply #27 – 2015-09-06 19:17:12

Have you used CTDB to "correct" the track and then checked what the differences were?

Reply #28 – 2015-09-06 19:36:56

Quote from: greynol on 2015-09-06 19:17:12

Have you used CTDB to "correct" the track and then checked what the differences were?

Tried the "repair" right now. The 137-samples-different and the 6-samples-different rip agree on this particular track. Difference to mine:

Differences found: 6 values, starting at 3:56.361837, peak: 0.0031738 at 3:56.361859, 1ch
Detected offset as 0 samples.

What can one infer from this? -25 dB, is that what one could expect to get by interpolating? (Hm ... maybe I should open a wave editor?)

BTW, the other differences to the 137 are about the same.

Track 7:
Differences found: 130 values, starting at 2:34.078639, peak: 0.0232849 at 2:37.196054, 1ch
Detected offset as 0 samples.

Track 12:
Differences found: 1 value, starting at 1:31.961043, peak: 0.0046997 at 1:31.961043, 2ch
Detected offset as 0 samples.
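On the dB question, here is a minimal sketch, assuming the reported peak is a linear amplitude relative to full scale (1.0) and that the rip is 16-bit. Under the amplitude convention (20·log10) the track 5 peak comes out near -50 dBFS; a figure of -25 dB is what the power convention (10·log10) gives for the same number. The function names are illustrative.

```python
import math


def amplitude_to_dbfs(peak):
    """Linear amplitude (1.0 = full scale) to dBFS, amplitude convention."""
    return 20.0 * math.log10(peak)


def peak_to_lsb(peak, bits=16):
    """Linear amplitude expressed in steps of a signed integer format."""
    return peak * 2 ** (bits - 1)


# Peaks as reported in the comparison output above.
peaks = {"track 5": 0.0031738, "track 7": 0.0232849, "track 12": 0.0046997}

if __name__ == "__main__":
    for name, peak in peaks.items():
        print(f"{name}: {amplitude_to_dbfs(peak):.1f} dBFS, "
              f"about {peak_to_lsb(peak):.0f} steps of 16-bit")
```

Read this way, the track 5 difference peaks at roughly a hundred quantization steps, which is small but well above the noise floor of a 16-bit file.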

Reply #29 – 2015-09-06 19:42:17

Yes, with a wave editor. I should have specified earlier, sorry.

Reply #30 – 2015-09-06 23:16:18

At least for the first sample where they disagree, then visually it looks like my (majority) version is interpolated as simple average while the other is slightly more sinusoidal. Files, 1024 samples each: https://www.sendspace.com/file/hpefg7
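The "simple average" reading can also be tested numerically rather than by eye. This is a minimal sketch, assuming one channel has been loaded into a list of integer samples and the suspect index is known. A sample concealed by linear interpolation equals the mean of its two neighbours, to within rounding. The helper names and the tolerance are assumptions, not taken from any ripping tool.

```python
def is_neighbour_average(samples, i, tolerance=1):
    """True if samples[i] is within `tolerance` of the mean of its neighbours."""
    if i <= 0 or i >= len(samples) - 1:
        raise IndexError("need a neighbour on both sides")
    midpoint = (samples[i - 1] + samples[i + 1]) / 2
    return abs(samples[i] - midpoint) <= tolerance


def averaged_positions(samples, candidates, tolerance=1):
    """Subset of candidate indices that look linearly interpolated."""
    return [i for i in candidates if is_neighbour_average(samples, i, tolerance)]


if __name__ == "__main__":
    # Synthetic data: index 2 is the exact mean of 300 and 500.
    demo = [100, 300, 400, 500, 650]
    print(averaged_positions(demo, [1, 2, 3]))
```

Smooth, low-frequency passages also sit close to their neighbours' mean, so a hit is only suggestive; it is more telling when one version passes the test at the disputed sample and the other does not.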

1024 because that is what ffmpeg returns; I tried to export the first twelve (per channel) samples with Audacity - then all of a sudden fb2k reports sixteen different samples, not six. Is this normal?

Reply #31 – 2015-09-06 23:41:36

Disable dithering in Audacity.

It would be interesting to look at the spectrograms, but that requires several thousand samples before and after this place.
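As for why dithering makes an export differ in more samples than the source did, here is a minimal sketch. It uses TPDF dither with a peak of one quantization step as a stand-in; Audacity's actual dither algorithm and settings may differ. Requantizing values that are already integers is lossless without dither, but adding noise before rounding changes some of them.

```python
import random


def requantize(samples, dither=False, seed=0):
    """Round samples to integers, optionally adding TPDF dither first."""
    rng = random.Random(seed)
    out = []
    for s in samples:
        # The difference of two uniform values is triangular noise in (-1, 1).
        noise = (rng.random() - rng.random()) if dither else 0.0
        out.append(int(round(s + noise)))
    return out


def count_differences(a, b):
    """Number of positions where two equal-length sample lists differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


if __name__ == "__main__":
    original = [((i * 37) % 2001) - 1000 for i in range(1000)]
    print(count_differences(original, requantize(original)))
    print(count_differences(original, requantize(original, dither=True)))
```

The first count is zero; the second is not, even though nothing was "edited". That is the same effect as the extra differing samples seen after an export with dither enabled.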

Reply #32 – 2015-09-07 07:48:37

Quote from: lvqcl on 2015-09-06 23:41:36

Disable dithering in Audacity.

*facepalm* thx ...

Quote from: lvqcl on 2015-09-06 23:41:36

It would be interesting to look at the spectrograms, but that requires several thousand samples before and after this place.

Full second: https://www.sendspace.com/file/p5nod7

Reply #33 – 2015-09-07 16:49:13

So it's possible to see some artifact in 05TGGITS_000.wav_3_56_to_57.wav, but not in 05TGGITS_137.wav_3_56_to_57.wav.

Spectrograms can be created by SoX:

Code:

sox infile.wav --null spectrogram -w Kaiser -o outfile.png

Reply #34 – 2015-09-07 20:23:44

Quote from: lvqcl on 2015-09-07 16:49:13

So it's possible to see some artifact in 05TGGITS_000.wav_3_56_to_57.wav, but not in 05TGGITS_137.wav_3_56_to_57.wav.

Yep, it was that clear. The 000 is my physical CD, and the majority vote in AR/CTDB (double checked now to be sure).
I tried the last difference on the CD, and it was the other way around. Could be that the "good" one was the second most common in CTDB (its track 5 is bit-identical to the "137").

I found the (same physical) disc re-ripped with CUETools/EAC/dBpoweramp. No sign of any errors or interpolation. So the likely explanation is ... what? An error in the glass master should have triggered C1/C2? So have they burnt it to a CD and sent it off to a pressing plant who ripped in burst mode before making the glass master?

(Mod, should this branch be split off the thread?)

Reply #35 – 2015-09-07 22:12:48

Quote from: Porcus on 2015-09-07 20:23:44

I tried the last difference on the CD

Well the remaining differences were in one track, so:

* In this one, I see no spike artifacts:

Code:

Track | CTDB Status
  1   | (1049/1053) Accurately ripped
  2   | (1049/1053) Accurately ripped
  3   | (1048/1053) Accurately ripped
  4   | (1050/1053) Accurately ripped
  5   | (245/1053) Accurately ripped, or (778/1053) differs in 6 samples @03:56:27
  6   | (1049/1053) Accurately ripped
  7   | (1040/1053) Accurately ripped, or (6/1053) differs in 130 samples @02:34:05,02:37:14,02:40:08,02:44:35,02:45:54,04:12:41,04:15:69,05:55:31,05:56:21,05:58:32,06:02:22,06:04:01,06:07:66,06:08:23-06:08:24,06:08:72,06:09:45-06:09:46,06:10:19,06:11:08,06:11:57,06:11:73,06:13:04,06:13:36,06:13:52,06:14:42,06:14:57-06:14:58,06:15:47,06:16:04-06:16:05,06:16:21,06:16:37,06:16:53,06:17:11,06:17:43,06:18:16,06:18:32,06:18:48-06:18:49,06:21:33-06:21:34,06:23:29,06:25:41
  8   | (1047/1053) Accurately ripped
  9   | (1048/1053) Accurately ripped
 10   | (1045/1053) Accurately ripped
 11   | (1046/1053) Accurately ripped
 12   | (1035/1053) Accurately ripped, or (6/1053) differs in 1 samples @01:31:72
 13   | (1024/1053) Accurately ripped
[AccurateRip ID: 001e606b-013a2eac-aa10d30d] found.
Track   [  CRC   |   V2   ] Status
 01     [8315cfaf|ac0b4c26] (200+200/1184) Accurately ripped
 02     [47867f94|f61a68cb] (200+200/1187) Accurately ripped
 03     [5226ca61|d4b65351] (200+200/1183) Accurately ripped
 04     [4719e9bc|494184a0] (200+200/1179) Accurately ripped
 05     [03bfbf7a|3db47f89] (179+158/1511) Accurately ripped
 06     [5580a89d|49310d1e] (200+200/1171) Accurately ripped
 07     [2e61b880|a5386ae5] (200+200/1172) Accurately ripped
 08     [0f03e4f2|1cd0d62f] (200+200/1181) Accurately ripped
 09     [4e088bd3|58934bac] (200+200/1178) Accurately ripped
 10     [2e348ea5|8c0a3aaf] (200+200/1172) Accurately ripped
 11     [d42de47d|8be60f10] (200+200/1171) Accurately ripped
 12     [31b0304b|f37e78bc] (200+200/1164) Accurately ripped
 13     [d72532e1|9cc8a770] (200+200/1136) Accurately ripped

[cutting]


Track Peak [ CRC32  ] [W/O NULL] 
 --  100,0 [35879A55] [C6CDF1FA]           
 01   56,4 [BA732B5A] [0465C5C3]           
 02   84,0 [D26B081B] [8750F6F4]           
 03   85,3 [8D8B617D] [89413F20]           
 04   98,3 [83B90991] [8D073175]           
 05   85,7 [5E80E84B] [DEB9DD79]           
 06  100,0 [846CD5B6] [2B2FE479]           
 07   89,0 [52A1D786] [FD5A6455]           
 08   84,8 [1F5FE3F8] [006AE727]           
 09   87,4 [9B1BCFA9] [BB732C66]           
 10   87,8 [44265A02] [6B5B48BB]           
 11   81,4 [9088CB85] [49C1417D]           
 12   90,5 [5F8A20E5] [324EF438]           
 13  100,0 [42FE8B0F] [9C1039E0]

* Mine is the "778" which differs in six samples in track 5. That is the one I posted, where lvqcl points out artifacts.
* The last one agrees with the top one on track 5, but has 131 samples disagreeing elsewhere: 130 in track 7 (visible spikes) and 1 in track 12 (ditto).

Could a CTDB score of 6/1053 be, uh, polluted by downloaders?

Reply #36 – 2015-09-07 22:35:15

I think Spoon would be able to tell you the make and model of the drive used for each submission in the AR database, which might help to clear things up.

Reply #37 – 2015-09-08 00:37:59

I do not easily have that info to hand; drives do not make it into the final database.

Reply #38 – 2015-09-08 00:58:39

Oh well.

So that Porcus knows where I was going with this, I wouldn't rule out the possibility that the data on the CD exploited a weakness/bug with a specific drive, ripping program (possibly configuration related) or combination of both. I've raised issues in the past about the possibility of the AR database becoming populated with downloads that were in error, though that would land dead last on my list of possible reasons.

Reply #39 – 2015-09-08 10:42:56

Quote from: greynol on 2015-09-08 00:58:39

I wouldn't rule out the possibility that the data on the CD exploited a weakness/bug with a specific drive, ripping program (possibly configuration related) or combination of both.

I agree with the general statement, and one can still not rule it out completely after having tried quite a few combinations ['] though it appears less likely:
Back then I was armed with three drives and dBpoweramp (Reference) with EAC as backup. Everything would be ripped with dBpoweramp from a Sony VGP-XL1B2 carousel with a Matshita drive that dBpoweramp could unfortunately not get C2 information from over the firewire; oddballs would go into the laptop's HL-DT-ST drive (with C2) and the stubborn ones into a PX-230A. At this stage, EAC would also have been invoked.

CUETools was not an option in 2008, so I tried it yesterday on two drives: the above HL-DT-ST and a new one (TSSTCorp). I also tried dBpoweramp disconnected from the 'net (back then I might not have thought that I could force it into secure mode this way) and, for the hell of it, EAC on the old HL-DT-ST.

There is no sign that any of the applications had any suspicion whatsoever. (I do not know how to get CUETools to report C2 errors, but I sat staring at a burst to see whether it triggered any retry.)

['] Disclaimer: I do not have the logs for more than the initial rip (of the whole CD) as I never managed to produce a difference - but I did notice this curious result, so I must have invoked something like my usual workflow. I could possibly even have tried a fourth drive, which did generally suck and was usually not involved when I tried to get reasonable results out of troublemaking discs. But this time the issue was to provoke an error.

Reply #40 – 2015-09-08 12:34:35

Sure, but who's to say there would have to be some indication of a problem? I would expect it to happen without any obvious sign of trouble other than the AR numbers. I also wouldn't assume there was any problem with the disc itself.

It could also simply be that there's a different pressing created with errant PCM data (EFM and CIRC are perfectly OK) which has no offset relative to the other version.

Finally, without comparing the two versions and identifying interpolation, time shifts, bit-flips, dropped half samples and whatever other possible type of error, how is anyone to know which version is correct? Majority rule? I think that's for suckers (pointing to EAC's old bug when overreading is disabled).

Reply #41 – 2015-09-08 15:45:21

I don't completely follow what you refer to at each statement ... :

Quote from: greynol on 2015-09-08 12:34:35

Sure, but who's to say there would have to be some indication of a problem? I would expect it to happen without any obvious sign of trouble other than the AR numbers.

Not sure what you mean. If you mean that it must be expected from time to time irrespective of whether one has AR figures to spot it then ... well, obviously AR does not define what is on the master, so I suppose you had something else in mind?

Quote from: greynol on 2015-09-08 12:34:35

I also wouldn't assume there was any problem with the disc itself.

Since I get no error information from the rippers? Or, generally? If given only the information that two rips of the same title differ in only a handful of samples, then I would indeed make the first guess that there is a problem with (at least) one of them, so again I suppose you had something else in mind?

Quote from: greynol on 2015-09-08 12:34:35

It could also simply be that there's a different pressing created with errant PCM data (EFM and CIRC are perfectly OK) which has no offset relative to the other version.

That is what I put my two cents on in this case, but then: where does it originate?

Quote from: greynol on 2015-09-08 12:34:35

Finally, without comparing the two versions and identifying interpolation, time shifts, bit-flips, dropped half samples and whatever other possible type of error, how is anyone to know which version is correct?

Closing in on "if there is only one version, ...?" But in this case we have two.

Quote from: greynol on 2015-09-08 12:34:35

Majority rule?

If the spectrogram spikes are to be read as indication of error (whatever "error" means), then the majority is "wrong" on this particular title.

Reply #42 – 2015-09-08 16:04:57

Quote from: Porcus on 2015-09-08 15:45:21

Quote from: greynol on 2015-09-08 12:34:35
Sure, but who's to say there would have to be some indication of a problem? I would expect it to happen without any obvious sign of trouble other than the AR numbers.

Not sure what you mean. If you mean that it must be expected from time to time irrespective of whether one has AR figures to spot it then ... well, obviously AR does not define what is on the master, so I suppose you had something else in mind?

You mentioned ripping multiple ways in hopes of getting some kind of alert from the ripping program about a problem. I'm telling you it might be that two different people with the same supposed pressing are getting consistently different results on their respective systems without any such warnings.

Quote from: Porcus on 2015-09-08 15:45:21

Quote from: greynol on 2015-09-08 12:34:35
I also wouldn't assume there was any problem with the disc itself.

Since I get no error information from the rippers? Or, generally? If given only the information that two rips of the same title differ in only a handful of samples, then I would indeed make the first guess that there is a problem with (at least) one of them, so again I suppose you had something else in mind?

I'm simply trying to show you that there are many other scenarios that you haven't yet entertained.

Quote from: Porcus on 2015-09-08 15:45:21

Quote from: greynol on 2015-09-08 12:34:35
It could also simply be that there's a different pressing created with errant PCM data (EFM and CIRC are perfectly OK) which has no offset relative to the other version.

That is what I put my two cents on in this case, but then: where does it originate?

With all the different possibilities I dished out (which can't possibly be an exhaustive list), why would you think I know the answer?

Quote from: Porcus on 2015-09-08 15:45:21

If the spectrogram spikes are to be read as indication of error (whatever "error" means), then the majority is "wrong" on this particular title.

Sure, but don't always expect to see spikes in a spectrogram.

Reply #43 – 2015-09-08 18:43:14

I have a few more questions after reading the last several posts, all of which were fascinating!

1) Will different pressings of the same CD generate the same AccurateRip checksum? If so, but one of the pressings contains a manufacturing defect within a specific track, will there be a second entry for that track within the AccurateRip database in order to account for the manufacturing defect?

3) While working with cdparanoia in the past I have noticed that some troublesome tracks (perhaps like "The Great Gig In The Sky") generate "jitter" errors and display "-" signs within the cdparanoia progress bar. However, ripping the troublesome track using a different range (for example ripping the track separately with "cdparanoia --batch 5" instead of "cdparanoia --batch 1-13" or "cdparanoia --batch") eliminates the "jitter" errors and "-" signs within the progress bar and seems to produce a "pristine" file. Supposedly cdparanoia corrects jitter errors (not to mention streaming errors represented by "+" signs within the progress bar), but the corrected file always generates a different file checksum (using sha1sum) than the pristine file. Would the AccurateRip checksums of these files match? Does morituri account for cdparanoia "jitter" errors such as this?

2) Will directly editing FLAC files within Audacity maintain audio quality? Should I disable dithering as well? Typically I only fire up Audacity in order to fade in/out live tracks.

Thanks again for your help!

Reply #44 – 2015-09-08 18:57:55

Quote from: korndawg on 2015-09-08 18:43:14

2) Will directly editing FLAC files within Audacity maintain audio quality? Should I disable dithering as well? Typically I only fire up Audacity in order to fade in/out live tracks.

You are not directly editing the FLAC file. Audacity first extracts the PCM data and operates on that. The final audio quality is a function of what you have Audacity do to the data.

If most of the track is not changed then I would disable dithering to avoid noise being added unnecessarily.

Reply #45 – 2015-09-08 19:09:17

Quote from: korndawg on 2015-09-08 18:43:14

I have a few more questions after reading the last several posts, all of which were fascinating!

I suggest you read some posts on the same subject in the CD Hardware/Software forum, as little here hasn't been discussed before.

Quote from: korndawg on 2015-09-08 18:43:14

Will different pressings of the same CD generate the same AccurateRip checksum?

Generally speaking, no. Different pressings usually differ by at least an offset. Different pressings can also have slightly different track timings, though AccurateRip will treat these as entirely different discs.

Quote from: korndawg on 2015-09-08 18:43:14

If so, but one of the pressings contains a manufacturing defect within a specific track, will there be a second entry for that track within the AccurateRip database in order to account for the manufacturing defect?

When multiple checksums for a track are available then each one will have been submitted by more than one unique user ID. If a manufacturing defect results in different but consistent checksums depending on the drive and/or ripping algorithm then it is likely that multiple checksums will appear in the database, provided the title is popular enough.

Quote from: korndawg on 2015-09-08 18:43:14

Supposedly cdparanoia corrects jitter errors (not to mention streaming errors represented by "+" signs within the progress bar), but the corrected file always generates a different file checksum (using sha1sum) than the pristine file. Would the AccurateRip checksums of these files match?

It should be assumed that tracks with different sha1 checksums will also have different AccurateRip checksums. AR ignores some data at the beginning of the first track on the disc and at the end of the last track on the disc so that these tracks can still be checked with drives that have different offsets which cannot overread; otherwise they might generate different checksums.
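To see why an offset alone changes the result, here is a minimal sketch of a position-weighted checksum: each sample is multiplied by its position and the products are summed modulo 2^32. This is a simplified illustration, not the actual AccurateRip algorithm. The function name and the `skip_start`/`skip_end` parameters are assumptions standing in for the ignored data mentioned above.

```python
def weighted_checksum(samples, skip_start=0, skip_end=0):
    """Sum of sample * (1-based position) modulo 2**32, skipping the edges."""
    total = 0
    last = len(samples) - skip_end
    for position, sample in enumerate(samples, start=1):
        if position <= skip_start or position > last:
            continue  # ignored region at the start or end
        total = (total + position * (sample & 0xFFFFFFFF)) & 0xFFFFFFFF
    return total


if __name__ == "__main__":
    audio = [0, 0, 11, 22, 33, 44, 0, 0]
    shifted = audio[1:] + [0]  # same audio, read one sample early
    print(hex(weighted_checksum(audio)), hex(weighted_checksum(shifted)))
```

Because every sample is weighted by where it sits, the same audio shifted by one sample gives a different sum, and so does a single changed sample. Skipping a margin at both ends means that material which a drive with a different offset cannot reach does not enter the sum.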

Quote from: korndawg on 2015-09-08 18:43:14

Will directly editing FLAC files within Audacity maintain audio quality? Should I disable dithering as well? Typically I only fire up Audacity in order to fade in/out live tracks.

Please start a new thread for these questions as they are no longer on-topic.

Reply #46 – 2015-09-08 19:13:30

Quote from: pdq on 2015-09-08 18:57:55

Quote from: korndawg on 2015-09-08 18:43:14
2) Will directly editing FLAC files within Audacity maintain audio quality? Should I disable dithering as well? Typically I only fire up Audacity in order to fade in/out live tracks.

You are not directly editing the FLAC file. Audacity first extracts the PCM data and operates on that. The final audio quality is a function of what you have Audacity do to the data.

If most of the track is not changed then I would disable dithering to avoid noise being added unnecessarily.

Yeah, and even without performing any operations, Audacity will change the entire contents of a file if you load it and then save it using the default settings.

Again, this really needs its own topic (not that it hasn't also been discussed on multiple occasions).