
Topic: AR confidence and one person's opinion of 'best practices'

AR confidence and one person's opinion of 'best practices'

Reply #25
I claim that you can't give any serious warranties on AR being secure. (God-related ones are not serious for me.)
You seemed to want to contradict that, and I don't see how you can do it without actually giving such warranties and proving their seriousness.

Despite your claim of being a mathematician, you have repeatedly shown a lack of understanding regarding what the "CRC" implementation problem was/is, its impact on the validity of the AR system, and the probabilities of the weaknesses impacting the verification of extracted audio.
This whole subthread started with an unsubstantiated claim on your part as to the worthlessness of AR. I challenge you to defend that statement. Do not try to turn the onus around upon me, for I am no more in the business of disproving every wacky claim which comes down the pike than I am in the business of disproving the existence of God.
Creature of habit.


Reply #26
After some PM-imparted confusion... I think it is clear who's the soap and who's the douche.


Reply #27
What's left after correction? Can it be 1 (possibly approximated) error?

It doesn't work that way. If the number of errors is small, they are 100% corrected. If it's larger, but not very large, they are detected (C2). If there are really a lot of errors, the whole error-correction system breaks down and returns random results.
CUETools 2.1.6
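The three regimes described in that reply can be sketched from the standard CIRC code parameters (C1 = RS(32,28), C2 = RS(28,24), both with minimum distance 5). A minimal sketch assuming those textbook parameters; the function and its names are mine, not CUETools code:

```python
# Capacity of the two Reed-Solomon codes used by CIRC on an audio CD.
# Toy illustration only, not decoder code.

def rs_capacity(n, k):
    """Correction/detection capacity of a Reed-Solomon (n, k) code."""
    d = n - k + 1                             # minimum distance
    return {
        "correctable_errors": (d - 1) // 2,   # unknown error positions
        "correctable_erasures": d - 1,        # known (flagged) positions
    }

c1 = rs_capacity(32, 28)  # decoded first, flags suspect symbols
c2 = rs_capacity(28, 24)  # decoded after de-interleaving, uses the flags

print(c1)  # {'correctable_errors': 2, 'correctable_erasures': 4}
print(c2)  # {'correctable_errors': 2, 'correctable_erasures': 4}

# Regime 1: few enough symbol errors -> corrected exactly.
# Regime 2: more errors, but flagged -> detected (C2 erasure handling).
# Regime 3: beyond capacity, the decoder can land on a *different* valid
#           codeword, i.e. it silently miscorrects -- the "breaks down
#           and returns random results" case.
```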


Reply #28
a lack of understanding regarding what the "CRC" implementation problem was/is

Do you mean the fact that I didn't know the implementation details?
I never claimed to know them, and I made statements that are correct regardless of them.

its impact on the validity of the AR system

You refused to even try to prove I'm wrong, so why do you claim it now?
If you understand something, you can prove it. If you can't, you don't. You don't understand the data coming into your program, yet you make assumptions about it.
I'm not saying that I understand it better than you do; I certainly don't. It's just that you don't understand it well enough to make your claims reliable.

and the probabilities of the weaknesses impacting the verification of extracted audio.

You didn't challenge any calculation yet.

This whole subthread started with an unsubstantiated claim on your part as to the worthlessness of AR.

Minor correction: not worthlessness but unreliability. There are ways of using a set of unreliable things to get a reliable one.

I challenge you to defend that statement.  Do not try and turn the onus around upon me.  For I am no more in the business of disproving every wacky claim which comes down the pike than I am in the business of disproving the existence of God.

I never expected you to seriously challenge my statement. I'm not even talking about the request for proofs, just the main point: you can't warrant reliability.

For example, if you had a good reliability-assurance plan and showed execution logs (incl. testing logs), that could be good enough and wouldn't take you much time. But you didn't, and I don't believe you even have them. I don't believe you treated reliability seriously during the development process. Why?

-Because your testing didn't find such a serious issue for a long time. Actually, the post suggests that somebody else found it.
-Because you didn't fix the problem, though it's been known for almost 2 years already.
-Because you don't even seem to know how serious it is. I made an approximation and you didn't come up with your own, just pointed out something that might or might not be a flaw in mine. It makes me think you didn't analyse it yourself. For almost 2 years. Again, I don't claim my analysis is perfect, just that you didn't even analyse the severity of a known reliability problem.

Seriously, I haven't seen anything positive about your take on reliability yet. It might be time for you to come and impress.
Somehow I don't think you will...

What's left after correction? Can it be 1 (possibly approximated) error?

It doesn't work that way. If the number of errors is small, they are 100% corrected. If it's larger, but not very large, they are detected (C2). If there are really a lot of errors, the whole error-correction system breaks down and returns random results.


Your answer would be much better with some sources. But as I take it, it doesn't cover all cases; the remaining ones (like too many errors going undetected) seem insignificant.


Reply #29
-Because your testing didn't find such a serious issue for a long time. Actually, the post suggests that somebody else found it.
-Because you didn't fix the problem, though it's been known for almost 2 years already.
-Because you don't even seem to know how serious it is. I made an approximation and you didn't come up with your own, just pointed out something that might or might not be a flaw in mine. It makes me think you didn't analyse it yourself. For almost 2 years. Again, I don't claim my analysis is perfect, just that you didn't even analyse the severity of a known reliability problem.

I'm Soap, not Spoon. Your reading comprehension is lacking.

If this whole thing had started out with a question rather than the bold statement maybe I wouldn't have been such a dick about it. 
"What are the implications regarding AR's less-than-ideal 97% coverage in its CRC?"  would have fostered so much more interesting discussion than the overly simplistic "Hmm...I'm hugely disappointed with AccurateRip now. No matter what confidence level, it can't warrant more than 97% correctness."
Creature of habit.


Reply #30
-Because your testing didn't find such a serious issue for a long time. Actually, the post suggests that somebody else found it.
-Because you didn't fix the problem, though it's been known for almost 2 years already.
-Because you don't even seem to know how serious it is. I made an approximation and you didn't come up with your own, just pointed out something that might or might not be a flaw in mine. It makes me think you didn't analyse it yourself. For almost 2 years. Again, I don't claim my analysis is perfect, just that you didn't even analyse the severity of a known reliability problem.

I'm Soap, not Spoon. Your reading comprehension is lacking.



Indeed, you certainly talk like an author and managed to confuse me. And it seems I'm not the only one.
So a large part of my previous post is irrelevant. But the main things are still valid: I see some very bad things about reliability management in AR, and in fact I can't see anything good about it. Actually, it looks to me like it's not even a goal.

Anyway, this is hugely off-topic; could some mod split it out, please?


Reply #31
Indeed, you certainly talk like an author and managed to confuse me. And it seems I'm not the only one.

I see no other confusion regarding who is who in the thread.  rpp3po unintentionally confused me.
Creature of habit.


Reply #32
Indeed, you certainly talk like an author and managed to confuse me. And it seems I'm not the only one.

I see no other confusion regarding who is who in the thread.  rpp3po unintentionally confused me.


OK, it looked like it. I stand corrected about it.


Reply #33
This was ... boring. For me, anyone claiming that AR is unreliable without any evidence should be warned, just like newbies who claim that codec X is better than codec Y without ABXing. The possible flaws of the actual AR implementation are known & there is nothing new here.

Sorry, _m²_  but Soap has nothing to prove.

PS: ... and if you're really a mathematician, then I have only one advice for you, you'd better use calc.exe


Reply #34
This was ... boring. For me, anyone claiming that AR is unreliable without any evidence should be warned, just like newbies who claim that codec X is better than codec Y without ABXing. The possible flaws of the actual AR implementation are known & there is nothing new here.

You already included a brief summary of the evidence.

Sorry, _m²_  but Soap has nothing to prove.

Indeed, he doesn't, though I don't feel sorry about it.

PS: ... and if you're really a mathematician, then I have only one advice for you, you'd better use calc.exe

I don't know what you mean by these words, but it looks like some personal excursion. Strange... it seems you were talking about warnings not long ago.


Reply #35
_m²_
Quote
I don't know what you mean by these words

... it means that I wouldn't give you the Nobel of math. I trust my calculator better!


Reply #36
_m²_
Quote
I don't know what you mean by these words

... it means that I wouldn't give you the Nobel of math. I trust my calculator better!


Nobody would. Not least because there's no Nobel Prize for mathematicians.


Reply #37
I know, that's why I said it. At least I am sure you know it; that's a good start for a mathematician. But I wouldn't give you the Abel Prize either, anyway...


Reply #38
If there are really a lot of errors, the whole error-correction system breaks down and returns random results.

Oh really?  The firmware calls on a random number generator to fill missing data?  Nonsense/utter rubbish!

Anyway, it would appear that I helped instigate this shitstorm. I suppose a thread split is in order just to make this part of the discussion more visible; I don't think it's harmed the original topic, which is essentially moot: lossless is lossless.

I'll see what I can do, but I don't really have much time at the moment beyond reading the replies.

In the meantime, this should help pacify some of the alarmism:
Second, if your rip matches several pressings with different offsets, the probability of non-detected error decreases with each such pressing, so even if it was 3%, it would be 0.09% with two matching pressings.

...though I'm not 100% positive that this part is always necessarily true:
If more bits were read incorrectly, the resulting sector will have many invalid non-consecutive bits, which would definitely affect the AccurateRip CRC.
It seems to imply that it is impossible for a rip to have only one errant sample. I have my doubts about this.

Even if it is true, it doesn't address consistent errors that arise from buggy firmware/software/pressing defects. I wouldn't begin to guess at how many entries are at stake here, but I have a feeling the number is pretty damn low.
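The 0.09% figure quoted above is just the single-pressing probability squared. A quick check, assuming (as the quote does) that matches against pressings with different offsets fail independently; the 3% input is the thread's hypothetical, not a measured value:

```python
# If one pressing match leaves a 3% chance of an undetected error and a
# second, offset-shifted pressing fails independently, the chances multiply.
p_single = 0.03
p_two = p_single ** 2
print(f"{p_two:.4%}")  # 0.0900%
```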


Reply #39
When the amount of errors is beyond the capacity of the error-correction code, if the error wasn't detected, the result will be pseudo-random, because error correction will still try to do its job, not knowing that it's pointless in this case. And this 'correction' will result in the original sequence (containing errors) being "corrected" in all the wrong places. So yes, it is practically impossible for a rip to have only one errant sample.

UPD: If I understand the specification right, the typical length of an error burst on output should be a multiple of 28 bytes, in most cases 28×28 bytes.
CUETools 2.1.6
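For a sense of scale, the burst sizes mentioned in that reply can be converted into audio terms. A rough conversion assuming standard CD framing of 4 bytes per stereo sample at 44,100 samples/s; the conversion is mine, not from the post:

```python
# Convert CIRC output burst sizes (multiples of 28 bytes) into samples.
BYTES_PER_STEREO_SAMPLE = 4   # 16-bit left + 16-bit right
SAMPLE_RATE = 44100

for burst_bytes in (28, 28 * 28):
    samples = burst_bytes // BYTES_PER_STEREO_SAMPLE
    micros = samples / SAMPLE_RATE * 1e6
    print(f"{burst_bytes} bytes ~ {samples} stereo samples ({micros:.0f} us)")
```

Even the smallest burst spans several samples, which is consistent with the claim that a single errant sample is an unlikely outcome of a decoder failure.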


Reply #40
So 5 is just a warm and fuzzy feeling for you and is not based on any technical understanding of AccurateRip.


Now slow down there, Tex. I have a basic understanding of how AccurateRip works; otherwise I wouldn't go to the trouble of using it as the basis for some rather time-intensive rip and re-rip and re-rip-again sessions on a handful of favorite CDs which are scratched to high hell because I played them a thousand times over the years.
I didn't say that you don't have a basic understanding of how AccurateRip works; I'm saying that your comment about a confidence of 5 is not based on any technical understanding of AccurateRip. Face the facts: it isn't.

Quote
Has Spoon since recanted this?
Not to my knowledge

Quote
Has this number dropped substantially?
The hash algorithm hasn't changed.

Quote
Have we thrown out the Test & Copy procedure crc's?
These were shown, quite some time ago now, not to be as reliable as once thought. T&C with one drive and one mode is not necessary if you have an AR match. T&C with two significantly different drives, possibly utilizing different modes, may however reveal a problem that might otherwise be hidden by AR. Search for posts authored by Jean Tourrilhes if you're interested in more details.


Reply #41
When the amount of errors is beyond the capacity of the error-correction code, if the error wasn't detected, the result will be pseudo-random, because error correction will still try to do its job, not knowing that it's pointless in this case. And this 'correction' will result in the original sequence (containing errors) being "corrected" in all the wrong places. So yes, it is practically impossible for a rip to have only one errant sample.
Yet it is not uncommon for the same drive to give consistent errors, so I really wouldn't push the point about the data being random.  As this relates to the problem with AR hash calculations, people should feel far more at ease.

EDIT: Regarding this idea that it is practically impossible, I recently stumbled across a rip that had just one sample in just one channel in error.  The CRC for the audio data of the track matched that in the log, indicating that there was no post-ripping corruption.


Reply #42
Consistent errors are likely to be caused by tracking errors, when the laser jumps track and starts to read from the wrong place; this is in fact the most common source of ripping errors. It definitely cannot produce isolated erroneous samples.
CUETools 2.1.6


Reply #43
My comment about consistent errors was separate from my concern that an error might only affect a single sample.

Concerning the cause of consistent errors, I am pretty sure I've seen errors interpolated the same way twice, which was clearly not the result of a tracking error. To reiterate my previous point, I don't recall the interpolated data being isolated to a single sample.


Reply #44
Thanx for the topic split, Greynol.  I assume full responsibility for the trainwreck of introducing EAC / dBpoweramp / AccurateRip to the original discussion when it didn't really need treatment of those subjects.  Lossless is lossless; source discussions go elsewhere.

I didn't say that you don't have a basic understanding of how AccurateRip works; I'm saying that your comment about a confidence of 5 is not based on any technical understanding of AccurateRip. Face the facts: it isn't.


I feel like we may be mincing words here, or I'm missing something subtle that you are saying.

Is it mathematically possible - for 2 different people, with 2 different PCs, with 2 unique instances of the same CD album, in different parts of the world, at different times, with 2 different optical drives - to accidentally / randomly / coincidentally come up with the same T&C track values and the same AR confidence levels? Yes, it is possible. I am not a mathematician, but I recognize that with all those variables the probability of this occurrence is low.

Assuming the T&C values (however flawed the basis of these might be) matched at the time of AR submission, is 5 a technically sufficient confidence level number? Yes. Why? Because taking into account all the previously mentioned variables, and assuming those 2 people re-ripped, with the same coincidental values and managed to re-submit to AR, there is yet a 3rd person (5th AR submission) to compare. At 5 matches, we are well into the astronomical level of probabilities (or gross negligence in the programming of AR, or conspiracy theories, or voodoo curses).
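The "astronomical" intuition above can be put into rough numbers. A back-of-the-envelope sketch, assuming each bad rip produces an effectively uniform 32-bit hash and that submissions are independent; both assumptions are mine, and the independence one fails precisely for the consistent drive/firmware errors discussed elsewhere in the thread:

```python
# Odds that the further submissions behind a given confidence level all
# matched the first one by pure coincidence, under a uniform-hash model.
p_collision = 2.0 ** -32   # one independent bad rip hitting a given 32-bit hash

for confidence in (2, 3, 5):
    p_all_match = p_collision ** (confidence - 1)
    print(f"confidence {confidence}: coincidental-match odds ~{p_all_match:.3e}")
```

Under this model even a confidence of 2 makes a coincidental hash match negligible; the real argument for wanting higher numbers is correlated errors (same disc, same drive model, same firmware bug), not random collisions.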


Reply #45
Implementation failure is only on the right channel.  Please describe a read error style which does not impact both channels.

If you're talking about every 65,536th sample being ignored in the AR hash, this goes for both the left and right channels, since they share the same address, and the address is the reason they are nulled.
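The nulled-address behaviour can be illustrated with a toy positional checksum of the form commonly described for the original AccurateRip hash: the 32-bit sum of sample_word × index. This is my reconstruction for illustration only, not the real AccurateRip code; it shows how such a multiply-accumulate hash goes blind to part of the sample word at indices divisible by 65,536:

```python
# Toy positional checksum: sum(word * index) mod 2**32 over 32-bit
# L/R sample words, indexed from 1. Reconstruction, not AccurateRip code.
def toy_checksum(samples):
    return sum(word * i for i, word in enumerate(samples, start=1)) & 0xFFFFFFFF

N = 65536
samples = [0x12345678] * N

# Flip the top 16 bits of the word at index 65,536: the product
# index * delta is a multiple of 2**32, so the checksum cannot see it.
hidden = samples[:]
hidden[N - 1] ^= 0xFFFF0000
assert toy_checksum(samples) == toy_checksum(hidden)

# The same flip in the low 16 bits does change the sum and is caught.
caught = samples[:]
caught[N - 1] ^= 0x0000FFFF
assert toy_checksum(samples) != toy_checksum(caught)
```

Whether the real implementation drops the whole sample word or only part of it at those addresses, the structural point is the same: a positional hash can have blind spots tied to sample addresses rather than to sample values.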


Reply #46
Assuming the T&C values (however flawed the basis of these might be) matched at the time of AR submission, is 5 a technically sufficient confidence level number? Yes.
Why not 4? Why not 6? As I said earlier, ignoring the concerns about imperfection of the algorithm and consistent errors resulting from software/firmware/manufacturing defects, all you need is a confidence of 1, provided it was not your submission, to rule out the possibility of a consistent error resulting from the combination of your physical disc and your particular drive.

Because taking into account all the previously mentioned variables, and assuming those 2 people re-ripped, with the same coincidental values and managed to re-submit to AR, there is yet a 3rd person (5th AR submission) to compare.
This is a lack of understanding on your part.  The third person submitting agreeing results results in a confidence of 3, not 5! EAC hashes (T&C) have nothing to do with AR hashes.  As far as the rip generating the hash, it is only the C part of T&C that is submitted to the AR database.  Furthermore, AR has a mechanism in place to prevent people from submitting results for any particular track on any particular disc more than once.


Reply #47
This is a lack of understanding on your part.

No, I believe you are not carefully reading what I am writing.

Quote
The third person submitting agreeing results results in a confidence of 3, not 5!

AR has a mechanism in place to prevent people from submitting results for any particular track on any particular disc more than once.

No, read what I wrote: "and managed to re-submit to AR." Just as it is possible (though not probable) to get coincidental values, it is possible to resubmit: same CD, desktop at home and laptop at work, for example. Again, it is not probable that an error would slip through all four of these unique submissions, but adding a 3rd person making a 5th unique submission which matches means that regardless of the scratches / errors that occurred in the first four submissions, all five rips are good identical rips, as opposed to five different rips with coincidentally matching values.


Quote
EAC hashes (T&C) have nothing to do with AR hashes.  As far as the rip generating the hash, it is only the C part of T&C that is submitted to the AR database.

Yes, I am aware that T&C is an EAC mechanism. Nevertheless, it is yet another set of variables introduced to reduce the likelihood of coincidental matches. If I get unmatching T&C, I immediately change modes and re-rip, repeating this process until I do get matching T&Cs. It is a local "sanity check" and, as you said, does have some value.


Reply #48
I have to pose the obvious questions to you again, hellokeith: why not 4? Why not 6? Why can't someone submit a result to the database three times instead of just twice?

Please don't evade this time.

I agree that we have a problem when people re-submit from different computers with the same disc, most likely also needing the same make (if not model) of drive. I already addressed this point. If the submission isn't yours then you need not worry, except for the exceptions I mentioned earlier, which do not lend themselves to the ludicrous conclusion that X confidence means bad and X+1 means good.


Reply #49
Quote
Please don't evade this time.

You are a hard fellow to please!  I thought I had provided sufficient logic for my reasoning, but apparently not.


Quote
Why can't someone submit a result to the database three times instead of just twice?

Indeed they can.  More to follow..


Quote
I agree that we have a problem when people re-submit from different computers with the same disc..

If the submission isn't yours then you need not worry..

X confidence means bad and X+1 means good..


The issue of re-submission is valid; you and I both agree. Does that make a confidence level of 1 invalid? No. Yet that first submission could have been me 6 months ago before my hard drive crashed, and now I am re-submitting from essentially the same computer with the same scratched-up disc. How about 2 or 3 or 4? Well, we have both agreed that the same person ripping the same disc on multiple PCs is not necessarily a rare occurrence. So why is 5 a magic number? For me (and by the way, I explicitly caveated "for me" in the OP), given the combination of someone re-ripping the disc multiple times PLUS the rare but real-world possibility of plain coincidence (or bad luck), a confidence level of 5 means that even if 2 people managed to resubmit their disc twice each and get coincidental values, there is still a 3rd person (which might be me on my current submission).

So here we have it. AR does not provide any indication of the source of submissions. Should it? Ultimately that is a question only the maintainers of AR can answer, so my speculation is really quite irrelevant. But if it did..

On 2006 June 28 at 14:47:19 this album was submitted from IP 213.49.53.xxx from an Optiarc drive with offset -7.

All I have to do is look up that class C IP block for a general geolocation (city level) and look at the drive type, and I know immediately whether it was me, a friend, or a complete stranger in Botswana. See 2 entries with all-different field values, and you would know immediately that this is no coincidence.

Until then, I will stick with 5.  You probably didn't want to hear this, but I actually feel even better with a threshold of 5 after reading the thread you linked on AR CRC problems.