
Topic: AR confidence and one person's opinion of 'best practices'

AR confidence and one person's opinion of 'best practices'

Reply #25
I claim that you can't give any serious warranties on AR being secure. (God-related ones are not serious for me.)
You seemed to want to contradict that, and I don't see how you can do it without actually giving such warranties and proving their seriousness.

Despite your claim of being a mathematician, you have repeatedly shown a lack of understanding regarding what the "CRC" implementation problem was/is, its impact on the validity of the AR system, and the probabilities of the weaknesses impacting the verification of extracted audio.
This whole subthread started with an unsubstantiated claim on your part as to the worthlessness of AR. I challenge you to defend that statement. Do not try to turn the onus around upon me, for I am no more in the business of disproving every wacky claim which comes down the pike than I am in the business of disproving the existence of God.
Creature of habit.


Reply #26
After some PM-imparted confusion... I think it is clear who's the soap and who's the douche.


Reply #27
What's left after correction? Can it be 1 (possibly approximated) error?

It doesn't work that way. If the number of errors is small, they are 100% corrected. If it's larger, but not very large, they are detected (C2). If there are really a lot of errors, the whole error-correction system breaks down and returns random results.
CUETools 2.1.6
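The three regimes described in that reply can be sketched from the standard CIRC code parameters (C1 = RS(32,28), C2 = RS(28,24), both with minimum distance 5). A minimal sketch assuming those textbook parameters; the function and its names are mine, not CUETools code:

```python
# Capacity of the two Reed-Solomon codes used by CIRC on an audio CD.
# Toy illustration only, not decoder code.

def rs_capacity(n, k):
    """Correction/detection capacity of a Reed-Solomon (n, k) code."""
    d = n - k + 1                             # minimum distance
    return {
        "correctable_errors": (d - 1) // 2,   # unknown error positions
        "correctable_erasures": d - 1,        # known (flagged) positions
    }

c1 = rs_capacity(32, 28)  # decoded first, flags suspect symbols
c2 = rs_capacity(28, 24)  # decoded after de-interleaving, uses the flags

print(c1)  # {'correctable_errors': 2, 'correctable_erasures': 4}
print(c2)  # {'correctable_errors': 2, 'correctable_erasures': 4}

# Regime 1: few enough symbol errors -> corrected exactly.
# Regime 2: more errors, but flagged -> detected (C2 erasure handling).
# Regime 3: beyond capacity, the decoder can land on a *different* valid
#           codeword, i.e. it silently miscorrects -- the "breaks down
#           and returns random results" case.
```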


Reply #28
a lack of understanding regarding what the "CRC" implementation problem was/is

Do you mean the fact that I didn't know the implementation details?
I never claimed to know them, and I made statements that are correct regardless of them.

its impact on the validity of the AR system

You refused to even try to prove I'm wrong, so why do you claim it now?
If you understand something, you can prove it. If you can't, you don't. You don't understand the data coming into your program, yet you make assumptions about it.
I'm not saying that I understand it better than you do; I certainly don't. It's just that you don't understand it well enough to make your claims reliable.

and the probabilities of the weaknesses impacting the verification of extracted audio.

You didn't challenge any calculation yet.

This whole subthread started with an unsubstantiated claim on your part as to the worthlessness of AR.

Minor correction: not worthlessness but unreliability. There are ways of using a set of unreliable things to get a reliable one.

I challenge you to defend that statement.  Do not try and turn the onus around upon me.  For I am no more in the business of disproving every wacky claim which comes down the pike than I am in the business of disproving the existence of God.

I never expected you to seriously challenge my statement. I'm not even talking about the request for proofs, just the main point: you can't warrant reliability.

For example, if you had a good reliability-assurance plan and showed execution logs (incl. testing logs), that could be good enough and wouldn't take you much time. But you didn't, and I don't believe you even have them. I don't believe you treated reliability seriously during the development process. Why?

-Because your testing didn't find such a serious issue for a long time. Actually, the post suggests that somebody else found it.
-Because you didn't fix the problem, though it's been known for almost 2 years already.
-Because you don't even seem to know how serious it is. I made an approximation and you didn't come up with your own, just pointed out something that might or might not be a flaw in mine. It makes me think you didn't analyse it yourself. For almost 2 years. Again, I don't claim my analysis is perfect, just that you didn't even analyse the severity of a known reliability problem.

Seriously, I haven't seen anything positive about your take on reliability yet. It might be time for you to come and impress.
Somehow I don't think you will...

What's left after correction? Can it be 1 (possibly approximated) error?

It doesn't work that way. If the number of errors is small, they are 100% corrected. If it's larger, but not very large, they are detected (C2). If there are really a lot of errors, the whole error-correction system breaks down and returns random results.


Your answer would be much better with some sources. But as I take it, it doesn't cover all cases; the remaining ones (like too many errors going undetected) seem insignificant.


Reply #29
-Because your testing didn't find such a serious issue for a long time. Actually, the post suggests that somebody else found it.
-Because you didn't fix the problem, though it's been known for almost 2 years already.
-Because you don't even seem to know how serious it is. I made an approximation and you didn't come up with your own, just pointed out something that might or might not be a flaw in mine. It makes me think you didn't analyse it yourself. For almost 2 years. Again, I don't claim my analysis is perfect, just that you didn't even analyse the severity of a known reliability problem.

I'm Soap, not Spoon. Your reading comprehension is lacking.

If this whole thing had started out with a question rather than the bold statement maybe I wouldn't have been such a dick about it. 
"What are the implications regarding AR's less-than-ideal 97% coverage in its CRC?"  would have fostered so much more interesting discussion than the overly simplistic "Hmm...I'm hugely disappointed with AccurateRip now. No matter what confidence level, it can't warrant more than 97% correctness."
Creature of habit.


Reply #30
-Because your testing didn't find such a serious issue for a long time. Actually, the post suggests that somebody else found it.
-Because you didn't fix the problem, though it's been known for almost 2 years already.
-Because you don't even seem to know how serious it is. I made an approximation and you didn't come up with your own, just pointed out something that might or might not be a flaw in mine. It makes me think you didn't analyse it yourself. For almost 2 years. Again, I don't claim my analysis is perfect, just that you didn't even analyse the severity of a known reliability problem.

I'm Soap, not Spoon. Your reading comprehension is lacking.



Indeed, you certainly talk like an author and managed to confuse me. And it seems I'm not the only one.
So a large part of my previous post is irrelevant. But the main things are still valid: I see some very bad things about reliability management in AR, and in fact I can't see anything good about it. Actually, it looks to me like it's not even a goal.

Anyway, this is hugely off-topic; could some mod split it out, please?


Reply #31
Indeed, you certainly talk like an author and managed to confuse me. And it seems I'm not the only one.

I see no other confusion regarding who is who in the thread.  rpp3po unintentionally confused me.
Creature of habit.


Reply #32
Indeed, you certainly talk like an author and managed to confuse me. And it seems I'm not the only one.

I see no other confusion regarding who is who in the thread.  rpp3po unintentionally confused me.


OK, it looked like it. I stand corrected about it.


Reply #33
This was ... boring. For me, anyone claiming that AR is unreliable without any evidence should be warned, just like newbies who claim that codec X is better than codec Y without ABXing. The possible flaws of the actual AR implementation are known & there is nothing new here.

Sorry, _m²_  but Soap has nothing to prove.

PS: ... and if you're really a mathematician, then I have only one advice for you, you'd better use calc.exe


Reply #34
This was ... boring. For me, anyone claiming that AR is unreliable without any evidence should be warned, just like newbies who claim that codec X is better than codec Y without ABXing. The possible flaws of the actual AR implementation are known & there is nothing new here.

You already included a brief summary of the evidence.

Sorry, _m²_  but Soap has nothing to prove.

Indeed, he doesn't, though I don't feel sorry about it.

PS: ... and if you're really a mathematician, then I have only one advice for you, you'd better use calc.exe

I don't know what you mean by these words, but it looks like some personal excursion. Strange... it seems you were talking about warnings not long ago.


Reply #35
_m²_
Quote
I don't know what you mean by these words

... it means that I wouldn't give you the Nobel of math. I trust my calculator better!


Reply #36
_m²_
Quote
I don't know what you mean by these words

... it means that I wouldn't give you the Nobel of math. I trust my calculator better!


Nobody would. Not least because there's no Nobel Prize for mathematicians.


Reply #37
I know, that's why I said it. At least I am sure you know it; that's a good start for a mathematician. But I wouldn't give you the Abel Prize either, anyway...


Reply #38
If there are really a lot of errors, the whole error-correction system breaks down and returns random results.

Oh really?  The firmware calls on a random number generator to fill missing data?  Nonsense/utter rubbish!

Anyway, it would appear that I helped instigate this shitstorm. I suppose a thread split is in order just to make this part of the discussion more visible; I don't think it's harmed the original topic, which is essentially moot: lossless is lossless.

I'll see what I can do, but I don't really have much time at the moment beyond reading the replies.

In the meantime, this should help pacify some of the alarmism:
Second, if your rip matches several pressings with different offsets, the probability of non-detected error decreases with each such pressing, so even if it was 3%, it would be 0.09% with two matching pressings.

...though I'm not 100% positive that this part is always necessarily true:
If more bits were read incorrectly, the resulting sector will have many invalid non-consecutive bits, which would definitely affect the AccurateRip CRC.
It seems to imply that it is impossible for a rip to have only one errant sample. I have my doubts about this.

Even if it is true, it doesn't address consistent errors that arise from buggy firmware/software/pressing defects. I wouldn't begin to guess at how many entries are at stake here, but I have a feeling the number is pretty damn low.
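The 0.09% figure quoted above is just the single-pressing probability squared. A quick check, assuming (as the quote does) that matches against pressings with different offsets fail independently; the 3% input is the thread's hypothetical, not a measured value:

```python
# If one pressing match leaves a 3% chance of an undetected error and a
# second, offset-shifted pressing fails independently, the chances multiply.
p_single = 0.03
p_two = p_single ** 2
print(f"{p_two:.4%}")  # 0.0900%
```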


Reply #39
When the amount of errors is beyond the capacity of the error-correction code, if the error wasn't detected, the result will be pseudo-random, because error correction will still try to do its job, not knowing that it's pointless in this case. And this 'correction' will result in the original sequence (containing errors) being "corrected" in all the wrong places. So yes, it is practically impossible for a rip to have only one errant sample.

UPD: If I understand the specification right, the typical length of an error burst on output should be a multiple of 28 bytes, in most cases 28×28 bytes.
CUETools 2.1.6
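For a sense of scale, the burst sizes mentioned in that reply can be converted into audio terms. A rough conversion assuming standard CD framing of 4 bytes per stereo sample at 44,100 samples/s; the conversion is mine, not from the post:

```python
# Convert CIRC output burst sizes (multiples of 28 bytes) into samples.
BYTES_PER_STEREO_SAMPLE = 4   # 16-bit left + 16-bit right
SAMPLE_RATE = 44100

for burst_bytes in (28, 28 * 28):
    samples = burst_bytes // BYTES_PER_STEREO_SAMPLE
    micros = samples / SAMPLE_RATE * 1e6
    print(f"{burst_bytes} bytes ~ {samples} stereo samples ({micros:.0f} us)")
```

Even the smallest burst spans several samples, which is consistent with the claim that a single errant sample is an unlikely outcome of a decoder failure.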


Reply #40
So 5 is just a warm and fuzzy feeling for you and is not based on any technical understanding of AccurateRip.


Now slow down there, Tex. I have a basic understanding of how AccurateRip works; otherwise I wouldn't go to the trouble of using it as the basis for some rather time-intensive rip and re-rip and re-rip-again sessions on a handful of favorite CDs which are scratched to high hell because I played them a thousand times over the years.
I didn't say that you don't have a basic understanding of how AccurateRip works; I'm saying that your comment about a confidence of 5 is not based on any technical understanding of AccurateRip. Face the facts: it isn't.

Quote
Has Spoon since recanted this?
Not to my knowledge

Quote
Has this number dropped substantially?
The hash algorithm hasn't changed.

Quote
Have we thrown out the Test & Copy procedure crc's?
These were shown, quite some time ago now, not to be as reliable as once thought. T&C with one drive and one mode is not necessary if you have an AR match. T&C with two significantly different drives, possibly utilizing different modes, may however reveal a problem that might otherwise be hidden by AR. Search for posts authored by Jean Tourrilhes if you're interested in more details.


Reply #41
When the amount of errors is beyond the capacity of the error-correction code, if the error wasn't detected, the result will be pseudo-random, because error correction will still try to do its job, not knowing that it's pointless in this case. And this 'correction' will result in the original sequence (containing errors) being "corrected" in all the wrong places. So yes, it is practically impossible for a rip to have only one errant sample.
Yet it is not uncommon for the same drive to give consistent errors, so I really wouldn't push the point about the data being random.  As this relates to the problem with AR hash calculations, people should feel far more at ease.

EDIT: Regarding this idea that it is practically impossible, I recently stumbled across a rip that had just one sample in just one channel in error.  The CRC for the audio data of the track matched that in the log, indicating that there was no post-ripping corruption.


Reply #42
Consistent errors are likely to be caused by tracking errors, when the laser jumps track and starts to read from the wrong place; this is in fact the most common source of ripping errors. It definitely cannot produce isolated erroneous samples.
CUETools 2.1.6


Reply #43
My comment about consistent errors was separate from my concern that an error might only affect a single sample.

Concerning the cause of consistent errors, I am pretty sure I've seen errors interpolated the same way twice, which was clearly not the result of a tracking error. To reiterate my previous point, I don't recall the interpolated data being isolated to a single sample.


Reply #44
Thanx for the topic split, Greynol.  I assume full responsibility for the trainwreck of introducing EAC / dBpoweramp / AccurateRip to the original discussion when it didn't really need treatment of those subjects.  Lossless is lossless; source discussions go elsewhere.

I didn't say that you don't have a basic understanding of how AccurateRip works; I'm saying that your comment about a confidence of 5 is not based on any technical understanding of AccurateRip. Face the facts: it isn't.


I feel like we may be mincing words here, or I'm missing something subtle that you are saying.

Is it mathematically possible - for 2 different people, with 2 different PCs, with 2 unique instances of the same CD album, in different parts of the world, at different times, with 2 different optical drives - to accidentally / randomly / coincidentally come up with the same T&C track values and the same AR confidence levels? Yes, it is possible. I am not a mathematician, but I recognize that with all those variables the probability of this occurrence is low.

Assuming the T&C values (however flawed the basis of these might be) matched at the time of AR submission, is 5 a technically sufficient confidence level number? Yes. Why? Because taking into account all the previously mentioned variables, and assuming those 2 people re-ripped, with the same coincidental values and managed to re-submit to AR, there is yet a 3rd person (5th AR submission) to compare. At 5 matches, we are well into the astronomical level of probabilities (or gross negligence in the programming of AR, or conspiracy theories, or voodoo curses).
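The "astronomical" intuition above can be put into rough numbers. A back-of-the-envelope sketch, assuming each bad rip produces an effectively uniform 32-bit hash and that submissions are independent; both assumptions are mine, and the independence one fails precisely for the consistent drive/firmware errors discussed elsewhere in the thread:

```python
# Odds that the further submissions behind a given confidence level all
# matched the first one by pure coincidence, under a uniform-hash model.
p_collision = 2.0 ** -32   # one independent bad rip hitting a given 32-bit hash

for confidence in (2, 3, 5):
    p_all_match = p_collision ** (confidence - 1)
    print(f"confidence {confidence}: coincidental-match odds ~{p_all_match:.3e}")
```

Under this model even a confidence of 2 makes a coincidental hash match negligible; the real argument for wanting higher numbers is correlated errors (same disc, same drive model, same firmware bug), not random collisions.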


Reply #45
Implementation failure is only on the right channel.  Please describe a read error style which does not impact both channels.

If you're talking about every 65,536th sample being ignored in the AR hash, this goes for both the left and right channels, since they share the same address, and the address is the reason they are nulled.
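The nulled-address behaviour can be illustrated with a toy positional checksum of the form commonly described for the original AccurateRip hash: the 32-bit sum of sample_word × index. This is my reconstruction for illustration only, not the real AccurateRip code; it shows how such a multiply-accumulate hash goes blind to part of the sample word at indices divisible by 65,536:

```python
# Toy positional checksum: sum(word * index) mod 2**32 over 32-bit
# L/R sample words, indexed from 1. Reconstruction, not AccurateRip code.
def toy_checksum(samples):
    return sum(word * i for i, word in enumerate(samples, start=1)) & 0xFFFFFFFF

N = 65536
samples = [0x12345678] * N

# Flip the top 16 bits of the word at index 65,536: the product
# index * delta is a multiple of 2**32, so the checksum cannot see it.
hidden = samples[:]
hidden[N - 1] ^= 0xFFFF0000
assert toy_checksum(samples) == toy_checksum(hidden)

# The same flip in the low 16 bits does change the sum and is caught.
caught = samples[:]
caught[N - 1] ^= 0x0000FFFF
assert toy_checksum(samples) != toy_checksum(caught)
```

Whether the real implementation drops the whole sample word or only part of it at those addresses, the structural point is the same: a positional hash can have blind spots tied to sample addresses rather than to sample values.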


Reply #46
Assuming the T&C values (however flawed the basis of these might be) matched at the time of AR submission, is 5 a technically sufficient confidence level number? Yes.
Why not 4? Why not 6? As I said earlier, ignoring the concerns about imperfection of the algorithm and consistent errors resulting from software/firmware/manufacturing defects, all you need is a confidence of 1, provided it was not your submission, to rule out the possibility of a consistent error resulting from the combination of your physical disc and your particular drive.

Because taking into account all the previously mentioned variables, and assuming those 2 people re-ripped, with the same coincidental values and managed to re-submit to AR, there is yet a 3rd person (5th AR submission) to compare.
This is a lack of understanding on your part.  The third person submitting agreeing results results in a confidence of 3, not 5! EAC hashes (T&C) have nothing to do with AR hashes.  As far as the rip generating the hash, it is only the C part of T&C that is submitted to the AR database.  Furthermore, AR has a mechanism in place to prevent people from submitting results for any particular track on any particular disc more than once.


Reply #47
This is a lack of understanding on your part.

No, I believe you are not carefully reading what I am writing.

Quote
The third person submitting agreeing results results in a confidence of 3, not 5!

AR has a mechanism in place to prevent people from submitting results for any particular track on any particular disc more than once.

No, read what I wrote: "and managed to re-submit to AR." Just as it is possible (though not probable) to get coincidental values, it is possible to resubmit: same CD, desktop at home and laptop at work, for example. Again, it is not probable that an error would slip through all four of these unique submissions, but adding a 3rd person making a 5th unique submission which matches means that regardless of the scratches / errors that occurred in the first four submissions, all five rips are good identical rips, as opposed to five different rips with coincidentally matching values.


Quote
EAC hashes (T&C) have nothing to do with AR hashes.  As far as the rip generating the hash, it is only the C part of T&C that is submitted to the AR database.

Yes, I am aware that T&C is an EAC mechanism. Nevertheless, it is yet another set of variables introduced to reduce the likelihood of coincidental matches. If I get unmatching T&C, I immediately change modes and re-rip, repeating this process until I do get matching T&Cs. It is a local "sanity check" and, as you said, does have some value.


Reply #48
I have to pose the obvious questions to you again, hellokeith: why not 4? Why not 6? Why can't someone submit a result to the database three times instead of just twice?

Please don't evade this time.

I agree that we have a problem when people re-submit from different computers with the same disc, most likely also needing the same make (if not model) of drive. I already addressed this point. If the submission isn't yours then you need not worry, except for the exceptions I mentioned earlier, which do not lend themselves to the ludicrous conclusion that X confidence means bad and X+1 means good.


Reply #49
Quote
Please don't evade this time.

You are a hard fellow to please!  I thought I had provided sufficient logic for my reasoning, but apparently not.


Quote
Why can't someone submit a result to the database three times instead of just twice?

Indeed they can.  More to follow..


Quote
I agree that we have a problem when people re-submit from different computers with the same disc..

If the submission isn't yours then you need not worry..

X confidence means bad and X+1 means good..


The issue of re-submission is valid; you and I both agree. Does that make a confidence level of 1 invalid? No. Yet that first submission could have been me 6 months ago before my hard drive crashed, and now I am re-submitting from essentially the same computer with the same scratched-up disc. How about 2 or 3 or 4? Well, we have both agreed that the same person ripping the same disc on multiple PCs is not necessarily a rare occurrence. So why is 5 a magic number? For me (and by the way, I explicitly caveated "for me" in the OP), given the combination of someone re-ripping the disc multiple times PLUS the rare but real-world possibility of plain coincidence (or bad luck), a confidence level of 5 means that even if 2 people managed to resubmit their disc twice each and get coincidental values, there is still a 3rd person (which might be me on my current submission).

So here we have it. AR does not provide any indication of the source of submissions. Should it? Ultimately that is a question only the maintainers of AR can answer, so my speculation is really quite irrelevant. But if it did..

On 2006 June 28 at 14:47:19 this album was submitted from IP 213.49.53.xxx from an Optiarc drive with offset -7.

All I have to do is look up that class C IP block for a general geolocation (city level) and look at the drive type, and I know immediately whether it was me, a friend, or a complete stranger in Botswana. See 2 entries with all-different field values, and you would know immediately that this is no coincidence.

Until then, I will stick with 5.  You probably didn't want to hear this, but I actually feel even better with a threshold of 5 after reading the thread you linked on AR CRC problems.