What's it for?
You have probably heard of AccurateRip, a wonderful database of CD rip checksums that helps you make sure your CD rip is an exact copy of the original CD. It tells you how many other people got the same data when ripping this CD. CUETools Database is an extension of this idea.

What are the advantages?
* The most important feature is the ability not only to detect but also to correct small numbers of errors that occurred during ripping.
* It's free of offset problems. You don't even need to set up offset correction for your CD drive to verify rips and, more importantly, to submit them to the database. Different pressings of the same CD are treated as the same disc; the database doesn't care.
* Verification results are easier to deal with. There are exactly three possible outcomes: the rip is correct, the rip contains correctable errors, or the rip is unknown (or contains errors beyond repair).
* If there's a match, you can be certain it really is a match, because in addition to the recovery record, the database uses a well-known CRC32 checksum of the whole CD image (except for 10*588 offset samples in the first and last seconds of the disc). This checksum is used as the rip ID in CTDB.

What are the downsides and limitations?
* CUETools DB doesn't bother with tracks. Your rip as a whole is either good/correctable or it isn't. If one of the tracks is damaged beyond repair, CTDB cannot tell which one.
* If your rip contains errors, the verification/correction process involves downloading about 200kb of data, which is much more than AccurateRip needs.
* Verification is slower than with AR.
* The database was just born and currently contains far fewer CDs than AR.

How many errors can a rip contain and still be repairable?
* That depends. The best case is a single continuous damaged area of up to 30-40 sectors (about half a second).
* The worst case is 4 non-contiguous damaged sectors in (very) unlucky positions.
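To illustrate the rip ID idea, here is a minimal sketch of a whole-image CRC32 that skips the first and last 10*588 samples. It assumes 16-bit stereo PCM (4 bytes per sample, 588 samples per 2352-byte sector) and uses Python's standard `zlib.crc32`; the function name is hypothetical, and whether CTDB uses exactly this CRC variant and byte layout is an assumption, not something stated here.

```python
import zlib

BYTES_PER_SAMPLE = 4            # assumed 16-bit stereo PCM: 2 channels x 2 bytes
SAMPLES_PER_SECTOR = 588        # 2352-byte CD audio sector / 4 bytes per sample
EDGE_SAMPLES = 10 * SAMPLES_PER_SECTOR  # samples skipped at each end of the image


def whole_disc_crc32(pcm: bytes) -> int:
    """Hypothetical rip ID: CRC32 over the image minus EDGE_SAMPLES at each end."""
    edge = EDGE_SAMPLES * BYTES_PER_SAMPLE
    if len(pcm) <= 2 * edge:
        raise ValueError("image is too short to skip both edges")
    # zlib.crc32 already returns an unsigned value in Python 3; the mask is defensive.
    return zlib.crc32(pcm[edge:len(pcm) - edge]) & 0xFFFFFFFF
```

Changing any byte inside the skipped edges leaves the ID unchanged, while changing any byte in between alters it, which is why a matching CRC32 is strong evidence of a genuine match.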
What information does the database contain for each submission?
* CD TOC (Table Of Contents), i.e. the length of every track.
* Offset-finding checksum, i.e. a small (16-byte) recovery record for a set of samples throughout the CD, which makes it possible to detect the offset difference between the rip in the database and your rip, even if your rip contains some errors.
* CRC32 of the whole disc (except for some lead-in/lead-out samples).
* Submission date, artist, title.
* 180kb recovery record, which is stored separately and accessed only when verifying or repairing a broken rip.
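The per-submission contents listed above can be pictured as a simple record type. This is a minimal sketch only: the class and field names are hypothetical, track lengths are assumed to be in sectors, and the recovery record is modeled as an optional reference because it is stored separately. It is not CTDB's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Submission:
    track_lengths: list[int]      # CD TOC: length of every track (assumed in sectors)
    offset_checksum: bytes        # small 16-byte offset-finding record
    disc_crc32: int               # CRC32 of the whole disc minus edge samples
    submitted: date
    artist: str
    title: str
    # Hypothetical pointer to the ~180kb recovery record, which lives elsewhere
    # and is fetched only when verifying or repairing a broken rip.
    recovery_record_id: Optional[str] = None

    def __post_init__(self) -> None:
        if len(self.offset_checksum) != 16:
            raise ValueError("offset-finding checksum must be 16 bytes")
        if not 0 <= self.disc_crc32 <= 0xFFFFFFFF:
            raise ValueError("CRC32 must fit in 32 bits")

    @property
    def total_sectors(self) -> int:
        """Disc length implied by the TOC."""
        return sum(self.track_lengths)
```

Keeping the large recovery record out of the main record keeps ordinary lookups small; only a damaged rip needs the extra download.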
Is there a point in better identification of where the damage is, when the database is unable to fix it?
Discs don't have to pass AR before being added to CTDB; AR is used only as a kind of proof that a physical CD with such content exists when adding with CUETools. CD rippers can add CDs to CTDB even if AR doesn't know them. There are already a number of CDs in the database submitted by CUERipper, and some of them have confidence 1, which means they didn't pass the AR check or weren't found in AR.
Row size doesn't have much impact on performance, so it can easily be extended in the future, allowing popular CDs to have larger recovery records.
Not for RS repair, but for the ripper this would allow re-ripping only the parts of the disc where the CRCs do not match, since those are the problem areas.
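As a sketch of what such a re-rip aid could look like, the code below compares per-segment CRC32s of a rip against a reference list and returns the sector ranges to re-read. The per-segment CRCs are hypothetical (the database as described stores one whole-disc CRC32, not per-segment ones), the function names are illustrative, and the rip is assumed to be already offset-aligned with the reference.

```python
from __future__ import annotations

import zlib

SECTOR_BYTES = 2352  # one CD audio sector: 588 stereo samples x 4 bytes


def segment_crcs(pcm: bytes, sectors_per_segment: int) -> list[int]:
    """CRC32 of each fixed-size run of sectors (the last run may be shorter)."""
    size = SECTOR_BYTES * sectors_per_segment
    return [zlib.crc32(pcm[i:i + size]) & 0xFFFFFFFF for i in range(0, len(pcm), size)]


def sectors_to_reread(pcm: bytes, reference: list[int], sectors_per_segment: int) -> list[range]:
    """Sector ranges whose CRC differs from the (hypothetical) reference list."""
    ours = segment_crcs(pcm, sectors_per_segment)
    if len(ours) != len(reference):
        raise ValueError("segment counts differ; the rips are not the same length")
    return [
        range(i * sectors_per_segment, (i + 1) * sectors_per_segment)
        for i, (a, b) in enumerate(zip(ours, reference))
        if a != b
    ]
```

Smaller segments pinpoint damage more precisely but cost more stored CRCs, the same size trade-off that comes up for track-based records.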
My reason for suggesting that the DB should only include AR-confirmed discs is to verify that the correction data will restore a disc to the correct state. It may also help limit the size of the database by only adding correct discs.
I would argue that less popular discs may warrant more data, as they may be less replaceable...
How is metadata handled in your database, since this info is also saved?
The fact that it is not track-based is a real issue: to make it track-based, it would have to store 10x more correction data, which would make it impractical.
Also, have you thought of keeping track of how many errors the "average" disc has, to get a better idea of how much error recovery data to keep in the DB?
In the future, will there be something that can detect, from the disc ID, which catalogue number and country a pressing actually is?
We certainly don't want CUETools "correcting" a track that was ripped correctly, with no audible glitch, using data that came from a later-generation pressing that has an audible glitch.
A tool to scan audio for clicks/pops characteristic of scratches or DRM would be useful. Since most people can't or don't listen to a rip to check it, scanning rips would be very helpful.
Quote
The fact it is not track based is a real issue, to make it track based it would have to store 10x more correction data, which would make it impractical.

Would it be reasonable to consider adding smaller track-based correction files to the database?
Quote from: Eli on 07 April, 2010, 07:11:21 PM
Quote
The fact it is not track based is a real issue, to make it track based it would have to store 10x more correction data, which would make it impractical.
Would it be reasonable to consider adding smaller track-based correction files to the database?

Mr. Spoon is right. The average CD has 8-9 tracks, and each track would require the same amount of correction data as the whole disc does now. Making the correction data 10 times smaller would make it useless; it wouldn't be able to fix any significant glitch. And keeping the same amount of correction data for each track would make the database too large. If we can afford 10 times more database space, we could instead make larger correction records for the whole disc.

Besides, CTDB is mostly aimed at CD archiving: making sure you have an exact copy of your CDs on your HD. If you rip one track from a CD, it's much less important to have a bit-exact copy.
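The storage argument can be checked with a little arithmetic. This sketch assumes the 180kb per-disc recovery record quoted in the FAQ at the top of the thread (taken as kilobytes) and the 8-9 track average from the post; the function names are illustrative only.

```python
RECORD_KB = 180  # assumed per-disc recovery record size, from the FAQ


def per_track_total_kb(tracks: int, record_kb: int = RECORD_KB) -> int:
    """Total storage per disc if every track gets a full-size record of its own."""
    return tracks * record_kb


def split_record_kb(tracks: int, record_kb: int = RECORD_KB) -> float:
    """Record size left for each track if one disc's budget is split evenly."""
    return record_kb / tracks


for tracks in (8, 9):
    print(f"{tracks} tracks: {per_track_total_kb(tracks)} KB with full per-track records, "
          f"or {split_record_kb(tracks):.1f} KB per track within one disc's budget")
```

Either the database grows 8-9 times (1440-1620 KB per disc), or each track gets only about 20-22.5 KB, which is the "much smaller record that can't fix a significant glitch" case.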
After my last post I thought about this some more.
License: GNU General Public License (GPL), GNU Library or Lesser General Public License (LGPL)
Current size was chosen so that if database contained as many entries as AccurateRip, it would fit on a 1TB drive.
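For a sense of scale, this sketch counts how many 180kb recovery records fit on a 1 TB drive. It assumes decimal units (1 TB = 10**12 bytes, 1 KB = 1000 bytes) and ignores the comparatively tiny TOC, CRC and metadata fields; the result is simply the capacity implied by that record size, not a figure for AccurateRip's actual size.

```python
DRIVE_BYTES = 10**12        # 1 TB, decimal units assumed
RECORD_BYTES = 180 * 1000   # ~180 KB recovery record per entry


def entries_that_fit(capacity: int, record_size: int) -> int:
    """Whole records that fit in the given capacity."""
    return capacity // record_size


print(entries_that_fit(DRIVE_BYTES, RECORD_BYTES))  # 5555555
```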
Methinks the record companies would have a bit of an issue with the sampling and reconstruction of their copyrighted audio.
@Eli How much does such a hard drive cost? You answer that.
How many new CD releases are there per month? 1000? Recovery data submission bandwidth is under control.
There is also no need for a track-based DB (10x the size) for archival purposes.
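A quick check of the bandwidth claim, using the post's guess of 1000 new releases per month and the ~180kb per-disc recovery record (decimal kilobytes assumed). Both inputs are rough assumptions rather than measured figures, and repeat submissions of the same disc are ignored.

```python
RECORD_BYTES = 180 * 1000     # assumed ~180 KB recovery record per disc
RELEASES_PER_MONTH = 1000     # the post's guess, not a measured figure

monthly_bytes = RECORD_BYTES * RELEASES_PER_MONTH
print(monthly_bytes / 10**6, "MB per month")                  # 180.0 MB per month
print(monthly_bytes / 30 / 10**6, "MB per day (30-day month)")  # 6.0
```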