HydrogenAudio

Lossless Audio Compression => FLAC => Topic started by: Lodum007 on 2023-09-06 04:49:35

Title: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: Lodum007 on 2023-09-06 04:49:35
I am hoping someone can give me advice on how to automate testing my library.
I have about 3,000 FLAC albums, and a dozen or so "WAV" file albums.  It's about 40,000 songs.

Today I found that one album encoded years ago had track 5 playing the audio of track 6 and vice versa.  Also, one song on the same album was corrupted and only played for the first 40 seconds.

The first error could be someone misnaming the metadata.  The second problem seems to be a corrupt file but no error showed up on the FLAC file when scanned with the "Audio Tester" program.

I occasionally find small errors in the library.  The other day I found a WAV file in the library cut off 1 minute through. 

I want the library to be perfect so that all audio is okay and the track names always match the audio they go with. 

Is there an automated tool that can reliably scan all of one's FLAC and WAV files and verify that they have no errors? 

If a song is supposed to be 3:30 but is only 3:15 I could find the error by manually checking the length of each song against Apple Music.   These albums are common and well-known.  
Is there some automated way to accomplish this so I don't need to manually inspect 40,000 songs?

Advice on how to do this sensibly is appreciated.

Cheers!
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: ktf on 2023-09-06 07:23:37
With MusicBrainz Picard, you can scan your audio files against the MusicBrainz database. It gives each file a color depending on how closely it matches their database. If the tags differ slightly, it will be green; if they differ significantly, it will be green-orange. If the file length differs a lot, it will mark the file orange, and so on.

However, it could be that your way of tagging files differs from what MusicBrainz uses. In that case, it will mark a file as differing, but only because, for example, you list only the most important artist and MusicBrainz lists all artists, or something like that. Even then, this could still be useful.
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: SimBun on 2023-09-06 07:44:22
A CUETools verify pass is what you want, although if you didn't use a ripper that supported AccurateRip/CTDB then you might not like the results.

CUETools uses a couple of databases to compare your rip to that of others to make sure it's correct. It does rely on a number of tags in order to identify albums and the ordering therein, but it won't check that the tags are correct.

Whilst in much of the documentation you'll find it states to point it to a CUE sheet, this isn't required, just point it to a folder with music files, and ultimately the root of your music directory once you've tested it.

There's a guide on how to use the verify feature here (https://captainrookie.com/how-to-use-cuetools-verification-log-feature/).
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: MrRom92 on 2023-09-06 12:03:04
CueTools will only work for files ripped from CDs - not a viable option for a batch checkup if the library has downloaded audio from internet sources mixed in.

In foobar you can right click the selected files and hit verify audio integrity - while this isn’t going to highlight tagging errors, user errors from creating the file, or anything related to metadata, it will at least verify that the audio content of whatever is encoded isn’t corrupt or erroneous in some way. And it should run pretty quickly. System dependent of course. A faster drive helps.
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: birdie on 2023-09-06 15:57:20
This will only show completely broken files (i.e. with decoding errors):

Code: [Select]
find /path/to/audio -type f -iname '*.fla*' -print0 | xargs -0 flac -wst

Unfortunately it will not catch bit flip errors. FLAC doesn't include any form of CRC, so bitrot is very real and bound to occur if you don't create hashsums after encoding files. I always do and I even create PAR2 recovery data for the files that are very important to me.
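
For anyone who wants a report of failures rather than raw console output, the find/xargs one-liner above can be wrapped in a small script. This is only a sketch: it assumes the reference `flac` binary is on the PATH, and the function names (`find_flac_files`, `check_flac`, `scan_library`) are made up for illustration.

```python
import subprocess
from pathlib import Path


def find_flac_files(root):
    """Return every *.flac file under root (suffix matched case-insensitively)."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".flac"
    )


def check_flac(path, run=subprocess.run):
    """Run `flac -wst` on one file and return (ok, stderr_text).

    -t tests by decoding, -s silences progress output, -w treats warnings as errors.
    """
    result = run(["flac", "-wst", str(path)], capture_output=True, text=True)
    return result.returncode == 0, result.stderr.strip()


def scan_library(root, run=subprocess.run):
    """Test every FLAC file under root; return a list of (path, message) failures."""
    failures = []
    for path in find_flac_files(root):
        ok, message = check_flac(path, run=run)
        if not ok:
            failures.append((path, message))
    return failures


# Example use (the path is a placeholder):
# for path, message in scan_library("/path/to/audio"):
#     print(f"FAILED: {path}\n  {message}")
```

The `run` parameter exists only so the decoder call can be swapped out when trying the script without a real library.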
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: Bogozo on 2023-09-06 16:10:20
As for finding corrupted FLAC files, Audiotester will detect them just fine. 

The second problem seems to be a corrupt file but no error showed up on the FLAC file when scanned with the "Audio Tester" program.
Have you any examples of corrupted FLAC files that are not detected by Audiotester?
By Audiotester I mean this Audiotester (http://www.vuplayer.com/files/audiotester.zip)  :))

foobar2000 was mentioned here, but beware: some files with corrupted headers can be silently ignored by it (i.e. not added to the playlist at all), so they will not be detected by verification.

FLAC doesn't include any form of CRC
Are you sure you know what you are talking about?
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: birdie on 2023-09-06 16:26:39
FLAC doesn't include any form of CRC
Are you sure you know what you are talking about?

I'm making stuff up as always. You're the expert, so I'll just walk away. Never thought that technical audio forums are conducive to insults. Looks like I was wrong.
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: Porcus on 2023-09-06 18:18:25
It isn't obvious that the track is wrong just because its length doesn't match a different version of the same release.


Anyway, without even touching the topic of metadata, a few things to be aware of:

FLAC has an audio checksum. (All sane lossless compressed formats offer some kind of checksumming, at least optionally. ALAC is not sane.)
.wav doesn't have that, but if the file ends where it shouldn't (according to the headers, or mid-sample), that is possible to detect.
But: If you have reconverted FLAC -> FLAC and ignored errors, or converted possibly corrupted WAVE to FLAC, then you have OK'ed the error and FLAC's checksum cannot help you.
And: The checksum does of course not say anything about rip errors. If you got the wrong data out of the CD, FLAC cannot know.

Audiotester.exe is a tool everybody needs, and even if it includes a slightly old version of FLAC, it will detect the errors in question. (Newer FLAC will detect more errors that seem to be off-topic here.)

AccurateRip retro-verification is good if it does indeed verify. Download and install CUETools and https://www.dbpoweramp.com/Help/perfecttunes/accuraterip.htm from the creator of AccurateRip. But even files that are just fine, may fail to be verified as such in the following circumstances:
It isn't from a CD. (Even if it was offered for download, it could be from a CD. I have purchased files with AccurateRip tags in them, from the label that also sold the CDs.) 
It doesn't find the actual CD. For that, CUETools needs either an EAC log, or suitable tags (CDTOC or AccurateRip tags - no use in album title), or if not: it must have the most common pregap length. Also, if track number tags are wrong, you are in trouble - and if one track is damaged to the point it does not give the right length.
It is not CDDA. For example, if you made the mistake of decoding HDCD. But PerfectTunes sometimes still helps.

Even if CUETools reports Inaccurate, then the information that it finds the CD can tell you that at least the files are not truncated.

And as pointed out: foobar2000 might - at least in several versions of it - fail to find wrong files because they are so corrupted it fails to recognize them. To fb2k they are "not even wrong", when they are so destroyed they are not even audio files.

Question: Is there any tool that can correct the length of a Windows Media Player rip?
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: Replica9000 on 2023-09-06 19:40:07
This will only show completely broken files (i.e. with decoding errors):

Code: [Select]
find /path/to/audio -type f -iname '*.fla*' -print0 | xargs -0 flac -wst

Unfortunately it will not catch bit flip errors. FLAC doesn't include any form of CRC, so bitrot is very real and bound to occur if you don't create hashsums after encoding files. I always do and I even create PAR2 recovery data for the files that are very important to me.

I did a test on this a few weeks ago. 
https://hydrogenaud.io/index.php/topic,124563.msg1031741.html#msg1031741
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: SimBun on 2023-09-06 19:42:30
It doesn't find the actual CD. For that, CUETools needs either an EAC log, or suitable tags (CDTOC or AccurateRip tags - no use in album title), or if not: it must have the most common pregap length.
I don't know what you mean by "it must have the most common pregap length", but if you use detailed logging (Settings > Advanced > CTDB > Detailed log: True) CUETools will include PREGAP matching in its results.

Code: [Select]
[CUETools log; Date: 06/09/2023 19:33:47; Version: 2.2.0]
[CTDB TOCID: sFAMEDF3BfD1IjO41YeTSDwphVg-] found.
        [ CTDBID ] Status
        [dc986ecf] (534/582) Has pregap length 00:01:00, Accurately ripped
        [2922f516] (001/582) No match
        [fdcfdc08] (002/582) Accurately ripped
        [ba4df3cc] (001/582) Has pregap length 00:01:00, No match
        [ff03366d] (001/582) Has pregap length 00:01:00, No match
        [9095d52d] (001/582) Has pregap length 00:01:00, No match

So our current rip only matches with 2 CTDB results (that have the same pregap), but if we plug 00:01:00 into the PREGAP field in CUETools - or more correctly create a CUE sheet with this information in - the AccurateRip results change from:
Code: [Select]
[AccurateRip ID: 00093df5-003f5f50-6007bc08] found.
Track   [  CRC   |   V2   ] Status
 01     [1a0003cc|41212817] (0+0/5) No match
 02     [5e8d42ae|ac7ec965] (0+0/5) No match
 03     [7ac5870f|47765fa9] (0+0/5) No match
 04     [2904de6f|48e887d2] (0+0/5) No match
 05     [e5511351|ccf644d3] (0+0/5) No match
 06     [3c4e7d19|a34875ab] (0+0/5) No match
 07     [f6091cda|b41d0f27] (0+0/5) No match
 08     [4b12f944|2c7615c2] (0+0/5) No match
to
Code: [Select]
[AccurateRip ID: 00094098-003f6c7e-5f07bc08] found.
Track   [  CRC   |   V2   ] Status
 01     [1a0003cc|41212817] (043+057/394) Accurately ripped
 02     [5e8d42ae|ac7ec965] (043+057/396) Accurately ripped
 03     [7ac5870f|47765fa9] (043+059/396) Accurately ripped
 04     [2904de6f|48e887d2] (044+058/396) Accurately ripped
 05     [e5511351|ccf644d3] (045+058/402) Accurately ripped
 06     [3c4e7d19|a34875ab] (043+057/394) Accurately ripped
 07     [f6091cda|b41d0f27] (044+058/392) Accurately ripped
 08     [4b12f944|2c7615c2] (045+057/388) Accurately ripped
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: Porcus on 2023-09-06 19:45:38
Hm. But prepended index 00's need to be two seconds, right?
Anyway: Try! What is "Accurate", is ... accurate.
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: SimBun on 2023-09-06 20:09:44
Hm. But prepended index 00's need to be two seconds, right?
I think two seconds is in the spec (for track 1), but the CUE "standard" only records deviations from that; typically they're 00:00:32 or 00:00:33 but I have pregaps ranging from 00:00:05 to 00:02:00.
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: Lodum007 on 2023-09-07 00:18:47
Thank you all for your good suggestions and ideas. 
It's fabulous to have so much helpful input!

I've attached 2 files representative of what I'm trying to solve.

(For info, these tracks were loaded in my system by a friend around 15 years ago.  I don't know what went wrong at the time or how they were loaded.)

Track 4 - 04_Joan Of Arc is supposed to be 7:57.  The version I have here is only :26 in length and cuts off.
I ran (vu.com) Audio Tester program on my library and this file doesn't show up as corrupt.

I tried the track in MusicBrainz Picard and CueTools and they don't show any problem.

Track 5 - 05 Ain't No Cure For Love is the wrong song and actually has the audio for track 6 "Coming Back To You".  The files are reversed in my library.  I have no idea how they could be loaded that way but that's what I found.

I hope I don't have too many errors in the library as I did handle any corrupt files using Audio Tester. 
But I am trying to root out any other errors or erroneous metadata in many thousands of songs.

Any further advice on how to go about this is welcome.   I hope the attached files help.
Thanks!


1 attachment removed TOS  9 (https://hydrogenaud.io/index.php/topic,3974.html) - Please keep music clips under 30 seconds
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: Lodum007 on 2023-09-07 04:41:58
An additional question on this:

Is there some audio software that can look at a FLAC or WAV file's supposed length and compare it with the real length of the file?
For example, I have a song that gives a supposed length of 7:09 in the media browser, but when you try to play the song it stops abruptly about 55 seconds in and won't play past that.  The file has become corrupted, but how do you know unless you try to play it?
 
There must be software that would be able to scan all of one's audio files for this type of problem.
Even if it took a couple of days for the software to read through my library it would be worth it.
Any advice on that?  I have Audio Tester which generally works well for FLAC. 
Any other software that tests both FLAC and WAV?
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: SimBun on 2023-09-07 07:44:28
Track 4 - 04_Joan Of Arc is supposed to be 7:57.  The version I have here is only :26 in length and cuts off.
I ran (vu.com) Audio Tester program on my library and this file doesn't show up as corrupt.
I ran the file through foobar (as previously mentioned) and the following warning was observed:
Code: [Select]
Warning: File contains ID3v2 garbage
Warning: Garbage at the end of file (ID3 tag?)
It's not an audio problem though so maybe it was a ripping error. Normal file checkers won't find these.

I tried the track in MusicBrainz Picard and CueTools and they don't show any problem.
If your files are from CD rips then CUETools will be your best bet. Could you include the log from a verification pass? If CUETools says it's OK, then it's OK.


EDIT: Testing with flac though did result in errors, so not sure what foobar is doing (I had assumed it would do the same).
Code: [Select]
flac.exe -wst "d:\downloads\04_joan_of_arc.flac"
04_joan_of_arc.flac: *** Got error code 0:FLAC__STREAM_DECODER_ERROR_STATUS_LOST_SYNC

04_joan_of_arc.flac: ERROR during decoding
                     state = FLAC__STREAM_DECODER_END_OF_STREAM
Looking more closely at the foobar verify result it did have a status of 'Decoded with minor problems' but I think that's in relation to the metadata Warning. Mp3tag confirmed that there is ID3 data in the FLAC, I assume this was ripped with EAC with the 'Add ID3 tag' enabled.
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: Porcus on 2023-09-07 09:29:39
I ran the file through foobar (as previously mentioned) and the following warning was observed:
Code: [Select]
Warning: File contains ID3v2 garbage
Warning: Garbage at the end of file (ID3 tag?)
It's not an audio problem though so maybe it was a ripping error.
Might be that one has used ExactAudioCopy and checked the box for ID3 tags. Then it will do things like that. Can be removed with Mp3tag, IIRC.

Normal file checkers won't find these.
Reference flac.exe started detecting those by release 1.4, which still is quite new. Audiotester.exe uses an older flac.exe, and I bet it is not alone about that.
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: SimBun on 2023-09-07 11:33:13
Normal file checkers won't find these.
Reference flac.exe started detecting those by release 1.4, which still is quite new. Audiotester.exe uses an older flac.exe, and I bet it is not alone about that.
This comment was made after I'd used foobar2000 (latest) Verify and assumed it would have caught such errors. Given there were none I thought a possibility was that the truncation could have happened at the ripping stage, somehow, which normal file checkers wouldn't catch - assuming something converted it from a truncated WAV to FLAC.

Seems to me that CUETools (if from CD) or FLAC -t is the way to go. In terms of the metadata, good luck!
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: Case on 2023-09-07 12:22:44
EDIT: Testing with flac though did result in errors, so not sure what foobar is doing (I had assumed it would do the same).
Code: [Select]
flac.exe -wst "d:\downloads\04_joan_of_arc.flac"
04_joan_of_arc.flac: *** Got error code 0:FLAC__STREAM_DECODER_ERROR_STATUS_LOST_SYNC

04_joan_of_arc.flac: ERROR during decoding
                     state = FLAC__STREAM_DECODER_END_OF_STREAM
Seems to me that [...] FLAC -t is the way to go.
Sorry to disappoint, but the scary FLAC -t error message comes from the ID3v1 tag at the end. If you remove the tag FLAC -t finds nothing.
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: birdie on 2023-09-07 12:28:00

FLAC has an audio checksum. (All sane lossless compressed formats offer some kind of checksumming, at least optionally. ALAC is not sane.) *

I've just changed over 100 random bits in my FLAC file and it's passed testing with zero issues.

I still don't understand what kind of checksum people here are talking about.
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: Case on 2023-09-07 12:38:42
The encoded audio frames contain CRC checksums and the entire decoded audio is checked against MD5. If you edit bits in metadata or padding blocks, those obviously have no effect.
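
For the curious, the stored MD5 and the declared length can be read straight from the STREAMINFO block at the start of a FLAC file. The sketch below assumes the layout given in the FLAC format specification (a 34-byte STREAMINFO as the first metadata block); `read_streaminfo` is an illustrative name, not part of any tool mentioned in this thread.

```python
def read_streaminfo(data):
    """Parse STREAMINFO from the first 42 bytes of a FLAC file.

    Returns sample_rate, channels, bits_per_sample, total_samples,
    duration_seconds (None if unknown), md5 (hex) and has_md5.
    """
    if data[:4] != b"fLaC":
        # A prepended ID3v2 tag, or not a FLAC file at all.
        raise ValueError("file does not start with the fLaC marker")
    if len(data) < 42:
        raise ValueError("not enough data for a STREAMINFO block")
    block_type = data[4] & 0x7F                  # low 7 bits: block type
    block_len = int.from_bytes(data[5:8], "big")
    if block_type != 0 or block_len != 34:
        raise ValueError("first metadata block is not a 34-byte STREAMINFO")
    body = data[8:42]
    # Bytes 10..17 pack: sample rate (20 bits), channels-1 (3),
    # bits per sample-1 (5), total samples (36).
    packed = int.from_bytes(body[10:18], "big")
    sample_rate = packed >> 44
    channels = ((packed >> 41) & 0x7) + 1
    bits_per_sample = ((packed >> 36) & 0x1F) + 1
    total_samples = packed & ((1 << 36) - 1)
    md5 = body[18:34]
    duration = total_samples / sample_rate if sample_rate and total_samples else None
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "bits_per_sample": bits_per_sample,
        "total_samples": total_samples,
        "duration_seconds": duration,
        "md5": md5.hex(),
        "has_md5": md5 != bytes(16),   # all zeros means no MD5 was stored
    }


# Example use (the path is a placeholder):
# with open("/path/to/track.flac", "rb") as handle:
#     print(read_streaminfo(handle.read(42)))
```

This only reports what the header claims. Whether the audio actually decodes to that many samples, and matches the MD5, still needs a full decode such as flac -t.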

Edit: corrected "undecoded audio" to "decoded audio".
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: cid42 on 2023-09-07 12:45:56

FLAC has an audio checksum. (All sane lossless compressed formats offer some kind of checksumming, at least optionally. ALAC is not sane.) *

I've just changed over 100 random bits in my FLAC file and it's passed testing with zero issues.

I still don't understand what kind of checksum people here are talking about.
There's normally an MD5 of the raw audio, which can be checked by a full decode. The MD5 is optional and many retailers omit it because they're sloppy idiots.

There's also a CRC-8 of every frame header and a CRC-16 of every frame, neither of which is optional. It's very unlikely that you can change 100 bits of frame data without getting caught by the CRCs. Metadata is not covered by a CRC, so 100 bits of errors there will probably pass unless they hit a metadata chunk header and create an invalid chunk size/id. Old versions of ./flac -t checked fewer things; if you're sure the errors hit frame data, try testing with v1.4.3.
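
To show why a flipped bit in frame data gets caught, here is a sketch of the two CRCs. The polynomials (0x07 for the 8-bit one, 0x8005 for the 16-bit one, both starting from zero) are as I read the FLAC format specification; the 64-byte "frame" is made-up data, not a real FLAC frame.

```python
def crc8(data, poly=0x07):
    """Bitwise CRC-8, polynomial x^8 + x^2 + x + 1, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def crc16(data, poly=0x8005):
    """Bitwise CRC-16, polynomial x^16 + x^15 + x^2 + 1, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def flip_bit(data, bit_index):
    """Return a copy of data with one bit inverted."""
    changed = bytearray(data)
    changed[bit_index // 8] ^= 0x80 >> (bit_index % 8)
    return bytes(changed)


# Stand-in for frame bytes: every single-bit flip changes the CRC-16.
frame = bytes(range(64))
original = crc16(frame)
undetected = [i for i in range(len(frame) * 8) if crc16(flip_bit(frame, i)) == original]
print("single-bit flips that slip past CRC-16:", len(undetected))
```

A CRC always detects a single flipped bit in the data it covers. Bits flipped in tags, padding or embedded artwork are simply outside that coverage.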
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: SimBun on 2023-09-07 13:26:43
Seems to me that [...] FLAC -t is the way to go.
Sorry to disappoint, but the scary FLAC -t error message comes from the ID3v1 tag at the end. If you remove the tag FLAC -t finds nothing.

It seems that 1.4.2 does indeed mask the tag "issue" with
Code: [Select]
FLAC__STREAM_DECODER_ERROR_STATUS_LOST_SYNC
Whilst 1.4.3 identifies it correctly
Code: [Select]
WARNING, ID3v2 tag found. This is non-standard and strongly discouraged

So CUETools or FLAC -t (using 1.4.3)/foobar is the way to go  :)
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: birdie on 2023-09-07 13:28:27
OK, here's what I've found out by testing.

FLAC contains CRC only for audio data, everything else is not checksummed or verified.

At least a simple decoding test will give a guarantee that audio is intact. Tagging information and the built-in cover art are not verified.

My statement earlier about the need of hashing entire files and having PAR2 recovery for them holds true.
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: Case on 2023-09-07 13:34:32
You edited the first 200 KB of the file and the file contains 594 KB artwork. All your bit edits affected the art.
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: Chibisteven on 2023-09-07 14:01:08
As far as bad embedded images go: if it's a .png or .jpeg, it'll load wrong (if it loads at all) when parts are corrupted or missing; if it's a .bmp, it'll be a single bad pixel unless the header is destroyed.  Hashing an entire file can be helpful in this regard, but if you change the tags, the hash will no longer match.
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: SimBun on 2023-09-07 14:40:37
My statement earlier about the need of hashing entire files and having PAR2 recovery for them holds true.
Building on what @Chibisteven said, if you store the audio MD5 then you can make sure the audio is untouched even across edits. Obviously that doesn't replace the need to store the file hash to check for bit-rot etc.

I've never considered PAR before. Is there any advantage to using it versus a standard hash check (assuming backups are in place)? It seems like just more files to store and back up.
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: cid42 on 2023-09-07 15:30:18
Par2 doesn't just detect errors, it corrects them. You can set a target par2 size as a percentage, say 5%; then, roughly speaking, you can recover from a good chunk of data errors with recovery files that take up 1/20 the space of the originals. I keep par2 files on the same hardware as the original file as well as a copy of the par2 files on a USB stick (as well as redundancy but only for files I particularly care about).
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: MrRom92 on 2023-09-07 16:13:39
Seems like a lot of extra work, would a BTRFS volume not simply do the trick since all files are checksummed by default?
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: Porcus on 2023-09-07 17:44:22
The problem with par2 and friends is that you may want to alter tags.
There is a way to handle "text" tags (i.e. not pictures), assuming you do not alter filenames: foobar2000 with Case's foo_external_tags component. Export the tags to .tag files, and keep those versioned or backed up. Then you can keep the media files themselves unchanged.
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: birdie on 2023-09-07 18:31:06
Seems like a lot of extra work, would a BTRFS volume not simply do the trick since all files are checksummed by default?

BTRFS guarantees that what you read is what you save on it but
1) It cannot correct storage read errors, so if a bit flips, a related data block will be read as empty (if I remember correctly)
2) It cannot guarantee that your data hasn't changed after it's read off the disk (it can be altered by CPU, RAM and IO interfaces)
As a consequence of the second point: btrfs doesn't care (it can't know) what you're storing: imagine you're copying data from another partition or device to your btrfs volume. Any transmission errors will go unnoticed.

If you care about your data, hashsums are a must. Some sort of recovery is desirable unless you have multiple copies of your data stored in different geographical locations.
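
A minimal sketch of the hashsum approach, for anyone who has not set this up before. It assumes SHA-256 over whole files and an in-memory dictionary as the "manifest"; the function names are illustrative, and in practice the manifest would be saved somewhere other than the music drive.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash one file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, suffixes=(".flac", ".wav")):
    """Map each audio file's path (relative to root) to its SHA-256."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.suffix.lower() in suffixes
    }


def compare_manifests(old, new):
    """Report files whose hash changed, that disappeared, or that are new."""
    return {
        "changed": sorted(k for k in old if k in new and old[k] != new[k]),
        "missing": sorted(k for k in old if k not in new),
        "added": sorted(k for k in new if k not in old),
    }
```

Note that a whole-file hash also changes after a deliberate tag edit, so a "changed" entry means "look at this file", not necessarily "this file is damaged".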
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: Lodum007 on 2023-09-07 22:32:05
I've been using Audio Tester to verify the integrity of the FLAC files in the library.  It works great because I can scan the entire FLAC library in one huge batch and get a report of any errors.

I also have some WAV files in this library.  Is there a program similar to Audio Tester that can be used to verify WAV file integrity?  I've searched but have not yet found anything workable.
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: bennetng on 2023-09-08 04:57:18
For .wav you can try my software (oldsCool).
https://hydrogenaud.io/index.php/topic,114816.msg1026786.html#msg1026786

Keep in mind that the .wav format itself is not as robust as flac if you want to check against corruption. Length truncation can be detected, but not corrupted audio data. This means a song with exactly 3 minutes can be replaced with exactly 3 minutes of garbage and remains undetected.
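
To illustrate what "length truncation can be detected" means for .wav, here is a rough sketch that compares the sizes declared in the RIFF headers with the bytes actually present. It is not how oldsCool works internally (I have not assumed anything about that), and `check_wav` is a made-up name. Files written with placeholder sizes, or larger than 4 GB, would need extra handling.

```python
import struct


def check_wav(data):
    """Return a list of problems found in the RIFF/WAVE structure (empty if none)."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        return ["not a RIFF/WAVE file"]
    problems = []
    riff_size = struct.unpack_from("<I", data, 4)[0]
    if riff_size + 8 > len(data):
        problems.append(
            f"RIFF header declares {riff_size + 8} bytes, file has {len(data)}"
        )
    pos = 12
    found_data = False
    while pos + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack_from("<4sI", data, pos)
        body_start = pos + 8
        available = len(data) - body_start
        if chunk_id == b"data":
            found_data = True
        if chunk_size > available:
            name = chunk_id.decode("ascii", "replace")
            problems.append(
                f"chunk '{name}' declares {chunk_size} bytes, only {available} present"
            )
            break
        # Chunks are padded to an even number of bytes.
        pos = body_start + chunk_size + (chunk_size & 1)
    if not found_data:
        problems.append("no data chunk found")
    return problems


# Example use (the path is a placeholder):
# from pathlib import Path
# print(check_wav(Path("/path/to/track.wav").read_bytes()))
```

As said above, this only catches a file that is shorter than its own header claims. Garbage of the right length passes.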

Apart from audio length, oldsCool also checks other things which can potentially go wrong with .wav, check the bundled ReadMe file for more details.
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: SimBun on 2023-09-08 07:49:15
I've been using Audio Tester to verify the integrity of the FLAC files in the library.  It works great because I can scan the entire FLAC library in one huge batch and get a report of any errors.
It feels like you haven't read any of the replies to your original posting.

I want the library to be perfect so that all audio is okay and the track names always match the audio they go with.
The only way to get anywhere near this is by starting with CUETools - assuming most of the collection consists of rips from CDs.

If your original goals have changed and you now just want to weed out the obvious corruptions then carry on with AudioTester.
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: Lodum007 on 2023-09-08 16:18:33
Thanks,

My goals for this haven't changed, but I haven't had success with an automated approach yet.  I tried CUETools with the first album (the album I posted a corrupt file from).  The CD is common and released several times since 1986.  CUETools scans it and gives a message that it is not in the database.  I don't see that the program is going to be my full solution for this reason.   (Possibly that one album not showing up in the database is only an anomaly, but it's the album that started this search for a solution.)

I tried CUETools on other albums and it does find them in its database and do the scan, which is great.  I will use that tool as I am sure it will help and find album errors I would otherwise miss.

Foobar and MusicBrainz recognize the corrupt album (probably from the metadata) but they will not give me the feedback needed to determine if a track is misnamed as another one, or if a file is truncated.

I think my best bet right now (although painfully slow) is:

1) Scan all files to make sure the audio integrity is okay.  I'll try OldsCool for the WAV files and continue to use Audio Tester for the FLAC files.

2) Scan the albums with CUETools.

3) I'll visually inspect the length of my tracks against some known standard - like Apple Music or Discogs.  An obvious difference in length will then give me a red flag for trouble.

I may still be missing something.  I didn't know about CUETools until I received answers on this thread and there may be some additional method of handling step 3 in an automated fashion that I haven't understood. 

Again, thanks to everyone for the advice and support.
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: SimBun on 2023-09-08 16:46:15
I tried CUETools with the first album (the album I posted a corrupt file from).  The CD is common and released several times since 1986.  CUETools scans it and gives a message that it is not in the database.  I don't see that the program is going to be my full solution for this reason.   (Possibly that one album not showing up in the database is only an anomaly, but it's the album that started this search for a solution.)
The fact that the disc couldn't be found was an indicator that something is wrong. In this case the track was truncated so the disc layout (track timings) didn't match. It did its job!

Foobar and MusicBrainz recognize the corrupt album (probably from the metadata) but they will not give me the feedback needed to determine if a track is misnamed as another one, or if a file is truncated.
I don't know which album you're talking about, but foobar can only tell you if a file is corrupt or not, and in the joan_of_arc example it isn't; it's just that it has an ID3 tag at the end. I'd have to see the MusicBrainz output to know what it's telling you, but I don't think Picard is particularly useful in a batch scenario like this.

What you should be doing is running CUETools against your entire collection and, for every album that doesn't verify, investigating separately. This will identify truncations, corruptions, incorrect track ordering etc.
Foobar verifier, AudioTester and the WAV tester will only tell you about corruptions, they only see the individual files and have no understanding of an album/disc.

This assumes that your collection is from CD rips which you still haven't confirmed.

Why don't you create a new folder, place a handful of albums in there (including one that is corrupt) and run a CUETools pass against the entire folder. If you're happy you understand the results, then proceed with your entire collection and report back.
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: Lodum007 on 2023-09-08 18:42:20
Alright.
About 95% of my library is from CD.  The other albums are digital downloads.
I tried one folder with CUETools and that went fine.  Now I've started running CUETools on the whole library (and that will probably run for several days).

Can you explain how CUETools checks one's file against its database?  Does it look at metadata for the album or artist name that it scans?  Or is it some other information contained in the file?  Understanding this would help me understand why the corrupt song I uploaded wasn't found in the database.   I am not familiar with the theory behind this.

I can see so far on my scan of the full library that CUETools did not locate several other albums, such as the "Slumdog Millionaire" soundtrack, which doesn't make sense to me.  That is a common album and was taken from a CD.  Same with A Tribe Called Quest "Anthology". 

As a note, I can see that a few albums I added extra metadata to in the album name field show up correctly in the CUETools database when scanned.  I find it interesting because my album name metadata did not correctly match a standard, but the software found the albums in the database and compared my files to them.  Seems good so far.

Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: SimBun on 2023-09-08 19:12:55
I tried one folder with CUETools and that went fine.  Now I've started running CUETools on the whole library (and that will probably run for several days).
I would hope not. The last time I scanned ~30,000 tracks it took 3-4 hours.

Can you explain how CUETools checks one's file against its database?  Does it look at metadata for the album or artist name that it scans?  Or is it some other information contained in the file?  Understanding this would help me understand why the corrupt song I uploaded wasn't found in the database.   I am not familiar with the theory behind this.
CUETools does use some metadata from your tags in order to identify a disc, and the order of the tracks within that disc. I'm guessing here, but I assume it uses AlbumArtist, Album, DiscNumber and TrackNumber.

For each identified disc a unique key is produced so that it can match with other rips from the database. The key includes - amongst other things - the number of tracks on the disc, and the length of each track.
The Full TOC from MusicBrainz (https://musicbrainz.org/cdtoc/iGICEPm_xFOMLezCi4lP_OMPes8-) is an example of such a key.

The next stage is to hash the audio of each track and compare that to the matching rips in the database. A single incorrect bit will produce an inaccurate result.
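To illustrate why a single incorrect bit is enough to break the match, here is a minimal Python sketch. It uses CRC32 from the standard library purely as a stand-in; the actual hashes used by the CTDB and AccurateRip databases are different, and the "audio" here is a made-up buffer of silence rather than a decoded track.

```python
import zlib


def track_checksum(pcm_bytes: bytes) -> str:
    """CRC32 of raw PCM bytes as 8 hex digits (a stand-in for the real database hash)."""
    return f"{zlib.crc32(pcm_bytes) & 0xFFFFFFFF:08x}"


# Hypothetical stand-in for decoded audio: 1000 stereo 16-bit samples of silence.
original = bytes(4000)

# Flip a single bit in one byte to imitate a tiny ripping error.
damaged = bytearray(original)
damaged[1234] ^= 0x01

# Identical audio gives the identical checksum; one flipped bit does not.
same = track_checksum(original) == track_checksum(bytes(original))
differs = track_checksum(original) != track_checksum(bytes(damaged))
print(same, differs)
```

The point is only that the comparison is all-or-nothing per track: the checksum says nothing about how large the difference is, only that one exists.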

Quote
I can see so far on my scan of the full library that CUETools did not locate several other albums, such as the "Slumdog Millionaire" soundtrack, which doesn't make sense to me.  That is a common album and was taken from a CD.  Same with A Tribe Called Quest "Anthology".
In each folder will be a *.accurip file that gives you more detail. There could be any number of things wrong with it, including that it wasn't from a CD. Feel free to attach one for us to review.
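If you end up with thousands of these files, a small script can list the ones worth looking at. This is only a sketch: it assumes each *.accurip file contains a "Track | CTDB Status" table in the same layout as the logs posted in this thread, and the function names `suspect_tracks` and `scan_library` are made up for the example.

```python
import re
from pathlib import Path

# Matches CTDB table rows such as "  4   | (25/35) Differs in 18 samples ..."
# and captures the track number plus the first word of the primary status.
TRACK_LINE = re.compile(r"^\s*(\d+)\s*\|\s*\((\d+)/(\d+)\)\s+(\w+)")


def suspect_tracks(log_text):
    """Return track numbers whose primary CTDB status is not 'Accurately ripped'."""
    bad = []
    for line in log_text.splitlines():
        m = TRACK_LINE.match(line)
        if m and m.group(4).lower() != "accurately":
            bad.append(int(m.group(1)))
    return bad


def scan_library(root):
    """Map each *.accurip file under root to its list of suspect tracks (if any)."""
    report = {}
    for path in sorted(Path(root).rglob("*.accurip")):
        text = path.read_text(encoding="utf-8", errors="replace")
        bad = suspect_tracks(text)
        if bad:
            report[str(path)] = bad
    return report


# Small excerpt in the layout of the logs posted in this thread.
sample = """Track | CTDB Status
  1   | (32/35) Accurately ripped, or (1/35) differs in 8 samples @06:00:46
  2   | (33/35) Accurately ripped
  4   | (25/35) Differs in 18 samples @06:04:18
 01     [2241f32a|5853a614] (0+0/1) No match
"""
print(suspect_tracks(sample))
```

Albums that were not found in the database at all have no such table, so they would need a separate check.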

Quote
As a note, I can see that a few albums I added extra metadata to in the album name field show up correctly in the CUETools database when scanned.  I find it interesting because my album name metadata did not correctly match a standard, but the software found the albums in the database and compared my files to them.  Seems good so far.
As mentioned before, it only uses the metadata in order to be able to identify a disc, so you could have put anything into the album field (as long as it's the same for every track) and it would have correctly grouped them and then performed the matching on the key of that "disc".

Hope that makes some sense.
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: korth on 2023-09-08 19:21:20
CUETools has a board on the forum:
https://hydrogenaud.io/index.php/board,74.0.html
CUETools has a wiki:
http://cue.tools/wiki/
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: Lodum007 on 2023-09-08 23:21:45
I looked through the documentation on CUETools.
I want to make sure I'm doing this right.  Specifically, I want to make sure I understand the results of the tests.
(My understanding of technical matters is "basic geek", but not "deep geek", with all proper respect to geeks.
There may be technical points taken for granted as understood by the CUETools user which I don't know.)

I found the physical CD that had corruption earlier this week and re-ripped it into FLAC (and deleted the old corrupt version).
I then scanned the album using CUETools and the results are below.
Could you please interpret them for me?   Should I be concerned about the errors that show up?
If so, what would I do about them?
Why are there errors in CTDB but no errors in AccurateRip?
What is the significance of track peak when doing this test? 

I'm trying to get a feel of what I am looking for when interpreting these results so I deal with them correctly.
Thank you.

_____

Code: [Select]
[CUETools log; Date: 9/8/2023 3:06:40 PM; Version: 2.2.4]
[CTDB TOCID: DOgqiNs3rS.8lt8asau6gAcRtdI-] found.
Track | CTDB Status
  1   | (1285/1306) Accurately ripped
  2   | (1284/1306) Accurately ripped
  3   | (1285/1306) Accurately ripped
  4   | (1279/1306) Accurately ripped, or (2/1306) differs in 3 samples @03:50:22
  5   | (1289/1306) Accurately ripped
  6   | (1290/1306) Accurately ripped
  7   | (1285/1306) Accurately ripped
  8   | (1284/1306) Accurately ripped
  9   | (1280/1306) Accurately ripped, or (2/1306) differs in 8157 samples @03:37:37-03:37:48
[AccurateRip ID: 000ec885-006c383a-7509bc09] found.
Track   [  CRC   |   V2   ] Status
 01     [5e455c1e|95af0aec] (000+000/891) No match
 02     [a49e3e2c|d6d45532] (000+000/889) No match
 03     [402d8aac|b56b7eab] (000+000/893) No match
 04     [488026c5|ba30d47e] (000+000/887) No match
 05     [c6cf3964|efec44c4] (000+000/885) No match
 06     [49445bd9|889c677a] (000+000/889) No match
 07     [1b86347e|03a043f7] (000+000/882) No match
 08     [51570a91|3ea2c932] (000+000/882) No match
 09     [38edc633|591bc423] (000+000/883) No match
Offsetted by 667:
 01     [cc2152c3] (084/891) Accurately ripped
 02     [d6224535] (084/889) Accurately ripped
 03     [d972a5ef] (086/893) Accurately ripped
 04     [8dc44b7e] (085/887) Accurately ripped
 05     [3f8114ab] (084/885) Accurately ripped
 06     [e7e47505] (084/889) Accurately ripped
 07     [a1bfa5c4] (085/882) Accurately ripped
 08     [e71a55cc] (082/882) Accurately ripped
 09     [2e0c30c8] (084/883) Accurately ripped
Offsetted by 1141:
 01     [c6ead5c9] (049/891) Accurately ripped
 02     [e7e8bb21] (049/889) Accurately ripped
 03     [503cd390] (049/893) Accurately ripped
 04     [121a7b19] (049/887) Accurately ripped
 05     [102251c8] (049/885) Accurately ripped
 06     [e939cdeb] (049/889) Accurately ripped
 07     [6c6d718b] (048/882) Accurately ripped
 08     [33ed6b64] (049/882) Accurately ripped
 09     [2ec40f01] (047/883) Accurately ripped
Offsetted by 1334:
 01     [1779ed7a] (031/891) Accurately ripped
 02     [9fe319e5] (032/889) Accurately ripped
 03     [1767b0e2] (031/893) Accurately ripped
 04     [4990c60a] (031/887) Accurately ripped
 05     [bb197074] (031/885) Accurately ripped
 06     [3748a318] (031/889) Accurately ripped
 07     [21b61ea9] (031/882) Accurately ripped
 08     [b23091fb] (031/882) Accurately ripped
 09     [b6f052c3] (030/883) Accurately ripped
Offsetted by 1502:
 01     [28538dba] (082/891) Accurately ripped
 02     [fdf58129] (083/889) Accurately ripped
 03     [4289c855] (082/893) Accurately ripped
 04     [8955cb17] (081/887) Accurately ripped
 05     [21fb5cdd] (081/885) Accurately ripped
 06     [0de7cf11] (082/889) Accurately ripped
 07     [bb715a5e] (080/882) Accurately ripped
 08     [bd7af470] (081/882) Accurately ripped
 09     [89ed15e0] (079/883) Accurately ripped
Offsetted by 2001:
 01     [c273b9ee] (113/891) Accurately ripped
 02     [db341a4c] (112/889) Accurately ripped
 03     [8f052f1a] (112/893) Accurately ripped
 04     [a20f18ed] (111/887) Accurately ripped
 05     [12f9539f] (111/885) Accurately ripped
 06     [971d8866] (112/889) Accurately ripped
 07     [dcf73fcf] (109/882) Accurately ripped
 08     [7466b1d2] (111/882) Accurately ripped
 09     [d4b460bb] (112/883) Accurately ripped
Offsetted by 1436:
 01     [15b13591] (000/891) No match (V2 was not tested)
 02     [9dbcc837] (000/889) No match (V2 was not tested)
 03     [c221499f] (000/893) No match (V2 was not tested)
 04     [bef1e814] (000/887) No match (V2 was not tested)
 05     [029824be] (000/885) No match (V2 was not tested)
 06     [2e38727f] (000/889) No match (V2 was not tested)
 07     [4baaf8d8] (000/882) No match (V2 was not tested)
 08     [28f94f2f] (000/882) No match (V2 was not tested)
 09     [2dc78980] (000/883) No match (V2 was not tested)

Track Peak [ CRC32  ] [W/O NULL]
 --   67.1 [759AC356] [1CC89E4E]          
 01   59.5 [10FDD893] [CEA1E7F4]          
 02   65.3 [843F58FB] [714E7D81]          
 03   35.3 [5B0078B4] [4B1A0519]          
 04   67.1 [FCE28D7C] [5D1C918F]          
 05   44.9 [AA7346BE] [F00B1AA4]          
 06   34.5 [F2317E6F] [D3CDE8C6]          
 07   32.8 [20DF2890] [ABFBD933]          
 08   32.2 [78F0BDEE] [B5E69D6F]          
 09   41.2 [2E24D7E2] [63A07730]          




MOD edit: moved log to code box
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: korth on 2023-09-08 23:32:18
Quote
Why are there errors in CTDB but no errors in AccurateRip?
I see all 9 tracks 'Accurately ripped' with a minimum confidence of  (1279/1306)
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: Lodum007 on 2023-09-08 23:33:41
Here's another one, Jazz At The Pawnshop

Code: [Select]
[CUETools log; Date: 9/8/2023 3:30:00 PM; Version: 2.2.4]
[CTDB TOCID: NdAQSzWbfFSO.9VA7W_D0kfWC.E-] found.
Track | CTDB Status
  1   | (32/35) Accurately ripped, or (1/35) differs in 8 samples @06:00:46
  2   | (33/35) Accurately ripped
  3   | (34/35) Accurately ripped
  4   | (25/35) Differs in 18 samples @06:04:18, or (1/35) differs in 18 samples @06:04:18, or (1/35) differs in 18 samples @06:04:18
  5   | (25/35) Differs in 61 samples @03:54:40, or (1/35) differs in 64 samples @03:54:40,05:10:62, or (1/35) differs in 61 samples @03:54:40
  6   | (25/35) Differs in 59 samples @03:28:27, or (1/35) differs in 90 samples @01:31:15,02:43:34,03:28:27,03:44:19,03:51:51,04:01:49,04:59:09,05:35:60,06:13:32, or (1/35) differs in 59 samples @03:28:27
  7   | (25/35) Differs in 34 samples @00:33:03, or (1/35) differs in 82 samples @00:29:00,00:33:03,01:01:21,02:46:30,03:07:38-03:07:39,03:34:31-03:34:33,03:55:31,04:44:24,04:53:61-04:53:62,04:56:56, or (1/35) differs in 34 samples @00:33:03
  8   | (25/35) Differs in 13 samples, or (1/35) differs in 408 samples, or (1/35) differs in 50 samples
  9   | (26/35) Accurately ripped, or (1/35) differs in 355 samples, or (1/35) differs in 78 samples
[AccurateRip ID: 0018eb0e-00b3eecd-7c104809] found.
Track   [  CRC   |   V2   ] Status
 01     [2241f32a|5853a614] (0+0/1) No match
 02     [cd3bbe9d|92ff42b4] (0+0/1) No match
 03     [55026de8|fb2b3420] (0+0/1) No match
 04     [ae0819e3|b1d5d300] (0+0/1) No match
 05     [305ba6af|e4305d6f] (0+0/1) No match
 06     [ac11c05d|c46e8a8c] (0+0/1) No match
 07     [0e4320b6|570d0698] (0+0/1) No match
 08     [e5de5632|e16abcfd] (0+0/1) No match
 09     [640cffb7|5f3b241c] (0+0/1) No match
Offsetted by -1364:
 01     [fdf645dc] (1/1) Accurately ripped
 02     [f49e3c79] (1/1) Accurately ripped
 03     [6170b22b] (1/1) Accurately ripped
 04     [6cad41ab] (0/1) No match (V2 was not tested)
 05     [9195ecd2] (0/1) No match (V2 was not tested)
 06     [c3c20415] (0/1) No match (V2 was not tested)
 07     [deae19ed] (0/1) No match (V2 was not tested)
 08     [dd679a8c] (0/1) No match (V2 was not tested)
 09     [a7dedfa6] (1/1) Accurately ripped

Track Peak [ CRC32  ] [W/O NULL]
 --  100.0 [8EA836AC] [4F3ED34B]          
 01   78.8 [ABA6B597] [5788EA0A]          
 02   74.0 [7FD094D9] [87D9DCE4]          
 03   89.8 [449C1759] [52BA85B7]          
 04   69.1 [170276F5] [4E835184]          
 05   69.1 [9C00A145] [627E6F8B]          
 06  100.0 [AC243CA1] [64DEEAEF]          
 07   69.0 [2A5BACC7] [0C662C84]          
 08   89.0 [46632C21] [705128C2]          
 09   59.9 [C803AC05] [2492F91F]          
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: Lodum007 on 2023-09-08 23:36:11
Quote
Why are there errors in CTDB but no errors in AccurateRip?
I see all 9 tracks 'Accurately ripped' with a minimum confidence of  (1279/1306)

I see. I don't think I understand the software.  What does 1279/1306 signify?

What is the significance of this result?
4   | (1279/1306) Accurately ripped, or (2/1306) differs in 3 samples @03:50:22

Thanks!
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: korth on 2023-09-08 23:38:29
Quote
I see. I don't think I understand the software.  What does 1279/1306 signify?
1279 rips match out of 1306

Quote
Here's another one, Jazz At The Pawnshop
Tracks 4-8 are not accurate, but the rip is repairable.



Edit: looks like SimBun is taking it from here. I'm tagged out.
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: SimBun on 2023-09-08 23:39:59
Code: [Select]
  4   | (1279/1306) Accurately ripped, or (2/1306) differs in 3 samples @03:50:22
This means that your rip of track 4 matched 1279 other rips (out of 1306 in total), and that it differed by 3 samples from another 2 rips.
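For anyone who wants to pull these numbers out of many logs, a status line like the one above can be split into its parts with a regular expression. This is a rough sketch based only on the line format visible in the logs in this thread; the function name `parse_ctdb_status` and the dictionary keys are invented for the example.

```python
import re

# One entry looks like "(1279/1306) Accurately ripped" or "(2/1306) differs in 3 samples".
ENTRY = re.compile(
    r"\((\d+)/(\d+)\)\s+(Accurately ripped|[Dd]iffers in (\d+) samples)"
)


def parse_ctdb_status(status):
    """Split a CTDB status string into a list of {count, total, differing_samples}."""
    results = []
    for m in ENTRY.finditer(status):
        results.append(
            {
                "count": int(m.group(1)),  # rips in this group
                "total": int(m.group(2)),  # all rips known for the disc
                # 0 means this group is bit-identical to your rip
                "differing_samples": int(m.group(4)) if m.group(4) else 0,
            }
        )
    return results


line = "(1279/1306) Accurately ripped, or (2/1306) differs in 3 samples @03:50:22"
parsed = parse_ctdb_status(line)
for entry in parsed:
    print(entry)
```

The first entry is the group your rip belongs to (or is closest to); the entries after "or" are other groups of rips in the database.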

Code: [Select]
[AccurateRip ID: 000ec885-006c383a-7509bc09] found.
Track   [  CRC   |   V2   ] Status
 01     [5e455c1e|95af0aec] (000+000/891) No match
 02     [a49e3e2c|d6d45532] (000+000/889) No match
 03     [402d8aac|b56b7eab] (000+000/893) No match
 04     [488026c5|ba30d47e] (000+000/887) No match
 05     [c6cf3964|efec44c4] (000+000/885) No match
 06     [49445bd9|889c677a] (000+000/889) No match
 07     [1b86347e|03a043f7] (000+000/882) No match
 08     [51570a91|3ea2c932] (000+000/882) No match
 09     [38edc633|591bc423] (000+000/883) No match
This probably means that you haven't set up your ripping drive correctly. What software are you using? EAC, for instance, has a sample offset correction that needs to be configured. Here's an old guide (https://captainrookie.com/how-to-install-and-setup-eac-and-rip-cds-to-flac/#EAC-installation) for EAC that might be of some use.
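To show what an uncorrected offset does, here is a small Python sketch. The audio is identical but shifted by a fixed number of samples, so the checksum at offset 0 fails while a search over shifts finds the match. Assumptions: CRC32 stands in for the real AccurateRip checksum, the sample data is a made-up pattern, the helper names are invented, and the sign convention of the shift is illustrative only.

```python
import struct
import zlib


def checksum(samples):
    """CRC32 over 16-bit little-endian samples (stand-in, not the AccurateRip algorithm)."""
    return zlib.crc32(struct.pack(f"<{len(samples)}h", *samples))


def find_offset(rip, reference_crc, length, max_offset):
    """Return the first shift at which a window of the rip matches the reference checksum."""
    for shift in range(max_offset + 1):
        if checksum(rip[shift:shift + length]) == reference_crc:
            return shift
    return None


# Hypothetical track data: a deterministic pattern of 5000 samples.
reference = [((i * 37) % 2001) - 1000 for i in range(5000)]
reference_crc = checksum(reference)

# Simulate a drive whose uncorrected read offset delivers the audio 667 samples late,
# with silence on either side.
rip = [0] * 667 + reference + [0] * 667

unshifted_matches = checksum(rip[:len(reference)]) == reference_crc
found = find_offset(rip, reference_crc, len(reference), max_offset=1000)
print(unshifted_matches, found)
```

That is roughly why the log shows "No match" at offset 0 and then "Accurately ripped" under "Offsetted by 667": the audio is the same, only its position is shifted.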
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: Lodum007 on 2023-09-08 23:50:10
Thank you both.

I typically use EAC but didn't with this specific CD.  I'll use it in the future to make sure future rips don't have this problem.

Why does CTDB show errors on tracks 4 and 9 but AccurateRip doesn't?   Why would that be and should I care?

On the Jazz At The Pawnshop, how would one repair tracks 4-8?

Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: korth on 2023-09-09 00:00:41
CUETools has a board on the forum:
https://hydrogenaud.io/index.php/board,74.0.html
Asking program-specific questions on a different board doesn't help anyone else who later looks for the answers there.

Quote
Why does CTDB show errors on tracks 4 and 9 but AccurateRip doesn't?
Those aren't errors. It is showing you that another 'recovery record' exists from 2 rips that had slightly different audio than your rip did. Your rip was Accurately Ripped so those 2 rips are not significant.

A recovery record is a 180KB file (about twice that size for more popular CDs), which is stored separately in the database and accessed only when verifying a rip that differs or when repairing a rip.
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: SimBun on 2023-09-09 11:51:53
Quote
On the Jazz At The Pawnshop, how would one repair tracks 4-8?
In the CUETools Action section, instead of using 'Verify' you'd choose 'Encode' (to create a new copy) and then select 'repair' in the dropdown.

Once you've hit Go you'll be prompted for which set of repairs to make (if there are multiple possibilities), just select the ones that result in the most matches (25/35).
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: korth on 2023-09-09 16:13:05
Quote
Why does CTDB show errors on tracks 4 and 9 but AccurateRip doesn't?
Those aren't errors. It is showing you that another 'recovery record' exists from 2 rips that had slightly different audio than your rip did. Your rip was Accurately Ripped so those 2 rips are not significant.

Now suppose the reverse were true and your rip matched only the two:
Code: [Select]
[CTDB TOCID: DOgqiNs3rS.8lt8asau6gAcRtdI-] found.
Track | CTDB Status
  1   | (1285/1306) Accurately ripped
  2   | (1284/1306) Accurately ripped
  3   | (1285/1306) Accurately ripped
  4   | (2/1306) Accurately ripped, or (1279/1306) differs in 3 samples @03:50:22
  5   | (1289/1306) Accurately ripped
  6   | (1290/1306) Accurately ripped
  7   | (1285/1306) Accurately ripped
  8   | (1284/1306) Accurately ripped
  9   | (2/1306) Accurately ripped, or (1280/1306) differs in 8157 samples @03:37:37-03:37:48
One might lean toward repairing the rip to match the higher confidence.

Note: Having a confidence of 2 doesn't mean the rip is less accurate. 2 vs 2000 is popularity not accuracy.
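The rule of thumb above can be written down as a small Python sketch. This is only one way to express it, not how CUETools decides anything; the function name `suggest_action` and the returned labels are made up for the example, and the result is a hint for a human to review, not a verdict on accuracy.

```python
def suggest_action(matched, others):
    """Suggest what to do with a track.

    matched: number of database rips identical to yours (0 if none).
    others:  list of counts for groups of rips that differ from yours.
    """
    best_other = max(others, default=0)
    if matched > 0 and matched >= best_other:
        return "keep"  # your rip agrees with the largest group
    if best_other > 0:
        return "consider repair"  # a larger group disagrees with your rip
    return "no database match"  # nothing to compare against


print(suggest_action(1279, [2]))      # track 4 as actually logged
print(suggest_action(2, [1279]))      # the reversed case described above
print(suggest_action(0, [25, 1, 1]))  # like tracks 4-8 of Jazz At The Pawnshop
```

A low matched count with no larger competing group still comes out as "keep", in line with the note that confidence reflects popularity rather than accuracy.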
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: Porcus on 2023-09-09 17:45:40
Note that, if you are a bit careless, a confidence of 2 out of many might consist of two submissions that are both your own.
Title: Re: Help! How Do You Automate Verifying A Large FLAC Music Library?
Post by: Lodum007 on 2023-09-09 21:45:06
Thank you all for the helpful input.
I've been working with CUETools to scan the library as recommended.

I have various questions about the type of error messages coming up and what to do about them.
All the questions specifically relate to CUETools, so I've posted the next set of questions in a new CUETools thread.

https://hydrogenaud.io/index.php/topic,124732.0.html