Help! How Do You Automate Verifying A Large FLAC Music Library?

I am hoping someone can give me advice on how to automate testing my library.
I have about 3,000 FLAC albums, and a dozen or so "WAV" file albums.  It's about 40,000 songs.

Today I found that one album encoded years ago had track 5 playing the audio of track 6 and vice versa.  Also, one song on the same album was corrupted and only played for the first 40 seconds.

The first error could be someone misnaming the metadata.  The second problem seems to be a corrupt file but no error showed up on the FLAC file when scanned with the "Audio Tester" program.

I occasionally find small errors in the library. The other day I found a WAV file in the library that cuts off 1 minute in.

I want the library to be perfect so that all audio is okay and the track names always match the audio they go with. 

Is there an automated tool that can reliably scan all of one's FLAC and WAV files and verify that they have no errors? 

If a song is supposed to be 3:30 but is only 3:15 I could find the error by manually checking the length of each song against Apple Music.   These albums are common and well-known.  
Is there some automated way to accomplish this so I don't need to manually inspect 40,000 songs?

Advice on how to do this sensibly is appreciated.

Cheers!
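A minimal sketch of the automated length listing asked about above, assuming bash and `metaflac` from the reference FLAC tools. It reads the total sample count and sample rate from each file's STREAMINFO header and prints the length as M:SS, so it can be compared against a published track list. This is the length the header claims. A file damaged after encoding can still report its full length, and matching against Apple Music or another source is still up to you. The helper name `samples_to_mmss` is hypothetical.

```shell
#!/usr/bin/env bash
# Print the length stored in each FLAC file's STREAMINFO header.
# Assumes metaflac (from the reference FLAC tools) is on PATH.

# samples_to_mmss TOTAL_SAMPLES SAMPLE_RATE: print M:SS, rounded down.
samples_to_mmss() {
  local secs=$(( $1 / $2 ))
  printf '%d:%02d\n' $(( secs / 60 )) $(( secs % 60 ))
}

for f in "$@"; do
  if ! total=$(metaflac --show-total-samples "$f" 2>/dev/null) ||
     ! rate=$(metaflac --show-sample-rate "$f" 2>/dev/null); then
    printf '%s\tunreadable\n' "$f"
  elif [ "$total" -eq 0 ]; then
    # A total of 0 means the encoder did not record the length.
    printf '%s\tunknown length\n' "$f"
  else
    printf '%s\t%s\n' "$f" "$(samples_to_mmss "$total" "$rate")"
  fi
done
```

For example, `bash flac-lengths.sh /path/to/album/*.flac` prints one tab-separated line per file. The script name is only an example.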

Reply #1
With MusicBrainz Picard, you can scan your audio files against the MusicBrainz database. It gives each file a color depending on how well it matches their database: if the tags differ slightly, it will be green; if the tags differ significantly, green-orange; if the file length differs a lot, it will mark the file orange, etc.

However, your way of tagging files may differ from what MusicBrainz uses. In that case, it will mark a file as differing only because, for example, you list only the most important artist while MusicBrainz lists all artists, or something like that. Even then, this could still be useful.

Reply #2
A CUETools verify pass is what you want, although if you didn't use a ripper that supported AccurateRip/CTDB, you might not like the results.

CUETools uses a couple of databases to compare your rip to that of others to make sure it's correct. It does rely on a number of tags in order to identify albums and the ordering therein, but it won't check that the tags are correct.

Although much of the documentation says to point it at a CUE sheet, this isn't required: just point it at a folder of music files, and ultimately at the root of your music directory once you've tested it.

There's a guide on how to use the verify feature here.

Reply #3
CueTools will only work for files ripped from CDs, so it's not a viable option for a batch checkup if the library has audio downloaded from internet sources mixed in.

In foobar you can right-click the selected files and choose verify audio integrity. While this isn't going to highlight tagging errors, user errors by whoever created the file, or anything related to metadata, it will at least verify that the encoded audio content isn't corrupt or erroneous in some way. It should also run pretty quickly, system-dependent of course; a faster drive helps.

Reply #4
This will only show completely broken files (i.e. with decoding errors):

Code: [Select]
find /path/to/audio -type f -iname '*.fla*' -print0 | xargs -0 flac -wst

Unfortunately it will not catch bit flip errors. FLAC doesn't include any form of CRC, so bitrot is very real and bound to occur if you don't create hashsums after encoding files. I always do and I even create PAR2 recovery data for the files that are very important to me.
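A minimal sketch of the hash-after-encoding approach, assuming bash, GNU coreutils (`sha256sum`, `sort -z`) and GNU `xargs`. It writes one manifest per folder and re-checks it later. The function names are hypothetical. Because the hash covers the whole file, any later retagging also shows up as a mismatch. PAR2 recovery data would be a separate step and is not shown.

```shell
#!/usr/bin/env bash
# make_manifest DIR: write SHA-256 hashes of all FLAC/WAV files under DIR
# to DIR/SHA256SUMS, with paths relative to DIR.
make_manifest() {
  ( cd "$1" &&
    find . -type f \( -iname '*.flac' -o -iname '*.wav' \) -print0 |
      sort -z | xargs -0 -r sha256sum > SHA256SUMS )
}

# verify_manifest DIR: re-hash the files listed in DIR/SHA256SUMS and
# print "OK" or "FAILED"; sha256sum's per-file report goes to stderr.
verify_manifest() {
  if ( cd "$1" && sha256sum -c --quiet SHA256SUMS >&2 ); then
    echo OK
  else
    echo FAILED
  fi
}
```

Run `make_manifest /path/to/album` once after encoding or retagging, then `verify_manifest /path/to/album` periodically.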

Reply #5
As for finding corrupted FLAC files, Audiotester will detect them just fine. 

> The second problem seems to be a corrupt file but no error showed up on the FLAC file when scanned with the "Audio Tester" program.
Do you have any examples of corrupted FLAC files that are not detected by Audiotester?
By Audiotester I mean this Audiotester :))

foobar2000 was mentioned here, but beware: some files with corrupted headers can be silently ignored by it (i.e. not added to the playlist at all), so verification will not detect them.

> FLAC doesn't include any form of CRC
Are you sure you know what you are talking about?

Reply #6
> > FLAC doesn't include any form of CRC
> Are you sure you know what you are talking about?

I'm making stuff up as always. You're the expert, so I'll just walk away. Never thought that technical audio forums are conducive to insults. Looks like I was wrong.

Reply #7
It isn't obvious that the track is wrong just because its length doesn't match a different version of the same release.


Anyway, without even touching the topic of metadata, a few things to be aware of:

FLAC has an audio checksum. (All sane lossless compressed formats offer some kind of checksumming, at least optionally. ALAC is not sane.) WAV doesn't have that, but if a file ends where it shouldn't (according to its headers, or mid-sample), that is possible to detect.
But: if you have reconverted FLAC -> FLAC and ignored errors, or converted possibly corrupted WAVE to FLAC, then you have OK'ed the error, and FLAC's checksum cannot help you.
And: the checksum of course says nothing about rip errors. If you got the wrong data off the CD, FLAC cannot know.
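For the WAV case, here is a minimal sketch of the "ends where the headers say it shouldn't" check, assuming bash and POSIX `od`. A RIFF file stores its own size minus 8 as a little-endian 32-bit value at byte offset 4, so a file shorter than that was cut off. It is a heuristic only. It does not walk the chunks or look for a mid-sample end, and RF64 files are reported as not-riff. Tools that write a placeholder size (e.g. 0xFFFFFFFF for streamed output) will give misleading results. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# le32 FILE OFFSET: print the unsigned little-endian 32-bit integer at OFFSET.
le32() {
  local b0 b1 b2 b3
  read -r b0 b1 b2 b3 < <(od -An -v -t u1 -j "$2" -N 4 "$1")
  echo $(( b0 | (b1 << 8) | (b2 << 16) | (b3 << 24) ))
}

# wav_status FILE: print "not-riff", "truncated" or "ok".
wav_status() {
  local f=$1 declared actual
  if [ "$(head -c 4 "$f")" != "RIFF" ]; then
    echo not-riff
    return
  fi
  declared=$(le32 "$f" 4)              # RIFF chunk size = file size - 8
  actual=$(( $(wc -c < "$f") - 8 ))
  if (( actual < declared )); then
    echo truncated
  else
    echo ok
  fi
}

for f in "$@"; do
  printf '%s\t%s\n' "$f" "$(wav_status "$f")"
done
```

With a hypothetical script name, `find /path/to/audio -type f -iname '*.wav' -exec bash wav-check.sh {} + | grep -v 'ok$'` would list only the suspect files.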

Audiotester.exe is a tool everybody needs, and even if it includes a slightly old version of FLAC, it will detect the errors in question. (Newer FLAC will detect more errors that seem to be off-topic here.)

AccurateRip retro-verification is good if it does indeed verify. Download and install CUETools, and PerfectTunes (https://www.dbpoweramp.com/Help/perfecttunes/accuraterip.htm) from the creator of AccurateRip. But even files that are just fine may fail to verify as such in the following circumstances:
- It isn't from a CD. (Even if it was offered for download, it could be from a CD. I have purchased files with AccurateRip tags in them from the label that also sold the CDs.)
- It doesn't find the actual CD. For that, CUETools needs either an EAC log or suitable tags (CDTOC or AccurateRip tags; the album title is no use), or, failing that, the rip must have the most common pregap length. You are also in trouble if the track number tags are wrong, or if one track is damaged to the point that it no longer has the right length.
- It is not CDDA, for example if you made the mistake of decoding HDCD. But PerfectTunes sometimes still helps.

Even if CUETools reports Inaccurate, the fact that it finds the CD at all can tell you that at least the files are not truncated.

And as pointed out: foobar2000 might (at least in several versions of it) fail to find broken files because they are so corrupted that it fails to recognize them. To fb2k they are "not even wrong": they are so destroyed that they are not even audio files.

Question: Is there any tool that can correct the length of a Windows Media Player rip?

Reply #8
> This will only show completely broken files (i.e. with decoding errors):
>
> Code: [Select]
> find /path/to/audio -type f -iname '*.fla*' -print0 | xargs -0 flac -wst
>
> Unfortunately it will not catch bit flip errors. FLAC doesn't include any form of CRC, so bitrot is very real and bound to occur if you don't create hashsums after encoding files. I always do and I even create PAR2 recovery data for the files that are very important to me.

I did a test on this a few weeks ago. 
https://hydrogenaud.io/index.php/topic,124563.msg1031741.html#msg1031741

Reply #9
> It doesn't find the actual CD. For that, CUETools needs either an EAC log, or suitable tags (CDTOC or AccurateRip tags - no use in album title), or if not: it must have the most common pregap length.
I don't know what you mean by "it must have the most common pregap length", but if you use detailed logging (Settings > Advanced > CTDB > Detailed log: True) CUETools will include PREGAP matching in its results.

Code: [Select]
[CUETools log; Date: 06/09/2023 19:33:47; Version: 2.2.0]
[CTDB TOCID: sFAMEDF3BfD1IjO41YeTSDwphVg-] found.
        [ CTDBID ] Status
        [dc986ecf] (534/582) Has pregap length 00:01:00, Accurately ripped
        [2922f516] (001/582) No match
        [fdcfdc08] (002/582) Accurately ripped
        [ba4df3cc] (001/582) Has pregap length 00:01:00, No match
        [ff03366d] (001/582) Has pregap length 00:01:00, No match
        [9095d52d] (001/582) Has pregap length 00:01:00, No match

So our current rip only matches with 2 CTDB results (that have the same pregap), but if we plug 00:01:00 into the PREGAP field in CUETools - or more correctly create a CUE sheet with this information in - the AccurateRip results change from:
Code: [Select]
[AccurateRip ID: 00093df5-003f5f50-6007bc08] found.
Track   [  CRC   |   V2   ] Status
 01     [1a0003cc|41212817] (0+0/5) No match
 02     [5e8d42ae|ac7ec965] (0+0/5) No match
 03     [7ac5870f|47765fa9] (0+0/5) No match
 04     [2904de6f|48e887d2] (0+0/5) No match
 05     [e5511351|ccf644d3] (0+0/5) No match
 06     [3c4e7d19|a34875ab] (0+0/5) No match
 07     [f6091cda|b41d0f27] (0+0/5) No match
 08     [4b12f944|2c7615c2] (0+0/5) No match
to
Code: [Select]
[AccurateRip ID: 00094098-003f6c7e-5f07bc08] found.
Track   [  CRC   |   V2   ] Status
 01     [1a0003cc|41212817] (043+057/394) Accurately ripped
 02     [5e8d42ae|ac7ec965] (043+057/396) Accurately ripped
 03     [7ac5870f|47765fa9] (043+059/396) Accurately ripped
 04     [2904de6f|48e887d2] (044+058/396) Accurately ripped
 05     [e5511351|ccf644d3] (045+058/402) Accurately ripped
 06     [3c4e7d19|a34875ab] (043+057/394) Accurately ripped
 07     [f6091cda|b41d0f27] (044+058/392) Accurately ripped
 08     [4b12f944|2c7615c2] (045+057/388) Accurately ripped


Reply #10
Hm. But prepended index 00's need to be two seconds, right?
Anyway: Try! What is "Accurate", is ... accurate.

Reply #11
> Hm. But prepended index 00's need to be two seconds, right?
I think two seconds is in the spec (for track 1), but the CUE "standard" only records deviations from that; typically they're 00:00:32 or 00:00:33 but I have pregaps ranging from 00:00:05 to 00:02:00.
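For reference, CUE sheet times are MM:SS:FF, where FF counts CD frames (sectors) at 75 per second. Each frame of CD audio holds 2352 bytes, i.e. 588 stereo 16-bit samples. So the 00:01:00 pregap from the log above is 75 frames, and the two-second spec value is 150 frames. Here is a small sketch of that arithmetic in bash; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# msf_to_frames MM:SS:FF: print the number of CD frames (75 per second).
msf_to_frames() {
  local mm ss ff
  IFS=: read -r mm ss ff <<< "$1"
  # 10# forces base 10 so fields like "08" are not parsed as octal.
  echo $(( (10#$mm * 60 + 10#$ss) * 75 + 10#$ff ))
}

# frames_to_samples FRAMES: print samples per channel (588 per CD frame).
frames_to_samples() {
  echo $(( $1 * 588 ))
}
```

For example, `msf_to_frames 00:00:33` prints 33, and `frames_to_samples "$(msf_to_frames 00:01:00)"` prints 44100, one second at 44.1 kHz.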

Reply #12
Thank you all for your good suggestions and ideas. 
It's fabulous to have so much helpful input!

I've attached 2 files representative of what I'm trying to solve.

(For info, these tracks were loaded in my system by a friend around 15 years ago.  I don't know what went wrong at the time or how they were loaded.)

Track 4 - 04_Joan Of Arc is supposed to be 7:57. The version I have here is only 0:26 in length and cuts off.
I ran (vu.com) Audio Tester program on my library and this file doesn't show up as corrupt.

I tried the track in MusicBrainz Picard and CueTools and they don't show any problem.

Track 5 - 05 Ain't No Cure For Love is the wrong song and actually has the audio for track 6 "Coming Back To You".  The files are reversed in my library.  I have no idea how they could be loaded that way but that's what I found.

I hope I don't have too many errors in the library, as I already handled any corrupt files using Audio Tester.
But I am trying to root out any other errors or erroneous metadata in many thousands of songs.

Any further advice on how to go about this is welcome.   I hope the attached files help.
Thanks!


1 attachment removed (TOS 9: please keep music clips under 30 seconds)

Reply #13
An additional question on this:

Is there some audio software that can look at a FLAC or WAV file's supposed length and compare it with the real length of the file?
For example, I have a song that shows a length of 7:09 in the media browser, but when you try to play it, it stops abruptly about 55 seconds in and won't play past that. The file has become corrupted, but how do you know unless you try to play it?
 
There must be software that would be able to scan all of one's audio files for this type of problem.
Even if it took a couple of days for the software to read through my library it would be worth it.
Any advice on that?  I have Audio Tester which generally works well for FLAC. 
Any other software that tests both FLAC and WAV?
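Here is a minimal sketch of that comparison for FLAC, assuming bash plus `metaflac` and `flac` from the reference tools. It decodes each file to raw PCM, counts the samples that actually come out, and compares that count with the total the STREAMINFO header claims. Without `-F`, flac stops at the first decoding error, so a damaged file comes out short. A WAV that was already truncated before being encoded to FLAC will carry the short length in its header and will not be caught this way. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# compare_counts HEADER_SAMPLES DECODED_SAMPLES: print "ok", "unknown"
# (header total is 0, i.e. not recorded) or "short by N samples".
compare_counts() {
  local header=$1 decoded=$2
  if (( header == 0 )); then
    echo unknown
  elif (( decoded < header )); then
    echo "short by $(( header - decoded )) samples"
  else
    echo ok
  fi
}

# decoded_samples FILE: decode to raw PCM and count samples per channel.
decoded_samples() {
  local f=$1 ch bps bytes
  ch=$(metaflac --show-channels "$f")
  bps=$(metaflac --show-bps "$f")
  bytes=$(flac -d -c -s --force-raw-format --endian=little --sign=signed \
            "$f" 2>/dev/null | wc -c)
  # Raw output uses whole bytes per sample: 2 for 16-bit, 3 for 24-bit.
  echo $(( bytes / (ch * ((bps + 7) / 8)) ))
}

for f in "$@"; do
  if ! header=$(metaflac --show-total-samples "$f" 2>/dev/null); then
    printf '%s\tunreadable\n' "$f"
    continue
  fi
  printf '%s\t%s\n' "$f" "$(compare_counts "$header" "$(decoded_samples "$f")")"
done
```

Decoding every file is slow, but it only needs to read the library once.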

Reply #14
> Track 4 - 04_Joan Of Arc is supposed to be 7:57.  The version I have here is only :26 in length and cuts off.
> I ran (vu.com) Audio Tester program on my library and this file doesn't show up as corrupt.
I ran the file through foobar (as previously mentioned) and the following warning was observed:
Code: [Select]
Warning: File contains ID3v2 garbage
Warning: Garbage at the end of file (ID3 tag?)
It's not an audio problem though so maybe it was a ripping error. Normal file checkers won't find these.

> I tried the track in MusicBrainz Picard and CueTools and they don't show any problem.
If your files are from CD rips, then CUETools will be your best bet, so could you include the log from a verification pass? If CUETools says it's OK, then it's OK.


EDIT: Testing with flac though did result in errors, so not sure what foobar is doing (I had assumed it would do the same).
Code: [Select]
flac.exe -wst "d:\downloads\04_joan_of_arc.flac"
04_joan_of_arc.flac: *** Got error code 0:FLAC__STREAM_DECODER_ERROR_STATUS_LOST_SYNC

04_joan_of_arc.flac: ERROR during decoding
                     state = FLAC__STREAM_DECODER_END_OF_STREAM
Looking more closely at the foobar verify result, it did have a status of 'Decoded with minor problems', but I think that relates to the metadata warning. Mp3tag confirmed that there is ID3 data in the FLAC; I assume this was ripped with EAC with the 'Add ID3 tag' option enabled.

Reply #15
> I ran the file through foobar (as previously mentioned) and the following warning was observed:
> Code: [Select]
> Warning: File contains ID3v2 garbage
> Warning: Garbage at the end of file (ID3 tag?)
> It's not an audio problem though so maybe it was a ripping error.
Might be that one has used ExactAudioCopy and checked the box for ID3 tags. Then it will do things like that. Can be removed with Mp3tag, IIRC.

> Normal file checkers won't find these.
Reference flac.exe started detecting those as of release 1.4, which is still quite new. Audiotester.exe uses an older flac.exe, and I bet it is not alone in that.

Reply #16
> > Normal file checkers won't find these.
> Reference flac.exe started detecting those by release 1.4, which still is quite new. Audiotester.exe uses an older flac.exe, and I bet it is not alone about that.
This comment was made after I'd used foobar2000 (latest) Verify and assumed it would have caught such errors. Given there were none I thought a possibility was that the truncation could have happened at the ripping stage, somehow, which normal file checkers wouldn't catch - assuming something converted it from a truncated WAV to FLAC.

Seems to me that CUETools (if from CD) or FLAC -t is the way to go. In terms of the metadata, good luck!

Reply #17
> EDIT: Testing with flac though did result in errors, so not sure what foobar is doing (I had assumed it would do the same).
> Code: [Select]
> flac.exe -wst "d:\downloads\04_joan_of_arc.flac"
> 04_joan_of_arc.flac: *** Got error code 0:FLAC__STREAM_DECODER_ERROR_STATUS_LOST_SYNC
>
> 04_joan_of_arc.flac: ERROR during decoding
>                      state = FLAC__STREAM_DECODER_END_OF_STREAM
> Seems to me that [...] FLAC -t is the way to go.
Sorry to disappoint, but the scary FLAC -t error message comes from the ID3v1 tag at the end. If you remove the tag FLAC -t finds nothing.
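Both the ID3v2 warning and the LOST_SYNC error above come from ID3 tags that don't belong in a FLAC file, so a quick way to find affected files across a library is to look for those tags directly. An ID3v2 tag starts with the bytes "ID3" at the beginning of the file, where a clean FLAC file starts with "fLaC". An ID3v1 tag is the last 128 bytes of the file and starts with "TAG". Here is a minimal sketch assuming bash and coreutils `head`/`tail`/`od`; the function names are hypothetical. It only detects the tags; removing them is left to a tag editor such as Mp3tag.

```shell
#!/usr/bin/env bash
# hex3: print the first 3 bytes of stdin as hex, e.g. "494433" for "ID3".
hex3() {
  head -c 3 | od -An -t x1 | tr -d ' \n'
}

# id3_report FILE: print "id3v2", "id3v1", "id3v2 id3v1" or "none".
id3_report() {
  local f=$1 found=()
  # An ID3v2 tag sits at the very start of the file ("ID3" = 49 44 33).
  if [ "$(hex3 < "$f")" = "494433" ]; then found+=(id3v2); fi
  # An ID3v1 tag is a fixed 128-byte block at the very end ("TAG" = 54 41 47).
  if [ "$(tail -c 128 "$f" | hex3)" = "544147" ]; then found+=(id3v1); fi
  if (( ${#found[@]} == 0 )); then echo none; else echo "${found[*]}"; fi
}

for f in "$@"; do
  printf '%s\t%s\n' "$f" "$(id3_report "$f")"
done
```

With a hypothetical script name, `find /path/to/audio -type f -iname '*.flac' -exec bash id3-check.sh {} + | grep -v 'none$'` lists only the files that carry such tags.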

Reply #18

> FLAC has an audio checksum. (All sane lossless compressed formats offer some kind of checksumming, at least optionally. ALAC is not sane.)

I've just changed over 100 random bits in my FLAC file and it's passed testing with zero issues.

I still don't understand what kind of checksum people here are talking about.

Reply #19
The encoded audio frames contain CRC checksums and the entire decoded audio is checked against MD5. If you edit bits in metadata or padding blocks, those obviously have no effect.

Edit: corrected "undecoded audio" to "decoded audio".
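To make the frame checksums concrete: according to the FLAC format specification, each frame header ends with a CRC-8 (polynomial x^8 + x^2 + x + 1), and each frame ends with a CRC-16 (polynomial x^16 + x^15 + x^2 + 1). Both are initialized to 0 and computed MSB-first over everything before them, starting at the frame's sync code. Below is a slow, illustrative bash sketch of just the two CRC functions, reading bytes from stdin. It is not a FLAC parser, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# CRC-8 with FLAC frame-header parameters: poly 0x07, init 0, MSB-first,
# no final XOR. Prints the CRC of stdin as two hex digits.
crc8_flac() {
  local crc=0 byte i
  for byte in $(od -An -v -t u1); do
    crc=$(( crc ^ byte ))
    for (( i = 0; i < 8; i++ )); do
      if (( crc & 0x80 )); then
        crc=$(( ((crc << 1) ^ 0x07) & 0xFF ))
      else
        crc=$(( (crc << 1) & 0xFF ))
      fi
    done
  done
  printf '%02x\n' "$crc"
}

# CRC-16 with FLAC frame parameters: poly 0x8005, init 0, MSB-first,
# no final XOR. Prints the CRC of stdin as four hex digits.
crc16_flac() {
  local crc=0 byte i
  for byte in $(od -An -v -t u1); do
    crc=$(( crc ^ (byte << 8) ))
    for (( i = 0; i < 8; i++ )); do
      if (( crc & 0x8000 )); then
        crc=$(( ((crc << 1) ^ 0x8005) & 0xFFFF ))
      else
        crc=$(( (crc << 1) & 0xFFFF ))
      fi
    done
  done
  printf '%04x\n' "$crc"
}
```

`printf '123456789' | crc8_flac` prints `f4`, and `crc16_flac` prints `fee8`. These are the standard check values for these parameters, listed as CRC-8/SMBUS and CRC-16/UMTS in common CRC catalogues. Any single-bit change to the input changes the CRC, which is why flipped bits inside audio frames get caught. Metadata blocks and padding have no CRC, so changes there are not caught.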

Reply #20

> > FLAC has an audio checksum. (All sane lossless compressed formats offer some kind of checksumming, at least optionally. ALAC is not sane.)
>
> I've just changed over 100 random bits in my FLAC file and it's passed testing with zero issues.
>
> I still don't understand what kind of checksum people here are talking about.
There's normally an MD5 of the raw audio, which can be checked by a full decode. The MD5 is optional and many retailers omit it because they're sloppy idiots.

There's also a CRC-8 of every frame header and a CRC-16 of every frame, which are not optional. It's very unlikely that you can change 100 bits of frame data without getting caught by the CRCs. Metadata is not covered by a CRC, so 100 bits of errors there will probably pass, unless they hit a metadata block header and create an invalid block size/type. Old versions of ./flac -t checked fewer things; if you're sure the errors hit frame data, try testing with v1.4.3.
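Since the MD5 is optional, it may be worth listing which files in a library lack it, because for those a decode test can only rely on the per-frame CRCs. Here is a minimal sketch assuming bash and `metaflac`. `metaflac --show-md5sum` prints the 32-hex-digit signature from STREAMINFO, and a value of all zeros means the encoder did not store one. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# md5_state HEX: classify a STREAMINFO MD5 as printed by metaflac.
md5_state() {
  if [[ $1 =~ ^0{32}$ ]]; then
    echo unset        # all zeros: no signature was stored
  elif [[ $1 =~ ^[0-9a-fA-F]{32}$ ]]; then
    echo set
  else
    echo unknown      # e.g. metaflac failed or the file is not FLAC
  fi
}

for f in "$@"; do
  printf '%s\t%s\n' "$f" "$(md5_state "$(metaflac --show-md5sum "$f" 2>/dev/null)")"
done
```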

Reply #21
> > Seems to me that [...] FLAC -t is the way to go.
> Sorry to disappoint, but the scary FLAC -t error message comes from the ID3v1 tag at the end. If you remove the tag FLAC -t finds nothing.

It seems that 1.4.2 does indeed mask the tag "issue" with
Code: [Select]
FLAC__STREAM_DECODER_ERROR_STATUS_LOST_SYNC
Whilst 1.4.3 identifies it correctly
Code: [Select]
WARNING, ID3v2 tag found. This is non-standard and strongly discouraged

So CUETools or FLAC -t (using 1.4.3)/foobar is the way to go  :)
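Since 1.4.2 and 1.4.3 report the same file differently, a batch script may want to check the flac version before running. Here is a minimal sketch assuming bash and GNU `sort -V`; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# version_at_least HAVE NEED: print "yes" if version HAVE >= NEED, else "no".
version_at_least() {
  local lowest
  # sort -V orders version strings numerically field by field (1.4.3 < 1.10.0).
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  if [ "$lowest" = "$2" ]; then echo yes; else echo no; fi
}
```

Assuming `flac --version` prints a line like `flac 1.4.3`, a script could test `version_at_least "$(flac --version | cut -d ' ' -f 2)" 1.4.3`.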

Reply #22
OK, here's what I've found out by testing.

FLAC contains CRC only for audio data, everything else is not checksummed or verified.

At least a simple decoding test will give a guarantee that audio is intact. Tagging information and the built-in cover art are not verified.

My earlier statement about the need for hashing entire files and having PAR2 recovery data for them holds true.

Reply #23
You edited the first 200 KB of the file and the file contains 594 KB artwork. All your bit edits affected the art.

Reply #24
As far as bad embedded images go: if it's a .png or .jpeg, it'll load wrong (if it loads at all), with corrupted or missing parts; if it's a .bmp, it'll be a single bad pixel unless the header is destroyed. Hashing an entire file can be helpful in this regard, but if you change the tags, the hash will no longer match.
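One way around the retagging problem is to hash only the decoded audio, so the hash survives tag and artwork edits. The cost is that the tags and pictures themselves are no longer protected. Here is a minimal sketch assuming bash, GNU `sha256sum` and the reference `flac` decoder; the function name is hypothetical. For FLAC files that carry a STREAMINFO MD5, `flac -t` already performs a similar audio-only check.

```shell
#!/usr/bin/env bash
# audio_hash FILE: SHA-256 of the decoded PCM only, so tag and picture
# edits do not change it. A decoding error stops flac early, which shows
# up as a different hash. Prints just the hex digest.
audio_hash() {
  flac -d -c -s --force-raw-format --endian=little --sign=signed "$1" |
    sha256sum | cut -d ' ' -f 1
}

for f in "$@"; do
  printf '%s  %s\n' "$(audio_hash "$f")" "$f"
done
```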