An argument against tag-centric music players

I'm really tweaked when I do tag work in foobar2000, and for whatever reason (usually relating to not enough tag buffer existing in the file etc), the entire fraction-of-a-gigabyte file gets rewritten to change a measly ~200 bytes. But I just realized the issue may be bigger than just pulling my chain.

The current unrecoverable error rate for hard drives is in the vicinity of 10^-15 per bit read (roughly one error per 10^15 bits). The current error rate for memory is around 10^-12 flips/bit*hr. Let's say that you have a 1TB (8e12 bit) FLAC music library, and you need to redo the tags on it. For whatever reason, your music player decides it needs to rewrite each file in total to get this done. (In my experience, this has to be done at least once - depending on how the creator of the file handled the tagging buffers - and may need to be done multiple times.) Your hard drive array has a bulk read/write speed of 50MB/s=400Mb/s for reading the file off disk, slapping on the new tag and writing it back out. Assume for the sake of argument that I/O is done with a 1MB (8e6 bit) buffer (this is either excessively high or excessively low depending on what you're looking at).

The total operation takes 8e12/400e6=2e4 seconds (about 5.6 hours) to complete.
The probability of an unrecoverable hard disk error occurring during this timeframe is 8e12*1e-15 = 0.8%.
The probability of an unrecoverable RAM error occurring during this timeframe is 8e6*2e4/3600*1e-12=0.0044%.
The probability of the library being corrupted by either the hard disk or the RAM is 1-(1-0.8%)(1-0.0044%)=0.80%.
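
The estimate above can be recomputed with a few lines of Python. This is only a sketch under the post's own assumptions (1TB library, 400Mb/s throughput, a 1MB buffer, the quoted error rates); the function and parameter names are made up for illustration. Like the post, it treats an expected error count as a probability, which is only reasonable while the values stay small. With 8e12 bits at 400e6 bits/s the pass takes 2e4 seconds (about 5.6 hours), which puts the RAM term at roughly 0.0044% and the combined figure at about 0.80%.

```python
def rewrite_risk(library_bits=8e12,      # 1 TB library, in bits
                 throughput_bps=400e6,   # 50 MB/s, in bits per second
                 hdd_error_rate=1e-15,   # unrecoverable errors per bit read
                 ram_flip_rate=1e-12,    # flips per bit per hour
                 buffer_bits=8e6):       # 1 MB I/O buffer, in bits
    """Return (seconds, p_disk, p_ram, p_either) for one full rewrite pass."""
    seconds = library_bits / throughput_bps
    hours = seconds / 3600
    # Expected number of disk errors, used as a probability (small values only).
    p_disk = library_bits * hdd_error_rate
    # Expected number of bit flips in the buffer while the pass is running.
    p_ram = buffer_bits * hours * ram_flip_rate
    # Probability that at least one of the two independent events happens.
    p_either = 1 - (1 - p_disk) * (1 - p_ram)
    return seconds, p_disk, p_ram, p_either


if __name__ == "__main__":
    seconds, p_disk, p_ram, p_either = rewrite_risk()
    print(f"duration: {seconds:.0f} s ({seconds / 3600:.1f} h)")
    print(f"disk: {p_disk:.4%}  ram: {p_ram:.4%}  either: {p_either:.4%}")
```

Changing `buffer_bits` is the quickest way to see how sensitive the RAM term is to the guessed working-set size.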

Those are not odds I enjoy, especially because (with the joys of facets) I wind up retagging reasonably often. Of course, there are two extremely obvious solutions to this that are commonly proposed: a) verify files after modification/copying, and b) use RAID/ECC. But the latter is still rather expensive (compared to the baseline of just slapping a single 1TB drive up) and the former doesn't keep this sort of thing from happening in the first place. I'd like a solution that minimizes the error rate on the hardware I already own, thank you very much.

Rather, I am wondering whether there is a more direct and cleaner solution to this in the music player itself. iTunes already does it: it doesn't store its tag information in the file. It stores it in a database typically under My Documents. Of course, that causes all sorts of problems for people who want to pull their media away from iTunes or otherwise don't use it as a music manager - but for those of us who use the fb2k media library or another similar scheme, we only really need the tag information to be applied when we transcode or back stuff up. We never really need the file tags inside the program, and given what I'm pointing out, I think that the file-based approach has a significantly increased (if theoretical) risk of file corruption.

In short, I think we as a community should revisit the concept of database-oriented tagging as being superior to file-based tagging on reliability grounds. Database tagging obviously doesn't solve the problem entirely, but it is a far more effective use of hard disk activity. Databases can be backed up much more easily and can have more error-recovery information embedded in them than is typically available in music files.  I think that retag-triggered file rewrites occur far, far too often, and music libraries are advancing to a size where a rewrite should be avoided as much as possible.

Reply #1
You can avoid the problem without giving up on updating tags in the files.

I rip CDs to Flac files with J. River MC 12.  MC 12 places some padding in the Flac file so that routine tag edits don't require rewriting the entire file.  Adding album art might require more padding.  Tag updates are very quick and don't require rewriting the whole file. 

When I started using MC 11 about 3 years ago, I noticed that tag updates to MP3 files were much slower than updates to Flac files. Now both are updated quickly.

I imagine that some other ripping programs insert padding in a similar way.

I've moved my music files from PC to PC several times and copied files from one drive to another more times.  Having up-to-date tag values lets me do that without any worries about breaking the link between the library database and the files themselves.

Keeping tag updates out of the music files does let you make the music files read-only. That can protect you against inadvertent changes.  However, archiving music files as you make them with initial tag values can provide similar protection.

Bill

Reply #2
Piped FLAC encoding in foobar2000 (i.e. FLAC is "told" the file is max size and not told the actual size) automatically adds a 64kB padding block which should be adequate for most tags.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #3
Quote
The total operation takes 8e12/400e6=2e4 seconds (about 5.6 hours) to complete.
The probability of an unrecoverable hard disk error occurring during this timeframe is 8e12*1e-15 = 0.8%.
The probability of an unrecoverable RAM error occurring during this timeframe is 8e6*2e4/3600*1e-12=0.0044%.
The probability of the library being corrupted by either the hard disk or the RAM is 1-(1-0.8%)(1-0.0044%)=0.80%.


The RAM number seems way too high to me. What's your source?

The hard disk number is correct though, as is your conclusion that this is a pretty high risk for large collections. But there is a very convenient solution: put it on a NAS running ZFS. It employs transparent parity checking and copy-on-write. That means you can take a snapshot of your whole TB collection in about a second before tagging, and it won't be larger than a MB in size. If your tagging computer corrupts a file due to a RAM defect, you just copy it over from your snapshot.

I have sold the service to set those up a couple of times.

EDIT: Just read your whole post again. You want a single disk solution. For a database to become more reliable we would need a rock solid file to database entry mapping, for example, by using checksums. Else you could lose your current tagging information when your directory structure gets modified (e.g. a backup program or NAS cropping long directory entries).
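
To make the checksum idea concrete, here is a minimal sketch of a tag database keyed by a hash of the audio data instead of by file path, so that entries survive renames and directory changes. Everything in it is an assumption for illustration: the `TagStore` class, the table layout, and the choice of SHA-256 are not taken from any existing player. It also assumes the caller hashes only the audio payload, since hashing a whole file would change the key whenever embedded tags change.

```python
import hashlib
import json
import sqlite3


def content_key(audio_bytes: bytes) -> str:
    """Hash the (assumed tag-free) audio payload to get a path-independent key."""
    return hashlib.sha256(audio_bytes).hexdigest()


class TagStore:
    """Hypothetical tag database: one JSON blob of tags per content key."""

    def __init__(self, db_path: str = ":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS tags ("
            "content_key TEXT PRIMARY KEY, tags_json TEXT NOT NULL)"
        )

    def put(self, key: str, tags: dict) -> None:
        # Replace any earlier entry for the same audio content.
        self.db.execute(
            "INSERT OR REPLACE INTO tags VALUES (?, ?)", (key, json.dumps(tags))
        )
        self.db.commit()

    def get(self, key: str):
        row = self.db.execute(
            "SELECT tags_json FROM tags WHERE content_key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else None
```

The obvious cost is that computing the key means reading the whole file once, so a real implementation would probably cache keys alongside path, size and modification time.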

Reply #4
As a foobar user this probably won't be very helpful to you, but I thought it worth mentioning that in Winamp you can control this to some degree. It maintains a database, but you also have the option of writing tags to the file. Of course, if you choose not to write to the file, you'd have to remember to write the tags to the file before transcoding: even if you transcode from within Winamp, it will use the tags from the file, not the database.




Quote
especially because (with the joys of facets) I wind up retagging reasonably often

Just out of curiosity, what did you mean by this? What is the joy of facets you speak of, and why do you retag often?

 

Reply #5
I've done a lot of batch retagging. I'm only at about 650GB or so of music, but even so, I have never experienced the problem you are talking about, not even once.

Have you completely ruled out a faulty drive?

Reply #6
Quote
I've done a lot of batch retagging. I'm only at about 650GB or so of music, but even so, I have never experienced the problem you are talking about, not even once.

Have you completely ruled out a faulty drive?


If you read it again, no problem has actually occurred yet. There is just a theoretical probability of ~0.8%. So you're probably within the 124 of 125 cases where everything works fine.

Reply #7
Quote
If you read it again, no problem has actually occurred yet. There is just a theoretical probability of ~0.8%. So you're probably within the 124 of 125 cases where everything works fine.

So then, is the mentioned behaviour of foobar2000 actually true? It seems to me as though the best course of action here could be to simply change the behaviour, then the problem is solved. Furthermore, most of the major tagging systems use proper padding techniques to prevent whole-file rewrites.

Reply #8
Quote
Furthermore, most of the major tagging systems use proper padding techniques to prevent whole-file rewrites.


I agree. As long as new tags are smaller than the free space remaining inside the padding, there should be no problem with all apps that don't do full rewrites. Modified or added album art and embedded booklets should most of the time cause rewrites, though.

Reply #9
If you're so paranoid about tag editing causing full file rewrites, switch from FLAC to another format that stores tags at the end of the file so rewriting is never needed, such as WavPack or TAK or APE.
Microsoft Windows: We can't script here, this is bat country.

Reply #10
So have I got this correct? APE and ID3v1 tags go at the end of the file and ID3v2 tags go at the beginning. So this problem only (potentially) affects ID3v2-tagged files, is that right?

Reply #11
Quote
In short, I think we as a community should revisit the concept of database-oriented tagging as being superior to file-based tagging on reliability grounds.

I've always preferred the concept of external tagging rather than relying on application-specific databases. The "ideal" scenario in my book is one in which tags are stored in an external XML file within the audio file's directory, but odds are few people share my opinion on that.

Reply #12
Quote
So have I got this correct? APE and ID3v1 tags go at the end of the file and ID3v2 tags go at the beginning. So this problem only (potentially) affects ID3v2-tagged files, is that right?

  • ID3v2 tags - beginning of file (*)
  • APE tags - end of file (*)
  • ID3v1 tags - end of file
  • Vorbis comments (Vorbis, FLAC, Speex) - beginning of file
(*) The respective specifications allow these tags to be placed at either the beginning or the end of a file, but mainstream implementations support only the placement listed above.
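
The placements in the list can be checked by looking for each format's magic bytes. The sketch below is illustrative only (the `detect_tags` name and the returned keys are invented here) and covers just the mainstream placements: an `ID3` header or the `fLaC` marker at the start of the file, a 128-byte ID3v1 block beginning with `TAG` at the very end, and a 32-byte APE footer beginning with `APETAGEX` at the end or directly before an ID3v1 block. It does not validate tag contents.

```python
def detect_tags(data: bytes) -> dict:
    """Report which tag containers appear at the usual start/end positions."""
    found = {
        "id3v2_at_start": data[:3] == b"ID3",
        "flac_metadata_at_start": data[:4] == b"fLaC",
        "id3v1_at_end": False,
        "ape_at_end": False,
    }
    tail = data
    # ID3v1 is a fixed 128-byte block at the very end, starting with "TAG".
    if len(tail) >= 128 and tail[-128:-125] == b"TAG":
        found["id3v1_at_end"] = True
        tail = tail[:-128]
    # The APE footer is 32 bytes and begins with the 8-byte "APETAGEX" preamble.
    if len(tail) >= 32 and tail[-32:-24] == b"APETAGEX":
        found["ape_at_end"] = True
    return found
```

For large files it would be enough to pass in the first few bytes plus the last 160 bytes, since nothing in between is inspected.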
Microsoft Windows: We can't script here, this is bat country.

Reply #13
Quote
I'm really tweaked when I do tag work in foobar2000, and for whatever reason (usually relating to not enough tag buffer existing in the file etc), the entire fraction-of-a-gigabyte file gets rewritten to change a measly ~200 bytes. But I just realized the issue may be bigger than just pulling my chain.


This is why padding exists.  Use it properly and you won't have issues like this.

Reply #14
So, padding is not news to me. But I was somehow under the impression that its use with FLAC was universal - but it seems like every CD I rip to FLAC needs to be rewritten the first time I fix the tags in fb2k. After that first time, it does not appear to happen as often - but as I don't have a good way to check if this is due to padding or not, I can't really say what's going on with authority.

Is there some way I can check the padding of my files?

I am pulling the RAM size out of my butt. In theory the working memory set might be closer to 4k, but this depends a lot on what kind of data structures exist in memory and what needs to remain uncorrupted. If one includes the binaries and OS as part of the working set the figure could be much higher.

I agree that end-of-file tagging would solve this issue too. The data stream itself never needs to be rewritten. I guess this turns into a better reason to switch to TAK

Reply #15
I think you are overestimating the risk of bad data coming from your hard drive.  The error rate you quote is unrecoverable, not unreported.  Assuming the data was written correctly in the first place, the chance that a modern drive will return incorrect data without error is almost nil.  If you get an unrecoverable error, your OS will report it when you try to access the data, for example while rewriting the tags.

ZFS will only help IF you have a mirror or raid-Z(2).  With a mirror or a raid-Z(2), ZFS can use its built-in CRC check to determine which copy of the data is correct.  If you only have a single drive, ZFS will report the error, but so will the drive, so you have gained nothing.

Reply #16
This sounds like nothing more than a hobbyist's meddling speculation to me.

But rather than dreaming up new methods to perfectly apply the white flock to our "snow covered" trees at the end of the tunnel for our model train set, we worry about random bits flipping. 
elevatorladylevitateme

Reply #17
My library is quite huge: I have 2000 physical CDs + 2500 ripped from the public library and… some others. All my files were encoded either with foobar2000 or the dBpoweramp CD ripper. I have often updated my FLAC files over the last three years (by adding a new field to all files, by ReplayGain scanning, by scripting things, etc.) and I must say that the tagging process is quite fast thanks to padding. Since FLAC 1.1.3: « The default padding size is now 8K, or 64K if the input audio stream is more than 20 minutes long » (source). And it's easy to increase the size of the padded area for specific needs (such as album art). If tagging weren't that convenient (and secure), I would either immediately switch to another format (based on APEv2) or use mass-tagging tools with a lot of care.

So unless something goes wrong with a CD ripper or mass converter, users shouldn't encounter such massive write times with their FLAC library (or with any format based on APEv2 tagging).

Reply #18
Quote
I think you are overestimating the risk of bad data coming from your hard drive.  The error rate you quote is unrecoverable, not unreported.  Assuming the data was written correctly in the first place, the chance that a modern drive will return incorrect data without error is almost nil.  If you get an unrecoverable error, your OS will report it when you try to access the data, for example while rewriting the tags.

My first inclination (and I'm speaking as a Windows software developer here) is to not trust error reporting on these matters, especially when it comes from the OS. That said, part of that stems from the fact that I've never really heard of people getting I/O errors of this nature, while I myself have experienced FLACs going bad for completely inexplicable reasons. And don't get me started about SMART, although that's not really pertinent to this discussion. Windows's disk error reporting otherwise is pretty good.

Quote
This sounds like nothing more than a hobbyist's meddling speculation to me.

Of course it is. This is a forum, after all. What were you expecting? If I wasn't taking the piss, I would have blogged it.
 
 
Quote
But rather than dreaming up new methods to perfectly apply the white flock to our "snow covered" trees at the end of the tunnel for our model train set, we worry about random bits flipping. 
That's among the most trite metaphors for data corruption that I've ever seen - as an idle concern for model railroad geeks. I'm not particularly sure I disagree with it, but it sure is impressive.

I'll get back to y'all with some more in depth analysis of my situation, because clearly I'm an exceptional case in having this issue. All I can say at this point is that I use EAC 0.99pb4, encode with FLAC 1.2.1, and apply RG tags in foobar2000 - and the tag write takes a hell of a lot more disk writing than it should. A lot of the (legal) free MP3s I get from classical sites and such also respond pretty poorly to RGing, genre tagging and the like.

Even if I'm misconfigured, though... 64k is not enough for album art. EOF tags do not help the formats that don't have them and have considerably reduced default padding sizes (MP3, Ogg etc). And my collection isn't composed entirely of FLACs that can be padded further. While I myself can avoid getting bitten by this, there are some hypothetical situations I can't discount where I would be affected, and I can see plenty of other people (especially non-computer types with large MP3 collections) getting bitten.

Unless there is an extremely widespread push for EOF tagging formats, which I believe is unlikely, I think this will remain an intrinsic problem with these formats and players. This can be avoided by individual users, of course, but it raises the risk of corruption among a large group of users, only some of whom will take precautions.

And... given that iTunes (and now, apparently, Winamp) already does this, I don't think it's rocket science. All I'm basically proposing is a caching database for fb2k which intercepts all tag reads/writes inside the media library, which AFAIK does not presently exist, and which then allows those tags to be written back out in the Converter. If I had a little more copious free time I'd consider writing it myself.
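
As a rough illustration of the proposal, the sketch below keeps tag edits in an in-memory overlay and only merges them with the file's own tags when something asks for them, for example at conversion time. It is not based on any foobar2000 API: `TagOverlay`, `read_file_tags` and the method names are hypothetical, and persistence of the overlay is left out entirely.

```python
class TagOverlay:
    """Hypothetical caching layer: edits stay here and never touch the source file."""

    def __init__(self, read_file_tags):
        # read_file_tags: callable taking a path and returning a dict of the
        # tags stored in the file (supplied by the host player; assumed here).
        self._read_file_tags = read_file_tags
        self._pending = {}

    def write(self, path, field, value):
        """Record an edit in the overlay instead of rewriting the file."""
        self._pending.setdefault(path, {})[field] = value

    def read(self, path):
        """Return the file's tags with any pending edits applied on top."""
        merged = dict(self._read_file_tags(path))
        merged.update(self._pending.get(path, {}))
        return merged

    def pending_edits(self, path):
        """Return a copy of the edits that exist only in the overlay."""
        return dict(self._pending.get(path, {}))
```

A converter would call `read()` to obtain the tags for the output file, so the source file is only ever opened for reading.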

What I'm saying here is that, while the usefulness to any one individual user (let alone a clued user) of this technique is rather low, the overall corruption rate among all the users of a particular player (like fb2k) could/would be reduced considerably, among all the many petabytes of music that fb2k plays and rewrites. And that, in my case, I had a strong prejudice against this sort of db scheme in the past (based on terrible experiences with the Windows Registry and iTunes) - and I'm rapidly changing my views on the matter.

Again, if anybody knows of an easy way to inspect FLACs for this sort of thing (padding sizes etc), please let me know.
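
One way to inspect padding is to walk the FLAC metadata block headers directly. The sketch below assumes the standard FLAC layout: a 4-byte `fLaC` marker followed by metadata blocks, each with a 4-byte header (1 bit "last block" flag, 7 bits block type, 24-bit big-endian length), where block type 1 is PADDING. The function names are made up for illustration, and a file that has an ID3v2 tag in front of the `fLaC` marker is simply rejected.

```python
BLOCK_TYPES = {
    0: "STREAMINFO", 1: "PADDING", 2: "APPLICATION", 3: "SEEKTABLE",
    4: "VORBIS_COMMENT", 5: "CUESHEET", 6: "PICTURE",
}


def flac_metadata_blocks(stream):
    """Return a list of (block_name, length_in_bytes) from a binary stream."""
    if stream.read(4) != b"fLaC":
        raise ValueError("no fLaC marker at start of stream")
    blocks = []
    while True:
        header = stream.read(4)
        if len(header) < 4:
            raise ValueError("truncated metadata block header")
        is_last = bool(header[0] & 0x80)              # top bit: last-block flag
        block_type = header[0] & 0x7F                 # low 7 bits: block type
        length = int.from_bytes(header[1:], "big")    # 24-bit body length
        name = BLOCK_TYPES.get(block_type, f"UNKNOWN({block_type})")
        blocks.append((name, length))
        stream.seek(length, 1)                        # skip the block body
        if is_last:
            return blocks


def padding_bytes(stream):
    """Total size of all PADDING blocks in the stream's metadata."""
    return sum(n for name, n in flac_metadata_blocks(stream) if name == "PADDING")
```

Usage would look like `with open("track.flac", "rb") as f: print(padding_bytes(f))`, where `track.flac` is a placeholder path. The reference FLAC tools also ship `metaflac`, whose `--list` output shows the same block information.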

Reply #19
I'm also paranoid about files getting corrupted while they are being tagged. This is what I usually do:
0) I wrote a simple export script for mp3tag which exports the filename and the MD5 of the audio part to a .csv file
Code:
$filename(md5list.csv,ansi)
$loop(%_folderpath_rel%)
$loop(%_filename_ext%)%_filename_ext%            %_md5audio%
$loopend()$loopend()
Then, when I want to tag a bunch of files:
1) run this script on the files I am going to tag and save the resulting .csv file as before.csv
2) make a backup of the files I am going to tag
3) tag them with mp3tag (or any other tool)
4) run the script on the tagged files and save the resulting .csv file as after.csv
5) if before.csv is bit-identical to after.csv (it's easy to check with Total Commander), delete the backup; otherwise investigate what has changed
6) now rename the files with mp3tag if needed
Something like this. Totally paranoid, I know ;-)
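
The comparison in step 5 can also be scripted so that it names the affected files instead of only saying whether the two lists differ. The sketch below assumes each line of the exported list is a filename followed by whitespace and then the audio MD5, which is what the export script above appears to produce; the function names are invented for illustration.

```python
def parse_md5_list(text: str) -> dict:
    """Map filename -> audio MD5 from 'filename <whitespace> md5' lines."""
    entries = {}
    for line in text.splitlines():
        # Split on the last run of whitespace so filenames may contain spaces.
        parts = line.strip().rsplit(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, digest = parts
        entries[name.strip()] = digest.lower()
    return entries


def changed_audio(before_text: str, after_text: str) -> list:
    """Return the sorted filenames whose audio MD5 differs or is missing on one side."""
    before = parse_md5_list(before_text)
    after = parse_md5_list(after_text)
    names = before.keys() | after.keys()
    return sorted(n for n in names if before.get(n) != after.get(n))
```

An empty result means the audio data of every listed file is unchanged. Because entries are matched by filename, renaming has to wait until after the comparison, as in step 6.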

Reply #20
Quote
My first inclination (and I'm speaking as a Windows software developer here) is to not trust error reporting on these matters, especially when it comes from the OS. That said, part of that stems from the fact that I've never really heard of people getting I/O errors of this nature, while I myself have experienced FLACs going bad for completely inexplicable reasons. And don't get me started about SMART, although that's not really pertinent to this discussion. Windows's disk error reporting otherwise is pretty good.


I too have had files become corrupted for seemingly inexplicable reasons.  However, once I investigated it was always something other than the hard drive.  Bad cables, bad disk controllers, bad USB enclosures, bad network cards, bad RAM or software/OS failure are all more likely suspects for random data corruption. 

I'm not saying hard drives are perfect.  I've lost data to hard drive failure before.  Bad sectors give pretty obvious errors when trying to read data.  However, I have never had a drive silently return bad data that couldn't be attributed to some other failure.


Reply #21
So, I'm in the process of ripping and tagging the Messiaen Complete Edition, and all tag writes appear pretty damn fast. I ripped on two separate computers too (with slightly different EAC+flac installs). So that almost certainly appears to be some kind of false alarm. Sorry!

I think I may have seen this more with my classical MP3 downloads, but for now, I'll shut my hole. I'll just say for now that if this situation does occur commonly with one music server configuration or another, it should be a count against it (regardless of it being currently hypothetical).

I will also say that like washu I do not trust my cables much, but that should make my error risk calculations underestimated.

Reply #22
And what if you wanted to transfer music to another PC/device? If the tags are embedded in the files, then you don't have to worry about retagging the files on the second device. But if they're not, you may have a very annoying problem on your hands, especially if the device in question doesn't read from your database.

IMO the convenience is more than worth the risk. In my nearly 8 years of collecting music - I have nearly 13K songs now - I've had exactly ONE file corruption, and it was easily dealt with by removing the tag and retagging the file. I do a daily backup to a portable external drive that I take with me to work just in case my home gets wiped out for whatever reason ... I think that's enough for me.
EAC>1)fb2k>LAME3.99 -V 0 --vbr-new>WMP12 2)MAC-Extra High