Lossless audio with error correction.

Bandwidth and particularly storage capacity have increased significantly over the years. Cost per megabyte of storage has probably decreased by more than a factor of 1000 since MP3 first 'went wild' (i.e. escaped from Fraunhofer).

Lossless has always been my preference, but I stupidly encoded things to MP3 at a crappy bitrate back in the old days to save dollars, and now I regret it.

But to the topic at hand, I am interested in preserving my audio over the coming years, and every medium (except apparently one: an M-DISC) degrades over time, some worse than others. It is all very nice storing things lossless as whatever-format-you-desire, or even storing the original PCM now that storage is so cheap. However, they are all subject to data degradation, and some formats cope with it worse than others. Surprisingly, or perhaps not so, PCM is actually one of the most error tolerant!

I have attempted to find out any information on error tolerance of the major (and free) lossless codecs - WavPack, FLAC and Monkey's Audio. However, I haven't found much, and that which I have found all points towards these codecs only being able to 'handle' errors. Some formats replace a 'block' with silence, others bomb out with an error (which seems to be more player specific than codec specific). I (personally) think this is an absolutely terrible solution, as for example, a single bad byte could cause the loss of an entire 'block' when losslessly encoded, but only a single sample when stored 'raw'!!
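To make the contrast concrete, here is a rough sketch of the raw-versus-compressed difference. It uses zlib purely as a stand-in for "a compressed stream": zlib is not an audio codec, and real FLAC/WavPack decoders work per frame and may behave differently. All function names are made up for illustration.

```python
import struct
import zlib


def make_pcm(n_samples=1000):
    """Build n_samples of 16-bit little-endian mono PCM (a simple sawtooth-like pattern)."""
    return b"".join(struct.pack("<h", (i * 37) % 2000 - 1000) for i in range(n_samples))


def flip_byte(data, index):
    """Return a copy of data with every bit of one byte inverted."""
    damaged = bytearray(data)
    damaged[index] ^= 0xFF
    return bytes(damaged)


def damaged_samples(original, corrupted):
    """Count 16-bit samples that differ between two equal-length PCM buffers."""
    return sum(
        original[i:i + 2] != corrupted[i:i + 2]
        for i in range(0, len(original), 2)
    )


def compressed_survives(pcm, index):
    """Corrupt one byte of the zlib stream; report whether the PCM still decodes intact."""
    packed = zlib.compress(pcm)
    try:
        return zlib.decompress(flip_byte(packed, index % len(packed))) == pcm
    except zlib.error:
        return False


pcm = make_pcm()
print(damaged_samples(pcm, flip_byte(pcm, 500)))  # raw PCM: one sample affected
print(compressed_survives(pcm, 20))               # compressed stream: did it decode intact?
```

In the raw buffer the damage stays confined to the sample containing the bad byte; in the compressed stream the decoder typically rejects or mangles everything that depends on that byte.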

Now to my point (finally)... what about using a lossless codec to compress an audio stream, and then use the saved space to add back some error correction? Bandwidth is getting pretty good, but storage space is now just so ridiculously cheap!

The reason I suggest putting this functionality into the codec is so a data (audio) stream can be played back immediately without having to progress through a separate error correcting step, which would not only be painful, but leave the data in the middle step open to corruption. It would be far preferable to store an audio track on something that could be immediately played back.

I would be happy to open a discussion on...
1 - Whether this is possible without breaking compatibility with an existing format.
2 - The pros and cons of convolutional/block error coding.
3 - Whether the base error correction built into a format (eg: PRML for memory sticks & hard discs, EFM & parity encoding for DVDs) is considered 'enough'.
4 - Whether people think it is a stupid idea and I should stick with separate error correction (on top of base format correction) & audio compression steps.

Talking a little about (1), it seems that although WavPack and Monkey's Audio have some hardware support, it is fairly rare. FLAC seems to be pretty good for hardware support. So, for the FLAC format, could extra data be embedded into the file/stream that would allow a 'correction aware' decoder to correct errors, but allow a legacy decoder to ignore the extra data and still operate correctly?

Cheers,
MM.



Reply #1
I have attempted to find out any information on error tolerance of the major (and free) lossless codecs - WavPack, FLAC and Monkey's Audio. However, I haven't found much, and that which I have found all points towards these codecs only being able to 'handle' errors. Some formats replace a 'block' with silence, others bomb out with an error (which seems to be more player specific than codec specific). I (personally) think this is an absolutely terrible solution, as for example, a single bad byte could cause the loss of an entire 'block' when losslessly encoded, but only a single sample when stored 'raw'!!


Most types of failure on modern devices will take out an area of storage that is large compared to whatever the minimum frame size is in the format, so it really doesn't matter. Particularly with modern flash or magnetic storage, failure will most likely corrupt areas of storage much larger than individual files.


Now to my point (finally)... what about using a lossless codec to compress an audio stream, and then use the saved space to add back some error correction? Bandwidth is getting pretty good, but storage space is now just so ridiculously cheap!


This is probably not a great idea when you could just apply error tolerance to the entire storage more efficiently and without needing to update software or reencode files. More practically, depending on your errors being randomly distributed seems foolish when storage devices very often fail such that large sections or the entire device become inaccessible. A better idea would be to set up a system that is resilient against such errors.


Reply #2
A standard backup plan should suffice. 

It's really rare to get a corrupted bit or byte here or there... One wrong bit in your bank account balance could just as easily be a million dollar error as a 1 cent error. Right now we are sending these posts all over the world through wires, wireless, fiber optics, and all kinds of routers, hubs & switches, and if you see a misspelling in this post, you can be pretty sure it's my human error and not some scrambled data somewhere...

I'm going to quote Arnold B. Krueger from a different context.
Quote
If perfect digital data storage, processing, and transmission were not the general rule, our whole modern society would collapse in a heartbeat.



Reply #3
Thank you for your comments.

I believe my point has been missed a bit in the first reply... where we talk about storage device failure. Yes, a data storage 'block' is quite large (legacy was 512 bytes, but CD/DVD and modern hard disks use blocks that are much larger than that), but recovery from such a loss would (should actually) be considered a disaster recovery strategy, not an error correcting methodology (even though the loss of a 4K block in a 40M file is well within the realms and abilities of error correction).

The second post seems to address my concern a little better - although a backup plan is not really what I am talking about (similar to the first reply). Anyone without a data backup plan is just asking for trouble!

But when we start talking about the corruption of individual bits on their way here and there it gets much closer to my point. Errors in data transmission might well be uncommon, but certainly not negligible! Have you ever downloaded an MP3 and found it was full of squawks and blips? Do you think it was encoded that way in the first place? Modern P2P software is much better than it was back in the days of Napster, but still, those errors must have occurred somewhere along the chain, certainly not at the head.

Admittedly, enthusiasm for this seems to be very low (if indeed there is any at all). However, no.1 (adding data to a format - any format - that allows 'upmarket' players to correct errors, but remains compatible with existing players) is the main thrust of my suggestion. I guess I should have made that clearer from the start!

There would be absolutely no need to re-encode anything. All existing files would play fine, just not have the error robustness of later files.

I know there are some formats out there that allow additional data to be stored in the container. Bastardised though the format is, WAV is in fact one of those (which I am sure you both know). I believe a 'lossless' version of MP3 was created (MPC??) that played just fine on a standard MP3 player, but that held an additional CHUNK that contained all the information to turn the lossy MP3 back into a lossless original. It never caught on because the files were nearly as large as the original, and by that time there were much better lossless encoders out there.

Is the best way to preserve a recording to guard against bit corruption some sort of RAW format, leaving the data correction strategy completely up to the storage device firmware, and relying on backups for 'disaster recovery scenarios'? I've dealt with audio that has suffered from random bit errors, and by far the most robust (digital format) is WAV. Even back in the DAT days, audio sounded just fine when everything worked, but just like modern lossless encoders (the ones I've seen anyway), an error causes the loss of an entire encoding block! Not only noticeable, but extremely annoying, and irrecoverable.

I don't want to get involved in any sort of flame war over how bits and bytes are stored/transferred or the error correction used at the "physical" layer (i.e. one of the layers underneath what the user sees as data). What I would like to know is whether anyone thinks maintaining data integrity at a higher level for audio is worthwhile? And can it be done transparently?

Thanks for the comments guys. I'm glad that at least you've given it some thought!

Cheers,
MM

Reply #4
But when we start talking about the corruption of individual bits on their way here and there it gets much closer to my point.


Errors measured in bits don't really happen.

Is the best way to preserve a recording to guard against bit corruption some sort of RAW format, leaving the data correction strategy completely up to the storage device firmware, and relying on backups for 'disaster recovery scenarios'?


Yes. 






Reply #5
I have attempted to find out any information on error tolerance of the major (and free) lossless codecs


Do not. Let backup take care of it. Consider also parity protecting your drives as a supplement to (not substitute for!) backup.

You do want a checksummed file format, which can detect errors (and tell you when you need to replace a drive, and which drive has the "good" file). There are differences there: I know that FLAC does checksum and ALAC does not (neither does the mp4 file used to contain it).
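For formats without a built-in checksum, much the same detection can be had from an external manifest of hashes. The following is only a minimal sketch of that idea, not a replacement for a format's own check; the function names and the choice of SHA-256 are my assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large audio files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its digest."""
    root = Path(folder)
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*")) if p.is_file()
    }


def changed_files(folder, manifest):
    """List files that are missing or whose contents no longer match the manifest."""
    root = Path(folder)
    bad = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file() or file_digest(path) != expected:
            bad.append(name)
    return bad
```

Build the manifest once while the files are known to be good, keep it with the backup, and run `changed_files` against each copy to see which drive still holds the intact file. Note that this only detects damage; repair still comes from the other copy.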

I recently had a filesystem failure. Possibly the drive was about to fail, but the NTFS file system surely was - that was when I learned about Windows' "delayed write" (turn it off, turn it off, turn it off!).
What happened is that it wrote portions of one file to another (by not updating the master file table). One Deep Purple CD was written over file segments that belonged to a handful of tracks, so that a couple of minutes of each such file would play Deep Purple. Even if I had had an error tolerant codec/file format, it would never have tolerated and repaired a couple of wrong minutes in a single file.

Reply #6
While I agree with the other posters that the "bit-rot" type errors you are worried about are extremely unlikely, there are a couple of options that do what you want but not directly.

You could keep your backups in an archive format that supports recovery records.  This adds additional blocks to the archive that allow for recovery from some bit-level corruption.  RAR supports this, but doesn't do it by default. 

Another option is creating PAR recovery files for your music files.  The PAR files can then be used to repair any damage to the protected data, up to a certain point.  You can specify how much redundancy you want VS file size.  PAR was designed to work with Usenet, a service that commonly loses entire posts or blocks of files.
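To give a feel for how recovery blocks work, here is a deliberately simplified sketch using a single XOR parity block, which can rebuild exactly one lost block. This is not the PAR2 algorithm (PAR2 uses Reed-Solomon coding and can rebuild several blocks); the function names are hypothetical.

```python
def split_blocks(data, block_size):
    """Cut data into equal blocks, zero-padding the last one."""
    padded = data + b"\x00" * (-len(data) % block_size)
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]


def xor_blocks(blocks):
    """XOR equal-length blocks together byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            result[i] ^= value
    return bytes(result)


def recover(blocks, parity):
    """Rebuild the single block marked None from the survivors plus the parity block."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) != 1:
        raise ValueError("single parity can rebuild exactly one lost block")
    survivors = [block for block in blocks if block is not None]
    repaired = list(blocks)
    repaired[missing[0]] = xor_blocks(survivors + [parity])
    return repaired
```

The parity block is just `xor_blocks(blocks)` computed while the data is intact. The trade-off mentioned above shows up directly: one extra block buys back one lost block, and more redundancy needs more recovery data.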


Reply #7
Store the data on an error correcting storage subsystem. ZFS has native checksumming of every block, and if you use redundancy in the pools it can detect the error and correct it entirely at the filesystem layer. There is no need for error detection and correction at the codec or application level when the system responsible for storing the data also guarantees its integrity.
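The principle (checksum tells you which copy is good, redundancy lets you rewrite the bad one) can be sketched in a few lines. This is only an illustration of the idea with made-up names, and it says nothing about how ZFS is actually implemented internally.

```python
import hashlib


def checksum(block):
    """Checksum stored separately from the block it protects."""
    return hashlib.sha256(block).digest()


def read_with_repair(copies, expected):
    """Return a block whose checksum matches, rewriting any copies that do not.

    copies is a list of bytes objects (one per mirror) and is repaired in place.
    """
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("every copy is damaged; restore from backup")
    for i, copy in enumerate(copies):
        if checksum(copy) != expected:
            copies[i] = good
    return good
```

With two mirrored copies and one silently corrupted, a read returns the good data and heals the bad copy; if both are damaged, the only remedy left is the backup.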

Reply #8
For home backups of my pictures, video and audio I use dvdisaster - it creates a DVD (or CD, or Blu-ray) augmented with error correction data. Of course, you have to make an image first, and leave at least 20-30% of the space free for the correction data to unleash its potential. But that is only the first line of backup - images are backed up to an online service, and music/video files are available as the original CD/DVD/Blu-ray.
AFAIK, there is no music format that natively supports redundancy. dvdisaster creates compatible DVDs; you can read them everywhere.
Error 404; signature server not available.

Reply #9
Simply make copies stored as far away from the original bits as possible.
"I hear it when I see it."

Reply #10
Thank you to everyone for reading & responding.

From (all) the replies I can see that support for such a venture is not there, so I will leave pushing the subject alone - but still read & respond to replies.

@hlloyge: Yes!! I also use DVDisaster for augmenting my DVDs, and I use M-DISCs for the physical format. I love that it is completely transparent, although without going deep into the code, I don't know how it puts the RS02 data onto a disc... is it spread throughout the entire disc, or just tacked invisibly onto the end of the existing ISO? And in the world of error correction, does it make a difference? I love the program, and would highly recommend it to anyone, even if they do keep a plethora of backups!

On that note, I am not a muso writing my own music, so I only use DVDisaster for my own pictures and video. I would use this if I wrote my own music, but thought perhaps some format that could correct on-the-fly would be nice as well.

@washu: I looked at PAR, which adds correction in a visible fashion (unlike DVDisaster - which, for those who don't know, only works invisibly on ISOs). I like the way it works, but (from what I can determine) the source files are broken up into multiple pieces, which is difficult and unwieldy. I wasn't aware that RAR had a similar feature - that does sound interesting... a source file remains as one file... More reading required on my part. It still requires at least two steps to retrieve a file though... unRAR, transfer to media player, play...

@Porkus: I was actually trying to find out what happens when an error is encountered, not actually if the codec attempts to do anything interesting about it (correct - no, conceal - no?, skip - ??, silence - probably). From reports in other forums, it seems that the encoded block is just turned into zero samples (how many?). Without getting deep into the code of each codec, that sort of info is not on the home pages of the codec coders (at least I have not been able to find it).

I did have a look at the FLAC specification, and there is a lot of scope for additional data inside a FLAC container. However, I could not determine if it was allowable to interleave other chunks within a sequence of FRAMEs. The spec is quite clear on each type of chunk, and talks about multiple arbitrary data chunks (some can appear only once, and some can appear multiple times), and states that they appear at the start of a file. But it gives no information on whether other chunks can appear at arbitrary points in the stream once FRAME chunks have started. Obviously the guy/s at Xiph who designed the stream format know, but does anyone here know?

Anyway, this is as I said above, going to be my last post pushing for support. Although (if there is a need to) I will still respond to any additional posts.

Cheers and thanks all,
MM.

Reply #11
@washu: I looked at PAR, which adds correction in a visible fashion (unlike DVDisaster - which, for those who don't know, only works invisibly on ISOs). I like the way it works, but (from what I can determine) the source files are broken up into multiple pieces, which is difficult and unwieldy. I wasn't aware that RAR had a similar feature - that does sound interesting... a source file remains as one file... More reading required on my part. It still requires at least two steps to retrieve a file though... unRAR, transfer to media player, play...


PAR is just redundancy implemented in the file format.  It has to be visible to the user.  If you want it to be transparent, get a RAID, which is redundancy implemented at the hardware level. 

@Porkus: I was actually trying to find out what happens when an error is encountered, not actually if the codec attempts to do anything interesting about it (correct - no, conceal - no?, skip - ??, silence - probably).


You need to define which format and what type of errors you expect to encounter. Small errors don't generally occur, but if they are much smaller than the frame size they typically won't even be noticed by the decoder. They'll then propagate through the decoding of that frame and perhaps produce an artifact (or not). Errors much larger than a frame may prevent parsing of the file, which the decoder will notice. Decoders for many formats like MP3 or WMA just keep seeking until they find a valid frame. Typical errors are going to corrupt many, many frames (and probably many files). In this case error resilience is not really important though, because the file will be mostly or entirely lost.
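As a rough illustration of "keep seeking until a valid frame turns up", here is a minimal sketch for MPEG audio, where a frame header begins with 11 set bits. The function name is made up, and a real decoder does more than this: it also validates the remaining header fields and usually checks that another header follows where the frame should end.

```python
def next_sync(data, start=0):
    """Return the offset of the next candidate MPEG audio frame header, or -1.

    A header starts with 11 set bits: 0xFF followed by a byte whose
    top three bits are all set.
    """
    for i in range(start, len(data) - 1):
        if data[i] == 0xFF and (data[i + 1] & 0xE0) == 0xE0:
            return i
    return -1
```

After hitting garbage, a decoder along these lines would call `next_sync` from the current position and resume decoding at the returned offset, which is why damage tends to cost whole frames even when only a few bytes are bad.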

Reply #12
Obviously the guy/s at Xiph who designed the stream format know, but does anyone here know?

Sure: no. First the metadata, then the audio frames, no interleaving.
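That layout is easy to see by walking the block headers of a native FLAC stream: the "fLaC" marker, then metadata blocks each with a 4-byte header (a last-block flag, a 7-bit type and a 24-bit length), then the audio frames. A minimal sketch, with a hypothetical function name, that lists the blocks and reports where the frames begin:

```python
import struct

BLOCK_TYPES = {0: "STREAMINFO", 1: "PADDING", 2: "APPLICATION", 3: "SEEKTABLE",
               4: "VORBIS_COMMENT", 5: "CUESHEET", 6: "PICTURE"}


def list_metadata_blocks(data):
    """Return ([(type_name, length), ...], offset_of_first_audio_frame)."""
    if data[:4] != b"fLaC":
        raise ValueError("not a native FLAC stream")
    offset, blocks = 4, []
    while True:
        if offset + 4 > len(data):
            raise ValueError("truncated metadata block header")
        header = data[offset]
        is_last = bool(header & 0x80)      # top bit marks the final metadata block
        block_type = header & 0x7F         # low seven bits give the block type
        (length,) = struct.unpack(">I", b"\x00" + data[offset + 1:offset + 4])
        blocks.append((BLOCK_TYPES.get(block_type, "RESERVED"), length))
        offset += 4 + length               # skip the block body unread
        if is_last:
            return blocks, offset
```

Because each header carries its length, a reader can step over block bodies it does not care about, which is what makes the APPLICATION and PADDING types the obvious candidates for carrying extra data in principle. Whether any given player tolerates that in practice is something I have not tested.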
Music: sounds arranged such that they construct feelings.

Reply #13
@washu: I looked at PAR, which adds correction in a visible fashion (unlike DVDisaster - which, for those who don't know, only works invisibly on ISOs). I like the way it works, but (from what I can determine) the source files are broken up into multiple pieces, which is difficult and unwieldy. I wasn't aware that RAR had a similar feature - that does sound interesting... a source file remains as one file... More reading required on my part. It still requires at least two steps to retrieve a file though... unRAR, transfer to media player, play...


While PAR does support file splitting as you have found, that is not required or even the default. Every PAR program I have used (QuickPar, MultiPar & the command line) defaults to just creating the PAR files while leaving the originals intact. I actually use it this way on my music collection and the files are as is and fully playable. However, I am considering no longer creating PARs of my music, as there has never been an error in nearly 10 years.

Reply #14
Mustardman:
It makes sense to parity-protect optical media, where a scratch may corrupt only a small part of the disc - indeed, CDs do have that sort of stuff.
It does not make so much sense to parity-protect a single file on a hard drive against a bit/byte/sample falling out, because IF AND WHEN an error happens it will usually corrupt so much more that file-level parity protection is useless.

Most often an entire drive dies. In the old days, sector failures were more common (formatting utilities would check sectors and every now and then a bad sector would be left out of a new drive). I've had cases where one or a very few files are corrupted (and the file system howls CRC check error at me) - then the first thing was to copy out everything newer than the most recent backup and then retire the drive. And - surprising me - I had the above issue where Windows overwrote file segments, corrupting some 20 to 50 percent of the file. You do not let the codec correct that. You use backup.

As for what the codec does / requires ... it should specify how a proper file should be decoded, right? (In fact, AFAIK mp3 is defined through decoder behaviour.) What happens if it encounters a part which is garbage? There is a long way from specifying how to playback a proper stream, to how to playback various degrees of garbage.


(I do not say that parity-protecting audio streams is useless, in fact it could be useful for streaming ... or for *cough* torrenting *cough*.)

Reply #15
I don't see how parity is useful for a torrent given that each chunk is already verified by a cryptographic hash. Sneaking an error past a hashing function is far harder (virtually impossible) as compared to parity.

Reply #16
I don't see how parity is useful for a torrent given that each chunk is already verified by a cryptographic hash.


True, BitTorrent isn't as easily pollutable as WinMX/Kazaa (if musician Jens Johansson's account is accurate), but a parity-protected archive saves those cases where only nearly the full file is available - if you are almost done, you will have greater chances of all or all-but-one tracks working. And most likely leeches will seed for much longer still, even though they have reached a stage where they have everything they want. So ... well, not necessarily for legitimate uses, but ...

Reply #17
I don't see how parity is useful for a torrent given that each chunk is already verified by a cryptographic hash.


True, BitTorrent isn't as easily pollutable as WinMX/Kazaa (if musician Jens Johansson's account is accurate),


Yeah, but that's only because early P2P networks didn't verify files.

but a parity-protected archive saves those cases where only nearly the full file is available - if you are almost done, you will have greater chances of all or all-but-one tracks working.


I don't see how that is true.  The probability that a given file is 100% complete given randomly arriving blocks depends only on the number of blocks in the file and the fraction of blocks you have completed.  Using parity increases the file block count and the total number of blocks in proportion. 
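For the no-parity case that probability is easy to put a number on. The sketch below assumes the blocks you hold are a uniformly random subset of the torrent's blocks, ignores parity altogether, and uses made-up names; with N blocks in total, k of them belonging to the file and m received, the chance that all k are among the m is C(N-k, m-k) / C(N, m).

```python
from math import comb


def p_file_complete(total_blocks, file_blocks, have_blocks):
    """Chance that all of one file's blocks are among have_blocks uniformly random blocks.

    Assumes 0 <= have_blocks <= total_blocks and file_blocks <= total_blocks.
    """
    if have_blocks < file_blocks:
        return 0.0
    return (comb(total_blocks - file_blocks, have_blocks - file_blocks)
            / comb(total_blocks, have_blocks))
```

For example, with 4 blocks in the torrent, a 2-block file and 2 blocks received, the result is 1/6; it only reaches 1 once every block has arrived.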

Reply #18
The probability that a file is complete is 100% if you have all blocks that are part of that file and 0% if you don't. 

The probability of finishing the download of a file depends on the availability of each block. Bittorrent does this very well by spreading blocks "randomly". A sequential transfer, like some older file sharing protocols use, would dramatically lower that probability.

But this has nothing to do with error correction. I would keep the amount of transferred information as low as possible unless your transfer channel is unreliable and regularly flips bits; in that case you can add redundant information to correct those errors.
"I hear it when I see it."