Topic: LONG-TERM audio archiving strategy?

LONG-TERM audio archiving strategy?

Reply #100
For me, I rip to a single file with a cue sheet.
I put that, the cue, the log file, and album art in an appropriately named folder.
I use FastSum to calculate a checksum for all the files.
I then use WinRAR to RAR the folder with a 3% recovery record.
Then I place the archives in a holding directory until I have a DVD's worth, and burn them to DVD.
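A minimal sketch of the checksum and RAR steps, assuming Python plus the command-line rar tool (FastSum itself is a GUI utility; this just produces comparable MD5 output, and the folder name is made up):

Code:
# Sketch of the checksum + recovery-record steps: MD5 every file in the
# album folder (what FastSum does via its GUI), then RAR it with a 3%
# recovery record using WinRAR's command-line -rr switch.
import hashlib
import subprocess
from pathlib import Path

album = Path("Artist - Album (1999)")  # hypothetical folder name

with (album / "checksums.md5").open("w") as out:
    for f in sorted(album.iterdir()):
        if f.is_file() and f.name != "checksums.md5":
            # fine for a sketch; read large CD images in chunks in practice
            digest = hashlib.md5(f.read_bytes()).hexdigest()
            out.write(f"{digest} *{f.name}\n")

subprocess.run(["rar", "a", "-rr3%", f"{album}.rar", str(album)], check=True)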

I have had to retrieve CD images from the DVDs and have not lost any as yet (5 years).


LONG-TERM audio archiving strategy?

Reply #101
Would it be impolite of me to ask why you have files for which you no longer own the CDs?

LONG-TERM audio archiving strategy?

Reply #102
Would it be impolite of me to ask why you have files for which you no longer own the CDs?
Why should he not have them?

LONG-TERM audio archiving strategy?

Reply #103
fb2k's file integrity verifier could help.

Put the 2nd HDD in a mobile rack and turn it on only when you sync.
Most HDDs fail on startup; I'm not sure RAID 1 isn't a better solution than offline HDD backups.


Yeah, electrical machines are sensitive to transient states (such as power-ons), but unless one is running the PC 24/7, the backup disk will still go through far fewer transient states than the 1st disk AND will be running much less, so the backup HDD will still have a MUCH better expected lifetime than if it were a RAID disk.

LONG-TERM audio archiving strategy?

Reply #104
Who doesn't run a desktop 24/7 anymore?
elevatorladylevitateme

LONG-TERM audio archiving strategy?

Reply #105
Not really impolite, but more a moot point.
Most of the CDs are still mine but sit in an unmarked box in another country which I don't visit too often; others have been scratched badly enough to be unplayable, or have been lost.

I appreciate the reply about foobar's validation tool, but I'm not convinced that it would be able to detect whether the file has been altered but is still decodable and playable; that is why the checksum hash appeals to me, so that you can really verify that the file is identical to the original. The added benefit is that the checksums also validate any other files that accompany the rip, not just the audio files.

What I'd like to know more about is stuff like par2 files: adding some sort of checksum hash (MD5, SHA, etc.) to the file, plus maybe a parity or similar file, to be able to rebuild the file in case of corruption.

LONG-TERM audio archiving strategy?

Reply #106
I appreciate the reply about foobar's validation tool, but I'm not convinced that it would be able to detect whether the file has been altered but is still decodable and playable; that is why the checksum hash appeals to me, so that you can really verify that the file is identical to the original.


foo_verifier does exactly that: it calculates the checksum of the decoded audio and compares it to the checksum written into the audio file's metadata at the time of creation.
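For FLAC specifically this is easy to reproduce outside foobar2000, since the STREAMINFO header stores an MD5 of the raw PCM. A minimal sketch, assuming Python with the mutagen library and the reference flac CLI on PATH (neither of which foo_verifier itself uses):

Code:
# Check each FLAC against the PCM MD5 stored in its own STREAMINFO block.
# `flac -t` re-decodes the audio and fails if it no longer matches.
import subprocess
from pathlib import Path
from mutagen.flac import FLAC

def verify_flac(path: Path) -> bool:
    if FLAC(path).info.md5_signature == 0:  # some encoders leave it unset
        print(f"{path}: no MD5 signature stored, cannot verify")
        return False
    ok = subprocess.run(["flac", "-t", "-s", str(path)],
                        capture_output=True).returncode == 0
    print(f"{path}: {'OK' if ok else 'FAILED'}")
    return ok

for f in Path("music").rglob("*.flac"):  # hypothetical library root
    verify_flac(f)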

LONG-TERM audio archiving strategy?

Reply #107

Well, I haven't found an ideal solution for me, but the best plan I have come up with is this:

I use a program called "Advanced CheckSum Verifier"; it recursively creates a small file (e.g. md5sum.lst) in each folder containing the MD5 checksums of the files within it, and it can later verify that all those files are unchanged. The one thing I don't like is that it doesn't also create a checksum list covering the subfolders. You can re-run the program to create an MD5 list for the whole tree, but it won't do that in the same pass as recursively processing the subfolders. This might sound a little confusing, but what it means is: if you have 5 subfolders with files in them, each subfolder gets an md5sum.lst with the checksums of its own files; however, if you delete one of the 5 subfolders, there is no checksum that will reveal that the folder and its files are missing (without creating another checksum list for the entire tree).

So this will take care of determining whether the files are corrupt or not.
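A minimal sketch of working around that gap, assuming Python 3: write a per-folder md5sum.lst the way ACSV does, plus one top-level manifest covering the whole tree, so a vanished subfolder is caught too (the md5sum-tree.lst name is my own invention, not ACSV's):

Code:
# Per-folder MD5 lists plus one tree-level manifest; the manifest records
# which files *should* exist, so deleting a whole subfolder is detectable.
import hashlib
from pathlib import Path

def md5_of(path: Path) -> str:
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def checksum_tree(root: Path) -> None:
    tree_lines = []
    for folder in sorted(p for p in [root, *root.rglob("*")] if p.is_dir()):
        folder_lines = []
        for f in sorted(folder.iterdir()):
            if not f.is_file() or f.name in ("md5sum.lst", "md5sum-tree.lst"):
                continue
            digest = md5_of(f)
            folder_lines.append(f"{digest} *{f.name}")
            tree_lines.append(f"{digest} *{f.relative_to(root)}")
        if folder_lines:
            (folder / "md5sum.lst").write_text("\n".join(folder_lines) + "\n")
    (root / "md5sum-tree.lst").write_text("\n".join(tree_lines) + "\n")

checksum_tree(Path("music"))  # hypothetical library root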

Next I examined the error-correction coding (ECC) programs out there, mainly QuickPar, MultiPar, and ICE ECC. I think there is room for improvement in this area. I also considered WinRAR and 7-Zip, which can add recovery data in case of corruption, but compressing audio files that are already compressed takes a considerable amount of time, and there were a few other reasons I chose not to use them. QuickPar was easy to rule out since it doesn't process folders recursively. Between MultiPar and ICE ECC it was difficult. I don't like that ICE ECC is a closed program with no documentation, and development seems to have stopped, or at least there have been no updates for 2 years; it's hard to feel comfortable that the program will still be available and functional if needed in 10 years. But in any case, after comparing the two I do prefer ICE ECC over MultiPar: it's considerably quicker and seems to handle repairs more easily. The ICE ECC recovery file should of course be stored apart from its data source, the same as with par2 files.
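For the par2 route, the open-source par2cmdline tool covers create/verify/repair from the command line; a minimal sketch driving it per album folder, assuming Python and par2 on PATH (the 10% figure and folder layout are just examples):

Code:
# Create ~10% par2 recovery data per album folder with par2cmdline.
# Verify later with `par2 verify recovery.par2`; rebuild damaged files
# with `par2 repair recovery.par2`. Move the .par2 files elsewhere
# afterwards if, as above, you want them stored apart from the data.
import subprocess
from pathlib import Path

def protect_album(folder: Path, redundancy: int = 10) -> None:
    files = [str(p) for p in sorted(folder.iterdir())
             if p.is_file() and p.suffix != ".par2"]
    if files:
        subprocess.run(["par2", "create", f"-r{redundancy}",
                        str(folder / "recovery.par2"), *files], check=True)

for album in sorted(Path("music").iterdir()):  # hypothetical layout
    if album.is_dir():
        protect_album(album)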

I wish that someone (even me, if I get the time) would develop an open-source program that merges the functionality of checksumming and ECC, so that we are not at the mercy of a bad stick of RAM.

LONG-TERM audio archiving strategy?

Reply #108
I have all my music ripped to FLAC, so it takes quite some space. It's stored on my "server" at home, currently without a backup.

Hard drives are very cheap, and now that at least the lower-end NAS units are falling in price as well, I will shortly consider finding one that holds either 1 or 2 drives and placing it at my brother's place for online incremental backup. I figure I should be able to transfer something close to 20 GB each day without interrupting normal daily use of the bandwidth. I suppose that should be enough to keep a mirror consistent with track/tag changes*.

* Unfortunately nobody has developed a component that can back up tags from files, although I think that could be extremely useful. Not only would it reduce the bandwidth I need to transfer a file when only its tags have changed, it could also keep a history of tags for a given file. It could rely on a unique ID, like the one generated by foo_biometrics.
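(As a sanity check on that figure: 20 GB per day works out to 20 × 8 Gbit / 86,400 s ≈ 1.9 Mbit/s sustained, which is fairly modest if the sync runs overnight.)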
Can't wait for a HD-AAC encoder :P


LONG-TERM audio archiving strategy?

Reply #110
Interesting. However, the NAS will have to support it, and my Windows server seems to be a showstopper as well...
Can't wait for a HD-AAC encoder :P

LONG-TERM audio archiving strategy?

Reply #111
I've been burned enough that I've learned my lesson - even more so recently, with the corruption.

I'm not going to get on a soapbox about it - but I think my story above speaks for itself.

If you do a local (LAN) backup first and then do incrementals over the internet, that should suffice - it's hard to imagine adding so much data in one day that it couldn't be transferred in one night - I'd be more concerned about finding a storage system that can handle that kind of daily addition.

I realize that ZFS is good for ECC, but not everyone wants to run Solaris, it won't work on my NAS, and it doesn't stop corruption introduced during transfer to or from other systems.

LONG-TERM audio archiving strategy?

Reply #112
I have my lossless audio on 3 hard disks: 2 in the computer, another one stored in the basement. CD images with embedded cuesheets, plus all cues additionally exported in case of file corruption. Bitmaps stored externally.

* Unfortunately nobody has developed a component that can back up tags from files, although I think that could be extremely useful. Not only would it reduce the bandwidth I need to transfer a file when only its tags have changed, it could also keep a history of tags for a given file. It could rely on a unique ID, like the one generated by foo_biometrics.

You mean save the tags externally from audio collection A and apply them to backup collection B (plus maybe making the file date unique) - or what is the idea? Maybe an interesting idea, but what would the ID be? Quick and stupid would be e.g. simply the filename; slow and good, a one-way hash function over the whole audio portion.

Yes, backup is a problem, especially if you're a bureaucrat and tag often.

Sometimes I precisely sync A + B (with Total Commander), then change a ton of tags in A and copy them over to B (with fb2k). This becomes a disaster if the target playlist is not exactly the same. Then I give all files a unique date/time in order to be able to sync with Total Commander's sync tool.

LONG-TERM audio archiving strategy?

Reply #113
Unfortunately nobody has developed a component that can backup tags from files, although I think that could be extremely useful. Not just will it reduce bandwidth I need to transfer file when only tags have changed, but also it could feature a history of tags from a certain file. It could rely on a unique ID, like the one generated by foo_biometrics.


http://www.aboutmyip.com/AboutMyXApp/DeltaCopy.jsp

Quote
In general terms, DeltaCopy is an open-source, fast incremental backup program. Let's say you have to back up one file that is 500 MB every night. A normal file copy would copy the entire file even if only a few bytes have changed. DeltaCopy, on the other hand, would only copy the part of the file that has actually been modified. This reduces the data transfer to just a small fraction of the 500 MB, saving time and network bandwidth.
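The block-matching idea behind rsync/DeltaCopy is simple enough to sketch; here is a toy version, assuming Python (real rsync adds a rolling weak checksum so matches are found at arbitrary byte offsets, which this skips):

Code:
# Toy delta-copy sketch: hash fixed-size blocks of the old copy, then send
# only the blocks of the new file whose hashes differ.
import hashlib

BLOCK = 64 * 1024

def block_hashes(data: bytes) -> list[str]:
    return [hashlib.md5(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

def delta(old: bytes, new: bytes) -> list[tuple[int, bytes]]:
    """Return (block-index, new-bytes) pairs the receiver is missing."""
    old_h = block_hashes(old)
    changed = []
    for i in range(0, len(new), BLOCK):
        idx = i // BLOCK
        chunk = new[i:i + BLOCK]
        if idx >= len(old_h) or hashlib.md5(chunk).hexdigest() != old_h[idx]:
            changed.append((idx, chunk))
    return changed

old = b"A" * 500_000
new = old[:200_000] + b"B" * 10 + old[200_010:]  # small edit mid-file
print(f"{len(delta(old, new))} of {len(block_hashes(new))} blocks to send")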


LONG-TERM audio archiving strategy?

Reply #114
I had the idea of saving the tags as a separate file - the same way that with photos, if you save an image as RAW, all the changes (e.g. contrast, red-eye reduction, metadata) are stored in a sidecar file, and the original raw image is never altered. I think this idea has a lot of merit, with some alterations for music files. However, I think this is getting away from the topic, so I will post my ideas in another thread on tagging.
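A minimal sketch of that sidecar idea for audio, assuming Python with the mutagen library (the .tags.json naming and layout are my invention):

Code:
# Hypothetical tag-sidecar sketch: dump a file's tags to a JSON file next
# to it, so the audio itself never has to be rewritten when tags change.
import json
from pathlib import Path
import mutagen

def export_sidecar(audio_path: Path) -> Path:
    f = mutagen.File(audio_path, easy=True)  # easy=True: plain key/value tags
    sidecar = audio_path.parent / (audio_path.name + ".tags.json")
    sidecar.write_text(json.dumps(dict(f.tags or {}), indent=2))
    return sidecar

def apply_sidecar(audio_path: Path) -> None:
    sidecar = audio_path.parent / (audio_path.name + ".tags.json")
    f = mutagen.File(audio_path, easy=True)
    for key, values in json.loads(sidecar.read_text()).items():
        f[key] = values
    f.save()

export_sidecar(Path("album/01 - Track.flac"))  # hypothetical file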

LONG-TERM audio archiving strategy?

Reply #115
I have my lossless audio on 3 hard disks: 2 in the computer, another one stored in the basement. CD images with embedded cuesheets, plus all cues additionally exported in case of file corruption. Bitmaps stored externally.

* Unfortunately nobody has developed a component that can back up tags from files, although I think that could be extremely useful. Not only would it reduce the bandwidth I need to transfer a file when only its tags have changed, it could also keep a history of tags for a given file. It could rely on a unique ID, like the one generated by foo_biometrics.

You mean save the tags externally from audio collection A and apply them to backup collection B (plus maybe making the file date unique) - or what is the idea? Maybe an interesting idea, but what would the ID be? Quick and stupid would be e.g. simply the filename; slow and good, a one-way hash function over the whole audio portion.

Yes, backup is a problem, especially if you're a bureaucrat and tag often.

I tag a lot, and every day. That's why I wanted a safer way to keep the tags in case of a hard drive crash.

My idea is simple, and I believe it should get some support. I noticed that foo_biometrics creates what I would call a pretty unique ID for each track. I believe this ID would not clash with any other song unless the two are exactly bit-identical. It is something like 800 chars long, though, so I would shorten it a little from the right (I use $right(%FINGERPRINT_FOOID%,8)). With this ID you can always know which track a tag might belong to. Personally I have an MP3 copy of all my tracks on my work laptop, and another great use for such backup/restore functionality would be to sync this collection with tags from my backup, or the other way round... I could rate music at work and sync the ratings back to my primary FLAC collection.
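A minimal sketch of that ID-keyed sync, assuming Python with mutagen and assuming the shortened fooID has already been written into a FINGERPRINT tag in both collections - the tag names here are my assumptions, not what foo_biometrics writes by default:

Code:
# Copy selected tags from collection A onto whichever file in collection B
# carries the same fingerprint tag, regardless of path or format.
from pathlib import Path
import mutagen
from mutagen.easyid3 import EasyID3

# Expose the (assumed) custom TXXX frames to easy-mode MP3 access.
EasyID3.RegisterTXXXKey("fingerprint", "FINGERPRINT")
EasyID3.RegisterTXXXKey("rating", "RATING")

ID_TAG = "fingerprint"      # assumed tag holding the shortened fooID
SYNC_TAGS = ["rating"]      # which tags to carry over

def index_by_id(root: Path) -> dict[str, Path]:
    out = {}
    for p in root.rglob("*"):
        f = mutagen.File(p, easy=True) if p.is_file() else None
        if f and f.tags and ID_TAG in f.tags:
            out[f.tags[ID_TAG][0]] = p
    return out

def sync_tags(src_root: Path, dst_root: Path) -> None:
    dst_index = index_by_id(dst_root)
    for track_id, src_path in index_by_id(src_root).items():
        if track_id in dst_index:
            src = mutagen.File(src_path, easy=True)
            dst = mutagen.File(dst_index[track_id], easy=True)
            for tag in SYNC_TAGS:
                if tag in src.tags:
                    dst[tag] = src.tags[tag]
            dst.save()

sync_tags(Path("work_mp3"), Path("home_flac"))  # rate at work, sync home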

@viktor: I know that delta-copying isn't really cutting-edge technology, but show me something that works over simple FTP and is supported by current cheap NAS boxes! If you go the more expensive route and invest in a Synology box, or possibly a QNAP, they have support for such things.
Can't wait for a HD-AAC encoder :P

LONG-TERM audio archiving strategy?

Reply #116
@viktor: I know that delta-copying isn't really cutting-edge technology, but show me something that works over simple FTP and is supported by current cheap NAS boxes! If you go the more expensive route and invest in a Synology box, or possibly a QNAP, they have support for such things.


Sync via FTP? That idea is defective by design. FTP is an unreliable protocol from 1985; it's really obsolete and unusable for today's needs. I wouldn't rely on it.

Anyway, I tag only once: I collect my info and then tag the file. Then it goes into my collection; until then it's just an unsorted download/rip. Files in the collection are modified really rarely, so it won't hurt much to sync them.

LONG-TERM audio archiving strategy?

Reply #117
Sync via FTP? That idea is defective by design. FTP is an unreliable protocol from 1985; it's really obsolete and unusable for today's needs. I wouldn't rely on it.

And pretty much the only thing available on cheap NAS. I don't make the rules, I just follow them.

Anyway, I tag only once: I collect my info and then tag the file. Then it goes into my collection; until then it's just an unsorted download/rip. Files in the collection are modified really rarely, so it won't hurt much to sync them.

Unfortunately we have different needs. I don't see what you are trying to prove here?
Can't wait for a HD-AAC encoder :P

LONG-TERM audio archiving strategy?

Reply #118
@Odyssey: I haven't followed this thread in detail, but if you want to do incremental backups online then Deltacopy - Rsync for Windows may be something for you.

LONG-TERM audio archiving strategy?

Reply #119
Sync via FTP? That idea is defective by design. FTP is an unreliable protocol from 1985; it's really obsolete and unusable for today's needs. I wouldn't rely on it.

And pretty much the only thing available on cheap NAS. I don't make the rules, I just follow them.

Anyway, I tag only once: I collect my info and then tag the file. Then it goes into my collection; until then it's just an unsorted download/rip. Files in the collection are modified really rarely, so it won't hurt much to sync them.

Unfortunately we have different needs. I don't see what you are trying to prove here?


Maybe that you lack flexibility and want premium solutions on entry-level stuff?

And anyway, I don't understand why a cheap NAS would be capable only of FTP?