HydrogenAudio

Lossless Audio Compression => Lossless / Other Codecs => Topic started by: jcoalson on 2006-11-11 23:22:37

Title: lossless codec testing
Post by: jcoalson on 2006-11-11 23:22:37
it just occurred to me in another thread that all the comparisons of lossless codecs I've seen include only compression ratios and times.  but people, even ones who are very particular about ripping, exactness, etc. seem to take for granted that their codec of choice is lossless.  are there any large corpus tests to confirm this?

FLAC has a large test suite, but it is designed specifically to find problems with libFLAC.  even though it has found problems with MAClib (monkey's audio) and flake I don't think it is suitable as a complete test for other codecs.

(note that the comparison should not rely on the codec's own internal test features like CRC and MD5 checking.  for example in the FLAC test suite, the tests are 'round trip', i.e. they encode and decode, then compare the original and decoded files themselves with other tools.)

Josh
Title: lossless codec testing
Post by: graue on 2006-11-12 06:09:32
I'm interested in what kinds of things are in the FLAC test suite. Where can I download it? I skimmed various parts of the FLAC website, but all I found was a FAQ answer saying the test suite is "published and comprehensive", but no link.
Title: lossless codec testing
Post by: jcoalson on 2006-11-12 06:42:21
it's part of the source code for the project; if you download the source release or check out from CVS, it's in the test/ directory.

Josh
Title: lossless codec testing
Post by: TBeck on 2006-11-12 11:47:06
We all know that it is impossible to prove that software of any complexity is error-free. So what can we do to feel a bit safer?

FLAC has a large test suite, but it is designed specifically to find problems with libFLAC.  even though it has found problems with MAClib (monkey's audio) and flake I don't think it is suitable as a complete test for other codecs.

I haven't looked at the FLAC test suite, but I assume that it creates specific critical files based upon knowledge of the internal codec structure? For experienced developers it should be quite easy to define specific critical conditions for their codecs. Some of them may be useful for many of the other codecs too, for instance many fast changes of the signal characteristics (frequencies, amplitude) which require adaptations of the codec parameters (start a new subframe, calculate new predictors, calculate new bit coder parameters, handle the transition between the states well). Most of the errors I found in TAK had been caused by such transitions.

Other generally useful extreme conditions could be files with extreme amplitudes (nearly silent, white noise of very high amplitude) or even synthetic sounds like a pure sine or rectangle wave.

I am quite sure that it would be possible to collect some generally critical files, which are likely to produce errors in different codecs.
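Such generally critical files are easy to generate programmatically. A purely illustrative Python sketch (the function names and parameters are mine, not taken from any codec's real test suite) that builds a signal whose character changes abruptly between near-silence, a pure sine and a rectangle wave:

```python
import math

SAMPLE_RATE = 44100

def sine(freq, seconds, amplitude=0.9):
    """Pure sine tone as a list of 16-bit sample values."""
    n = int(SAMPLE_RATE * seconds)
    return [int(amplitude * 32767 * math.sin(2 * math.pi * freq * t / SAMPLE_RATE))
            for t in range(n)]

def rectangle(freq, seconds, amplitude=0.9):
    """Rectangle wave: hard edges stress the predictor at every transition."""
    n = int(SAMPLE_RATE * seconds)
    period = SAMPLE_RATE / freq
    return [int(amplitude * 32767 * (1.0 if (t % period) < period / 2 else -1.0))
            for t in range(n)]

def near_silence(seconds):
    """Nearly silent passage (all zeros here)."""
    return [0] * int(SAMPLE_RATE * seconds)

# Abrupt changes of signal character force the codec to re-adapt
# (new subframes, new predictors, new bit-coder parameters) at each seam.
signal = near_silence(0.1) + sine(440, 0.1) + rectangle(1000, 0.1) + sine(30, 0.1)
```

Concatenations like `signal` above exercise exactly the transition handling described: each seam forces the encoder to change state.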

On the other hand there will be critical conditions specific to some particular codec, which only the codec developer itself may know.

We could ask the codec developers for critical files and use them to create a test corpus.


seem to take for granted that their codec of choice is lossless.  are there any large corpus tests to confirm this?


Obviously a bigger test corpus is more likely to expose errors.

But to repeat myself: even the best and biggest test corpus cannot prove that a codec is error-free.

(note that the comparison should not rely on the codec's own internal test features like CRC and MD5 checking.  for example in the FLAC test suite, the tests are 'round trip', i.e. they encode and decode, then compare the original and decoded files themselves with other tools.)


I agree that it is always better to use independent tests which do not rely on the test object itself. But if you want to know whether your specific files can be decoded correctly and you don't want to perform a full-blown external test (encode, decode, binary compare with external tools), using the codec's own verify function seems to be the second-best option.

My experience: So far TAK's internal verify option (immediately decode each frame and compare it to the original wave data) has confirmed every codec error which had previously been detected by external comparisons. Here I see some advantage for asymmetric codecs: because of their usually very high decoding speed, such a verify function will not significantly affect encoding time.
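Conceptually the verify loop is tiny. A minimal Python sketch, with zlib standing in for the real lossless codec (TAK's and FLAC's actual implementations are of course different):

```python
import zlib

def encode_with_verify(frames):
    """Encode each frame, then immediately decode it and compare it
    byte-for-byte with the original, as an encode-time verify option does.
    zlib is only a stand-in for a real lossless audio codec."""
    encoded = []
    for i, frame in enumerate(frames):
        packed = zlib.compress(frame)
        if zlib.decompress(packed) != frame:  # full compare, not just a checksum
            raise ValueError(f"verify failed on frame {i}")
        encoded.append(packed)
    return encoded

pcm_frames = [bytes(range(256)) * 16, b"\x00" * 4096]
compressed = encode_with_verify(pcm_frames)
```

As noted elsewhere in the thread, a check like this still cannot catch samples that were corrupted before they reached the encoder.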

And to speak against my own interests:

Use a codec with a big user base. If many people have tried it before without problems, you can be a bit more confident that it works well...

But please give the newcomers a chance too... 
Title: lossless codec testing
Post by: jcoalson on 2006-11-14 21:47:59
wow, only 4 replies and 2 are mine?!?   

the FLAC suite includes some general streams, like "full-scale deflection" streams that bounce between the rails, and lots of tests on noise which is unpredictable and can violate many kinds of assumptions usually made about the input.  I think those are generally useful.  some have crashed monkey's audio.

I don't think that relying on the tool's own verification system is enough.  for example, that problem I found with flake would not be caught even if it had a self-verification system like flac (and I assume tak's which looks like flac's), because the input samples were corrupted before they even got to the encoder.

Josh
Title: lossless codec testing
Post by: rjamorim on 2006-11-14 22:23:07
Bryant has somewhat of a test suite for WavPack here:
http://www.rarewares.org/wavpack/test_suite.zip (http://www.rarewares.org/wavpack/test_suite.zip)

He can probably give you more information.


I didn't reply because I don't understand squat about lossless compression other than bitching at David for "MOER FEETURZ!". But I do believe this sort of information would be interesting at the Wiki comparison.
Title: lossless codec testing
Post by: AndyH-ha on 2006-11-15 03:01:31
I suppose this inquiry is about whether or not it is possible to break some of these encoders, not whether or not they really work at all. It is easy to verify for any given audio file by comparing the original with an encoded/decoded version. If you can't find any of these that are not bit identical, it seems not something to worry about unless you just like playing with theoretical questions.
Title: lossless codec testing
Post by: goodnews on 2006-11-15 03:20:59
I suppose this inquiry is about whether or not it is possible to break some of these encoders, not whether or not they really work at all. It is easy to verify for any given audio file by comparing the original with an encoded/decoded version. If you can't find any of these that are not bit identical, it seems not something to worry about unless you just like playing with theoretical questions.

Josh Coalson (who started this thread) is the developer of the popular FLAC lossless audio codec/file format.

He is getting ready to release version 1.1.3 (or perhaps 1.2 or 2.0) of his FLAC encoder and decoder suite with *lots of new stuff*, and I'm sure that is why he is soliciting as many "test case files" as possible to stress-test the new version. I'm glad to see that Josh hasn't rushed this out without thorough testing, as so many programs use/depend on the FLAC support code that Josh releases freely.

The last update, FLAC version 1.1.2, was in early February 2005, so Josh seems cautious about releases to ensure their quality/stability (unlike some others who seem to update their programs every few months). Keep up the good work Josh, and hopefully all this testing will produce a *great* new version of FLAC. I support calling it version 1.2 or 2.0 and NOT 1.1.3, as this is not just a "minor point release" in my opinion. So many new features (love the album art support, BTW). Whatever you name it, it is likely to be a hit!
Title: lossless codec testing
Post by: spoon on 2006-11-15 08:43:52
Wouldn't the best test be to feed the encoder with random data at various bit depths/channels, and repeat in an automated fashion?
Title: lossless codec testing
Post by: Acid8000 on 2006-11-15 09:22:02
Wouldn't the best test be to feed the encoder with random data at various bit depths/channels, and repeat in an automated fashion?


Do you mean trying to encode white noise? Sure, the compression ratio would be very poor, but at least it could be a somewhat useful test, I think.
Title: lossless codec testing
Post by: Synthetic Soul on 2006-11-15 10:34:11
it just occurred to me in another thread that all the comparisons of lossless codecs I've seen include only compression ratios and times.
...
(note that the comparison should not rely on the codec's own internal test features like CRC and MD5 checking.  for example in the FLAC test suite, the tests are 'round trip', i.e. they encode and decode, then compare orignal and decoded files themselves with other tools.)
For my TAK testing my scripts compare the decoded wave with the source wave, using FSUM to compare MD5 hashes.  I realise that this will not work in all situations, e.g. when codecs remove RIFF chunks, but thankfully for those that retain them this is a quick and easy way to check correctness.  Possibly not foolproof, but as good as I can do.

wow, only 4 replies and 2 are mine?!?   
I don't really know what to say.  When I encode using WavPack I immediately verify using WVUNPACK -vm to check whether WavPack's internal routines can see a problem.

As far as more intensive and extensive tests go, I just wouldn't know where to start - this is where I must rely on the technical knowledge and responsibility of the developer, and my peers.

Perhaps all the lossless developers that frequent this board should submit some samples that they know to be troublesome to a corpus, and then other users can test all samples on all participating codecs?

Aside from that, you are talking about mass or large-scale testing, which I personally just don't have the time or inclination for.  There's enough paranoia on this board already without this!
Title: lossless codec testing
Post by: smok3 on 2006-11-15 13:09:14
actually it would be nice if every lossless codec would report what happened to all the RIFF chunks (after encoding).
Title: lossless codec testing
Post by: Klyith on 2006-11-16 23:20:28
It should be possible to add a second mode to a lossless encoder such that after every block (or n blocks) of audio is encoded, it immediately decodes the chunk back to test against the original data. The encoder would need to keep the original data buffered to be compatible with piped input, but that would also help the speed. Call it secure mode.

It would be slower to encode, but I think a lot faster than doing a verify or checksum after the fact. And I'm sure there are plenty of people on the HA board who wouldn't mind the speed tradeoff. The only obstacle is programmer time, and a new feature this big would need a lot of work.
Title: lossless codec testing
Post by: jcoalson on 2006-11-17 00:42:21
the -V option to flac does exactly that.
Title: lossless codec testing
Post by: saratoga on 2006-11-17 01:44:24

Wouldn't the best test be to feed the encoder with random data at various bit depths/channels, and repeat in an automated fashion?


Do you mean trying to encode white noise? Sure, the compression ratio would be very poor, but at least it could be a somewhat useful test, I think.


This would actually make the most sense IMO, since you could test the encoder with 10s or even 100s of GB worth of data, but still have the test program be only a few hundred KB, which would let everyone run the test, not just people with very fast internet.  You could even make the "random" data deterministic so that everyone runs the same sequence of data every time, thus making the test perfectly repeatable.
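A seeded generator makes that determinism trivial to arrange. An illustrative Python sketch (the function name and parameters are hypothetical):

```python
import random

def deterministic_pcm(seed, n_samples, sample_width=2):
    """Repeatable pseudo-random 'audio' bytes: anyone who runs the test
    with the same seed feeds the codec exactly the same data."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n_samples * sample_width))

# Two independent runs with the same seed produce identical test data.
run_a = deterministic_pcm(1234, 1000)
run_b = deterministic_pcm(1234, 1000)
assert run_a == run_b
```

Since `random.Random` is a deterministic generator, the same seed yields the same byte sequence on every machine, so test runs are comparable across testers.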
Title: lossless codec testing
Post by: cabbagerat on 2006-11-17 06:05:00

Wouldn't the best test be to feed the encoder with random data at various bit depths/channels, and repeat in an automated fashion?


Do you mean trying to encode white noise? Sure, the compression ratio would be very poor, but at least it could be a somewhat useful test, I think.
I think a number of test samples consisting of random data would be useful. Tests with noise biased to both high and low frequencies would be interesting, especially high-frequency noise as (I guess) this would defeat the prediction fairly effectively. Something like this (in MATLAB or Octave):
Code: [Select]
x=rand(1,1000)*2-1;
b=fir2(32, [0 0.5 0.7 1], [0.2 0.2 0.9 1]); %Design a FIR filter which rejects low frequencies somewhat
y=filtfilt(b, 1, x);

A full set of tests on random data would not prove that FLAC is correct, but they would be useful evidence. Tests could include white, pink and blue noise as well as noise with an unusual distribution - like a Rayleigh distribution.
Title: lossless codec testing
Post by: jcoalson on 2006-11-17 07:43:28
noise has its place in the tests for sure.  I think a wide selection of 'normal' audio from a big corpus would also help; imagine if a multiply-accumulate path for a filter would only overflow when excited by a certain kind of signal.  flac will often switch to verbatim frames and not even hit the common datapaths for noise, which is why my tests also include non-noise samples.

actually what motivated the topic in the first place was my original point that the widespread use of a codec could count as anecdotal evidence that it was lossless, except for the puzzling fact that people (who are normally particular about this) apparently are not checking that their codec is lossless, and this also seems true for the various comparisons.  so the anecdotal record would miss non-lossless problems that were not audible.

Josh
Title: lossless codec testing
Post by: greynol on 2006-11-17 08:10:58
FWIW, I verify each and every Monkey's Audio file that I encode by comparing an md5 checksum of the raw decoded data against one generated from the raw data used to create it.

I have yet to have a problem <knock on wood> but will be sure to report it here if I do.
Title: lossless codec testing
Post by: beto on 2006-11-17 14:04:52
FWIW, I verify each and every Monkey's Audio file that I encode by comparing an md5 checksum of the raw decoded data against one generated from the raw data used to create it.

I have yet to have a problem <knock on wood> but will be sure to report it here if I do.


And how do you do that? Which tools and what is the process you use? I'm interested in this but for wavpack, maybe you can give me some ideas.
Title: lossless codec testing
Post by: greynol on 2006-11-17 18:17:01
And how do you do that? Which tools and what is the process you use? I'm interested in this but for wavpack, maybe you can give me some ideas.

With md5sum.exe (the original one, not the one from etree!) in combination with the rarewares version of MAC.  I could not get the etree build of md5sum to pipe its output properly.

In the case of transcoding from flac, I use md5sum.exe in combination with Sox and the rarewares version of MAC, but Sox does not work right if the track does not end on a frame boundary, though I think of this as a feature rather than a flaw.

It would be simpler to just use shntool with the rarewares version of MAC, which should also not have trouble with tracks that don't end on frame boundaries, but this method is slightly slower.

http://www.hydrogenaudio.org/forums/index....st&p=447547 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=49951&view=findpost&p=447547)

PS: I'm still looking for a command-line CRC32 generator that handles piping and redirection, as I cannot get fsum to do this.  It would allow me to check tracks against the CRC information from EAC, which is calculated from the raw PCM data.
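For what it's worth, such a filter is only a few lines of Python. This is a hypothetical sketch, not an existing tool, and whether its output matches EAC's CRC variant would still need checking:

```python
import io
import zlib

def crc32_of_stream(stream, chunk_size=65536):
    """CRC32 of everything readable from a binary stream, computed in
    chunks so arbitrarily large piped input works."""
    crc = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF

# Saved as a script, it would print the checksum of piped stdin with:
#   print(f"{crc32_of_stream(sys.stdin.buffer):08X}")

# CRC-32 check value for the standard test vector "123456789":
assert crc32_of_stream(io.BytesIO(b"123456789")) == 0xCBF43926
```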
Title: lossless codec testing
Post by: beto on 2006-11-17 18:49:43
Thank you. That surely gave me some ideas.
Title: lossless codec testing
Post by: EuMesmo on 2006-11-19 22:03:43
It's funny to hear the question from you, as you are essentially asking "Is lossless really without loss?"

A few years ago I searched this forum for this question, and did not find the answer. I was looking for a lossless codec for archiving, and had my doubts (as you said, people are picky about these things). I devised the following test:

1-I ripped a track to aiff.
2-Encoded to flac.
3-Encoded the aiff to codecX (sometimes after converting from aiff to wav).
4-Transcoded from codecX with DMC to flac.
5-Decoded from codecX to wav-2
6-Encoded the wav-2 to flac
7-Compared the md5 of the audio data in the flac files.

I assumed flac was lossless, and it had an md5 to check just the audio data, which was what I was looking for. I got the same md5 from ape, shn, wv, rkau. The data was different from ra lossless and an old incarnation of wma lossless (it was 9.0. 9.1 got the same md5). I recently did the same test with apple lossless and ogg-flac, and once toyed with this test on my burned audio cds.

I am assuming that flac is completely lossless, and the loss could be in the encoding or in the decoding, or some part of the wav header being stored in the file (which wouldn't suit me anyway). I know other problems with this test, and that it has a lot of limitations. But I found it strange that the wiki didn't mention that ra lossless was not "completely lossless". It may have changed, however.

FWIW, I verify each and every Monkey's Audio file that I encode by comparing an md5 checksum of the raw decoded data against one generated from the raw data used to create it.

I have yet to have a problem <knock on wood> but will be sure to report it here if I do.


This is what I was looking to do, but didn't know how. I understood what you said, but I am not sure I can replicate it. Could you provide a bat file or an application to do it?

And after your statement I'll assume mac IS lossless.
Title: lossless codec testing
Post by: bhoar on 2006-11-20 00:02:33
From what I have read, there are several reasons a WAV file might contain different data, yet still contain bit-identical PCM audio.  WAV is a file format for containing several types of data organized in several ways.

I'd be curious if you compared the two files to see what the difference is.  If it is only in the first 36 bytes, then it is just a RIFF/WAVE header issue (e.g. there are two unused bytes in the header if the data is a typical WAV, perhaps they differ?) and the audio (the PCM data itself) has not changed. 

If it is later in the file, then there might be an issue with the lossless codec possibly not really being lossless, and that should be investigated.  Alternately, there might be a different way of chunking that still gives the same audio data, which would still be lossless.

Also double-check the conversion chain to ensure you aren't inadvertently turning on sample size/rate conversion of any type.

-brendan
Title: lossless codec testing
Post by: Jan S. on 2006-11-20 19:39:46
I discussed some of the problems last night with Roberto and we came up with some viewpoints on this.
There seems to be two theoretical ways to go about this:

1. You mathematically analyse the algorithms and work out whether the result will be lossless for all possible samples.
The problem with this, however, is that the encoder/decoder is not a closed system, so you cannot possibly account for external variables like the FPU and CPU. So even though your algorithms might be perfect, you cannot be sure your output will be. Hence this type of test will be pointless if the goal is absolute perfection.

2. You run as much data through the codec as possible to establish a decent confidence level (if you run random, non-repeating blocks through the codec, couldn't this level actually be calculated?).
This should be a quite easy task if the author provides a way to do it automatically.

Then again isn't all of this a non-issue if people just use the -V switch?
Title: lossless codec testing
Post by: greynol on 2006-11-20 19:57:16
Then again isn't all of this a non-issue if people just use the -V switch?
Certainly for flac, but what about other codecs?
Title: lossless codec testing
Post by: TBeck on 2006-11-20 20:01:50
Then again isn't all of this a non-issue if people just use the -V switch?
Certainly for flac, but what about other codecs?

It's also supported by TAK... 

hmpf... at least in the GUI version. I forgot to implement a switch in the command line version...
Title: lossless codec testing
Post by: HyperDrive on 2006-11-20 20:09:56
1. You mathematically analyse the algorithms and work out whether the result will be lossless for all possible samples.
The problem with this, however, is that the encoder/decoder is not a closed system, so you cannot possibly account for external variables like the FPU and CPU. So even though your algorithms might be perfect, you cannot be sure your output will be. Hence this type of test will be pointless if the goal is absolute perfection.

Pure and utter nonsense. Computer programming is an exact science (it's not too terribly hard to mathematically prove an algorithm's correctness, as Knuth and Dijkstra would attest). FPUs/CPUs are (by definition) fully deterministic state machines, which means your first argument is also invalid.
Title: lossless codec testing
Post by: rjamorim on 2006-11-20 20:19:21
Certainly for flac, but what about other codecs?


Get the developers of these codecs to implement similar functionality. Hurray!

Pure and utter nonsense. Computer programming is an exact science (it's not too terribly hard to mathematically prove an algorithm's correctness, as Knuth and Dijkstra would attest). FPUs/CPUs are (by definition) fully deterministic state machines, which means your first argument is also invalid.


Lay off "The Art of Computer Programming" for a while, brainiac. CPUs/FPUs have bugs, you know?
Title: lossless codec testing
Post by: greynol on 2006-11-20 20:35:42
Certainly for flac, but what about other codecs?

Get the developers of these codecs to implement similar functionality. Hurray!

Amen!
Title: lossless codec testing
Post by: jcoalson on 2006-11-20 21:12:12
I discussed some of the problems last night with Roberto and we came up with some viewpoints on this.
There seems to be two theoretical ways to go about this:

1. You mathematically analyse the algorithms and work out whether the result will be lossless for all possible samples.
The problem with this, however, is that the encoder/decoder is not a closed system, so you cannot possibly account for external variables like the FPU and CPU. So even though your algorithms might be perfect, you cannot be sure your output will be. Hence this type of test will be pointless if the goal is absolute perfection.

verifying algorithmic correctness is theoretically possible but can be very difficult.  there is more danger in the implementation, which is what I mean by codec.  I wasn't challenging the losslessness of formats themselves although I guess it could be possible.

2. You run as much data through the codec as possible to establish a decent confidence level (if you run random, non-repeating blocks through the codec, couldn't this level actually be calculated?).
This should be a quite easy task if the author provides a way to do it automatically.

not sure what you mean by random, but I described above why noise may not be sufficient (e.g. to excite overflow problems in a filter).

Then again isn't all of this a non-issue if people just use the -V switch?

not totally; I described above the kinds of errors that self checking cannot catch.

but anyway, even that would be a good addition to a general lossless comparison.

Josh
Title: lossless codec testing
Post by: rjamorim on 2006-11-20 21:23:01
but anyway, even that would be a good addition to a general lossless comparison.


I agree. Does anyone know which lossless codecs besides FLAC and TAK have a -V switch or something similar?

Edit: I see wvunpack, ttaenc and mac.exe have -v too. ofr.exe has --verify. LPAC has -c. Any other takers?
Title: lossless codec testing
Post by: HbG on 2006-11-20 23:00:48
Is there an application that can compare the audio data in two .wav files and accepts piped input? I've searched for such a simple program but couldn't find it. Or some tool to test a flac against a wav.

I've used foobar's bit-compare with flake for a while and didn't find any differences. I'm using REACT for my ripping now; it'd be nice if I could automate testing with it by using the program described above, and even nicer if many others would do the same.

By the way, flake deserves to be on rarewares!
Title: lossless codec testing
Post by: foosion on 2006-11-20 23:16:33
Lay off "The Art of Computer Programming" for a while, brainiac. CPUs/FPUs have bugs, you know?
There are two things you can do in this case:

Actually, there is a third choice: You can take the position that outside events can cause arbitrary errors in the machine used to execute some software, and that you can never be sure anything works as intended. In that case you should probably spend your time on something more useful instead of worrying about the correctness of software.
Title: lossless codec testing
Post by: TBeck on 2006-11-20 23:47:10
but anyway, even that would be a good addition to a general lossless comparison.


I agree. Do people know what lossless codecs besides FLAC and TAK have -v switch or something similar?

Edit: I see wvunpack, ttaenc and mac.exe have -v too. ofr.exe has --verify. LPAC has -c. Any other takers?

I am not sure that those functions all perform the same actions:

Possibility 1:

A verify or test function performed when decoding a compressed file: look for decoding errors (invalid input for the decoder), calculate the checksum of the decoded data and compare it with the checksum of the original data stored by the encoder. This operation can be performed without the original uncompressed data available.

Possibility 2:

A verify function performed when encoding: immediately decode the just-encoded data and compare the whole data (not only its CRC!) with the original data. This operation needs the original uncompressed data.

Obviously 2 should be able to detect any error, because each byte of the decoded data is being compared with the original data. 1 has to rely on the error-detection strength of the checksum, which isn't perfect.
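That imperfection is easy to demonstrate: distinct inputs with the same CRC32 can be found by brute force. An illustrative Python sketch (real audio frames are much longer than these 8-byte strings, but the birthday-collision principle is identical):

```python
import random
import zlib

# Brute-force two different 8-byte strings with the same CRC32. With
# ~600,000 random draws over a 32-bit checksum space, a birthday
# collision is a near-certainty.
rng = random.Random(0)
seen = {}
collision = None
for _ in range(600_000):
    data = rng.getrandbits(64).to_bytes(8, "big")
    crc = zlib.crc32(data)
    if crc in seen and seen[crc] != data:
        collision = (seen[crc], data)
        break
    seen[crc] = data

a, b = collision
assert a != b and zlib.crc32(a) == zlib.crc32(b)  # distinct data, same checksum
```

So a decode-time checksum check (possibility 1) can in principle pass on corrupted data, while the full byte-for-byte compare of possibility 2 cannot.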

Possibly I am a bit anal, but this may also be true to some degree for this thread...

Edit: And possibly I am too ignorant and all the options of the different compressors you have listed are performing 2).
Title: lossless codec testing
Post by: Mark0 on 2006-11-20 23:50:28
I just want to say that, if someone comes up with a corpus of various audio files for this kind of lossless testing, I will gladly offer space and a tracker to host a .torrent with it, if needed.

Bye!
Title: lossless codec testing
Post by: jcoalson on 2006-11-21 01:36:47
Obviously 2 should be able to detect any error, because each byte of the decoded data is being compared with the original data. 1 has to rely on the error-detection strength of the checksum, which isn't perfect.

if you're talking about the codec's own verify function, again I say that this would not have caught the error I found with flake.

Josh
Title: lossless codec testing
Post by: cabbagerat on 2006-11-21 06:11:37
How about a distributed computing initiative? A small program could be developed which would run on somebody's PC (overnight), uncompress (to a temporary file) all their sound files, FLAC them, unFLAC them and compare the output. You could then get a bunch of HA members to run the program - giving an extremely large set of tests for every new FLAC version in a very short time.

It sounds like an odd idea - and it's by no means a proof - but will be a nice demonstration that FLAC works as claimed. I'd guess most FLAC users (including me) wouldn't mind donating a few cycles.

Pure and utter nonsense. Computer programming is an exact science (it's not too terribly hard to mathematically prove an algorithm's correctness, as Knuth and Dijkstra would atest). FPU/CPU are (by definition) fully deterministic state machines, which means your first argument is also invalid.
It's not hard to prove an algorithm's correctness if that algorithm is a member of the small subset of extremely simple algorithms. Many real-world algorithms defy closed-form analysis. And of course you know that the general case is undecidable.
Title: lossless codec testing
Post by: HyperDrive on 2006-11-21 09:06:23
Lay off "The Art of Computer Programming" for a while, brainiac. CPUs/FPUs have bugs, you know?

Agreed. But if the underlying hardware doesn't work for the required operations, you have bigger problems than lossless audio compression and encoding. Besides, (good) compilers work around buggy instructions. If after the encoding/decoding process the output stream bitwise equals the input, I'd say they're identical...

It's not hard to prove an algorithm's correctness if that algorithm is a member of the small subset of extremely simple algorithms. Many real-world algorithms defy closed-form analysis. And of course you know that the general case is undecidable.

Also agreed, to a certain extent. However, even if not proven correct, FLAC and Monkey's Audio, for example, have been around for a while and should be quite well tested at this point. Assuming the algorithms are correct, I believe the remaining potential problems could be classified as paranoia. 
Title: lossless codec testing
Post by: cabbagerat on 2006-11-21 09:41:24
Agreed. But if the underlying hardware doesn't work for the required operations, you have bigger problems than lossless audio compression and encoding. Besides, (good) compilers work around buggy instructions.
Not if the CPU is broken. Here are some graphs taken from three consecutive runs of an Octave program on a Duron 800 CPU which worked absolutely fine in OpenOffice, Firefox and Thunderbird.

http://www.brooker.co.za/brokenpc/broken1.png (http://www.brooker.co.za/brokenpc/broken1.png)
http://www.brooker.co.za/brokenpc/broken2.png (http://www.brooker.co.za/brokenpc/broken2.png)
http://www.brooker.co.za/brokenpc/broken3.png (http://www.brooker.co.za/brokenpc/broken3.png)

And the output of the identical program on the same PC with the CPU swapped out for another Duron 800.
http://www.brooker.co.za/brokenpc/fixed.png (http://www.brooker.co.za/brokenpc/fixed.png)

Which is what it is supposed to look like. Before you say that it's an Octave bug, I got the same problems with any double precision calculation performed on this CPU in both Linux and Windows. This sort of thing could really wreck a weekend of CD archiving.
Title: lossless codec testing
Post by: Synthetic Soul on 2006-11-21 10:14:03
Edit: And possibly I am too ignorant and all the options of the different compressors you have listed are performing 2).
I know that WavPack, True Audio, OptimFROG and Monkey's Audio are using the first method you list (verifying the compressed data).  I suspect this is the standard verification method.

if you're talking about the codec's own verify function, again I say that this would not have caught the error I found with flake.
I think Thomas was simply making a distinction between the two main approaches for the benefit of others.

It sounds like an odd idea - and it's by no means a proof - but will be a nice demonstration that FLAC works as claimed. I'd guess most FLAC users (including me) wouldn't mind donating a few cycles.
I don't think Josh started this to get more testing on FLAC.  I would like to think that he is trying to improve all lossless codecs, and therefore any testing should not be FLAC-specific - we need a system that can be applied to any lossless codec, irrespective of whether they can validate while encoding.

Is there an application that can compare the audio data in two .wav files and accepts piped input? I've searched for such a simple program but couldn't find it.
I was wondering about Mark0's RIFFStrip (http://mark0.net/soft-riffstrip-e.html) in conjunction with MD5 hashes of the resulting file...

It seems it would be very useful if one of our clever developers could create a specific app to help us with this task though.  Or two: one to create random noise WAVE files and one to bit-compare WAVE audio data, possibly verifying valid headers also...
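The two helpers described above could be sketched in a few lines of Python using only the standard library. This is a hypothetical illustration, not an existing tool: one function writes a random-noise PCM WAVE file, the other bit-compares only the audio data of two WAVE files, ignoring header differences.

```python
import os
import wave

def make_noise_wav(path, seconds=1, rate=44100, channels=2, sampwidth=2):
    """Write `seconds` of uniform random noise as a PCM WAVE file."""
    with wave.open(path, 'wb') as w:
        w.setnchannels(channels)
        w.setsampwidth(sampwidth)
        w.setframerate(rate)
        # os.urandom gives raw random bytes, interpreted here as PCM samples
        w.writeframes(os.urandom(rate * seconds * channels * sampwidth))

def audio_data_equal(path_a, path_b):
    """Bit-compare only the decoded sample data, not the RIFF headers."""
    with wave.open(path_a, 'rb') as a, wave.open(path_b, 'rb') as b:
        return a.readframes(a.getnframes()) == b.readframes(b.getnframes())
```

Because `audio_data_equal` reads frames through the `wave` module, two files whose headers differ (say, after stripping metadata) still compare equal as long as the samples match.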
Title: lossless codec testing
Post by: HyperDrive on 2006-11-21 12:13:57
Not if the CPU is broken. Here are some graphs taken from three consecutive runs of an Octave program on a Duron 800 CPU which worked absolutely fine in OpenOffice, Firefox and Thunderbird.

http://www.brooker.co.za/brokenpc/broken1.png (http://www.brooker.co.za/brokenpc/broken1.png)
http://www.brooker.co.za/brokenpc/broken2.png (http://www.brooker.co.za/brokenpc/broken2.png)
http://www.brooker.co.za/brokenpc/broken3.png (http://www.brooker.co.za/brokenpc/broken3.png)

And the output of the identical program on the same PC with the CPU swapped out for another Duron 800.
http://www.brooker.co.za/brokenpc/fixed.png (http://www.brooker.co.za/brokenpc/fixed.png)

Which is what it is supposed to look like. Before you say that it's an Octave bug, I got the same problems with any double precision calculation performed on this CPU in both Linux and Windows. This sort of thing could really wreck a weekend of CD archiving.

Interesting, indeed. And that was precisely my point: If the underlying hardware is broken, you have bigger problems. Hardware bugs are (mostly) documented and/or fixed/worked around in later revisions, but in your case you had a broken FPU.
However (correct me if I'm wrong), in order to dump an audio CD, you should only need integer instructions (you're basically moving data around), which means you wouldn't end up with a corrupt .wav file. If the lossless compression process required floating-point, the resulting decompressed file would most likely differ from the original, exposing the FPU malfunction.
Title: lossless codec testing
Post by: SebastianG on 2006-11-21 12:39:47
It's not only malfunction: there are also different floating point formats (each encoding represents a different subset of |R) and different algorithms for + - * / (which may round differently).
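A minimal demonstration of the point about formats: the same real number round-tripped through single precision comes back as a different value than its double-precision encoding, which is why any codec that relies on floating point cannot promise bit-exact results across platforms and precisions.

```python
import struct

x = 0.1  # not exactly representable in binary floating point
# Round-trip through 32-bit single precision ('f' format); the result
# is the nearest float32 value, which differs from the float64 value.
x32 = struct.unpack('f', struct.pack('f', x))[0]
print(x == x32)  # False: float32 encodes a different subset of the reals
```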
Title: lossless codec testing
Post by: cabbagerat on 2006-11-21 13:53:59
However (correct me if I'm wrong), in order to dump an audio CD, you should only need integer instructions (you're basically moving data around), which means you wouldn't end up with a corrupt .wav file. If the lossless compression process required floating-point, the resulting decompressed file would most likely differ from the original, exposing the FPU malfunction.
If your lossless codec does not verify during encoding, then problems like this one can make a mockery of even the best algorithm. I'm not saying they are common, just that there is a need for complete verification (encode-decode-compare or equivalent) even with programs that are known to work. That isn't a perfect solution, as it requires more cycles and can't work with on-the-fly encoding (unless something like an MD5 was taken beforehand), but it is the only one that is truly foolproof.
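The encode-decode-compare idea can be sketched as a tiny harness. Here zlib stands in for a real lossless audio codec (such as flac), purely so the sketch is self-contained; the principle is identical: encode, decode in an independent step, then compare original and decoded data with an external hash rather than trusting the codec's own CRC.

```python
import hashlib
import zlib

def round_trip_ok(original: bytes) -> bool:
    """Full round-trip check: encode, decode, compare via external MD5."""
    encoded = zlib.compress(original)    # stand-in for the "encode" step
    decoded = zlib.decompress(encoded)   # stand-in for the "decode" step
    # Compare hashes of the raw data, taken outside the codec itself
    return hashlib.md5(original).digest() == hashlib.md5(decoded).digest()

print(round_trip_ok(b'\x00\x01' * 44100))  # True for a working codec
```

Taking the MD5 of the original beforehand is what makes this workable for on-the-fly encoding: the hash can be stored and the decode-compare step deferred.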

Whether foolproof is necessary is a harder question.
Title: lossless codec testing
Post by: TBeck on 2006-11-21 22:57:06
if you're talking about the codec's own verify function, again I say that this would not have caught the error I found with flake.
I think Thomas was simply making a distinction between the two main approaches for the benefit of others.

Thanks! That's exactly what i wanted to say.

If a new feature should be introduced into the comparison table, it would be nice to have an exact definition.

To Josh: I am well aware of the limits of internal verification functions... But i agree, i should be more exact.

Edit: And possibly i am too ignorant and all the options of the different compressors you have listed are performing 2).
I know that WavPack, True Audio, OptimFROG and Monkey's Audio are using the first method you list (verifying the compressed data).  I suspect this is the standard verification method.

Then my distinction should make sense.

It sounds like an odd idea - and it's by no means a proof - but will be a nice demonstration that FLAC works as claimed. I'd guess most FLAC users (including me) wouldn't mind donating a few cycles.
I don't think Josh started this to get more testing on FLAC.  I would like to think that he is trying to improve all lossless codecs, and therefore any testing should not be FLAC-specific - we need a system that can be applied to any lossless codec, irrespective of whether they can validate while encoding.
...
It seems it would be very useful if one of our clever developers could create a specific app to help us with this task though.  Or two: one to create random noise WAVE files and one to bit-compare WAVE audio data, possibly verifying valid headers also...


This could be an interesting project. Some more evaluation and thinking is probably necessary to determine whether systematic testing of critical conditions/files is really significantly better than testing a large pool of random files. Possibly my initial enthusiasm for the benefits of testing generally critical files was not adequate.