
Tips for Testing Lossless Codecs

I'd sort of like to do my own testing on WavPack and FLAC on my system, and I'm just wondering how/what sorts of plugins are used to keep logs of how much time it took to encode my test corpus and all that. Synthetic Soul's lossless comparison has exactly what I'm looking to calculate and I'm using fb2k. I'm just not sure how to get averages for encoding speed/time and all that sort of thing. I'm guessing they're in the form of plugins for fb2k, right?

Reply #1
...Anyone? I've done some googling and found that there was a component, "foo_null," that used to benchmark encoding/decoding speed. Can't find it available for download anywhere though.

...Little help?

Reply #2
I only use foobar to tell me the duration of the source files.

I use batch files to perform the tests, which use the encoder/decoder in question, TIMER to record the times taken to encode and decode, and FSUM to compare the MD5 hash of the decoded files with that of the source WAVEs.
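For anyone without FSUM to hand, the hash comparison can be sketched in Python. This is not one of the scripts from this thread: `md5_of` and `verify_decoded` are hypothetical names, and the sketch assumes each decoded WAVE keeps the filename of its source.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_decoded(source_dir, decoded_dir):
    """Map each source WAVE filename to True if the decoded copy hashes identically."""
    results = {}
    for source in sorted(Path(source_dir).glob("*.wav")):
        decoded = Path(decoded_dir) / source.name
        # A missing decoded file counts as a failed comparison.
        results[source.name] = decoded.exists() and md5_of(source) == md5_of(decoded)
    return results
```

Any `False` in the returned mapping would mean the round trip was not bit-identical for that file (or the decoded file is missing).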

To calculate the en-/decoding rate you need the duration of the source file, and the time taken to en-/decode.  I obtain the duration in seconds (floating point) by loading the files into foobar, setting my copy script as  %length_seconds_fp%, selecting all the tracks, copying, and pasting into a text file.  This only needs to be done once (which is why I haven't bothered with a more automated process).  I use TIMER to record the time taken to en-/decode.  My batch files write the result from TIMER to a text file, and then I use a VBS script to scrape the values.  The rate is simply the duration/process time; so if the file is 10 seconds long and it takes 2 seconds to encode the rate is 10/2 or 5x.

To calculate the compression you just need the filesize of the source WAVE, and the filesize of the compressed file.  I use a simple batch script to create a text file containing the filesizes of the source files.  My test batch files record the encoded filesizes as they go.  The compression is simply the encoded filesize divided by the source filesize.
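The two calculations are small enough to write out. A minimal Python sketch, using the 10-second example from above and made-up file sizes (the function names are just for illustration):

```python
def rate(duration_seconds, process_seconds):
    """Speed as a multiple of real time: audio duration divided by time taken."""
    return duration_seconds / process_seconds


def compression(encoded_bytes, source_bytes):
    """Encoded size as a fraction of the source size (lower is better)."""
    return encoded_bytes / source_bytes


# The 10-second file that takes 2 seconds to encode.
print(f"{rate(10, 2):.0f}x")  # prints "5x"

# Made-up sizes: a 10,000,000-byte WAVE encoded to 6,000,000 bytes.
print(f"{compression(6_000_000, 10_000_000):.1%}")  # prints "60.0%"
```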

TIMER provides timings for both CPU only and CPU+IO.  My comparison uses CPU+IO times, but only because that is how I started reporting times: I began by using TIMETHIS, and that is all it will report.  The problem with reporting CPU+IO times is that the figures are distorted by my hard drives.  Although they are accurate for my system, they may not portray the results that someone else will see.  All that can be said is that they are correct in relation to each other.  Ooh look, a chart:



This chart shows the difference between CPU-only and CPU+IO speeds.  As you can see, my hard drive starts having problems around 30x, and can't get over 80x.

In results I have provided elsewhere (that is, outside my TAK comparison pages) I tend to report CPU-only times.
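To make the CPU-only versus CPU+IO distinction concrete, here is a rough Python stand-in for what TIMER measures. It is only a sketch: `time_command` is a hypothetical helper, and it relies on `os.times()` reporting child-process CPU time, which Python does on Unix-like systems but not on Windows (where those fields are zero).

```python
import os
import subprocess
import sys
import time


def time_command(args):
    """Run a command and return (wall_seconds, cpu_seconds) for the child process."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(args, check=True)
    wall = time.perf_counter() - start  # start to finish, including I/O waits
    after = os.times()
    # CPU time charged to finished child processes: user mode plus kernel mode.
    cpu = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    return wall, cpu


if __name__ == "__main__":
    # Placeholder workload; substitute the real encoder command line.
    wall, cpu = time_command([sys.executable, "-c", "sum(range(10**6))"])
    print(f"wall (CPU+IO): {wall:.3f}s, CPU only: {cpu:.3f}s")
```

The wall figure corresponds to what is called "global" time in this thread, and the CPU figure to "process" time.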

I run my tests from a root folder.  I have a folder called "source" which contains all the source WAVE files.  This folder also contains "size.txt" (a list of filesizes in bytes for each file); "duration.txt" (a list of durations in seconds (x.xxx) for each file) and "digests.md5" (a list of MD5 hashes which is used to compare with the decoded files' hashes).

I then have other folders, like "FLAC" or "WavPack", in which I have subfolders for versions, etc.  Within these folders I have one folder per setting (e.g. "fx3").  Each folder has three batch files: "encode.bat" to encode all the source files; "decode.bat" to decode all the encoded files; and "md5.bat" to compare the MD5 hashes (called by "decode.bat").

My TAK comparison uses an Access database as the backend.  This is populated by a VBS script that scrapes all the test data (encode times, decode times and filesizes) and runs SQL statements.

I have another VBS script that will simply extract global and process times from the TIMER output and write the values to a text file.  I use this script for other result sets where I use Excel to calculate the values, and where I will generally use the process (CPU-only) times.  As all my scripts write text files that have one value per line it is easy to copy and paste into an Excel table, and then use formulae to calculate the rates and compression.

I have tried to make my batch file system as easy to reuse as possible, as I have reused it many, many times myself now.  If you are interested, I am happy to provide you with my scripts.
I'm on a horse.

Reply #3
//EDIT... you got the answer above

Reply #4
Quote
I have tried to make my batch file system as easy to reuse as possible, as I have reused it many, many times myself now.  If you are interested, I am happy to provide you with my scripts.

I'd really appreciate that.

Reply #5
If you unzip comparison-scripts-v1.1.zip you should have three folders: "source", "WavPack" and "TAK". I have put in some examples for WavPack and TAK so you can see how it works.
  • Ensure that TIMER.EXE and FSUM.EXE are in a folder in PATH. Your Windows folder ("C:\Windows" or "C:\WINNT") is a good place.
  • Put your source WAVE files in "source". I would suggest just putting in a couple to begin, to test the setup.
  • Double-click "/source/md5.bat" to create the MD5 hash file, "digests.md5". This is necessary for "md5.bat" to be able to compare hashes.
  • Although not necessary at this point, may as well do it now: double-click "/source/size.bat" to create "size.txt" - a list of filesizes for your source files. You will need this to calculate the compression. See post above RE: getting a list of durations for the files. NB: Ensure that the files are in filename order in foobar or the rows will not match the other text files!
You must ensure that all encoder and decoder apps are in a folder in PATH, like your Windows folder. Either that or change all "encode.bat" and "decode.bat" files to include the full path to the apps.

The Files

If you navigate to "/WavPack/4.40" you will see that it contains folders for a few example settings (f; h; x) and also the files "encodeall.bat" and "decodeall.bat". If you double-click "encodeall.bat" it will call any "encode.bat" files it finds in subfolders. If you double-click "decodeall.bat" it will call any "decode.bat" files it finds in subfolders. In this way you can either run all settings at once using "encodeall.bat", or one at a time using the relevant "encode.bat". "encode.bat" discovers the location of "source" itself, so there's no need to mess about.  This enables you to have subfolders within subfolders within subfolders (e.g.: "/WavPack/4.40/alpha3/hx") without confusing the scripts. It will accept the path to a folder if required though.
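The "find every encode.bat below this folder" idea can be sketched in Python as follows. This is not how "encodeall.bat" is written; `find_scripts` is a hypothetical helper that only lists the scripts it would run, and the filename match is case-sensitive.

```python
import os


def find_scripts(root, script_name="encode.bat"):
    """Return the paths of every script_name found in subfolders of root, sorted."""
    root = os.fspath(root)
    found = []
    for folder, _subfolders, files in os.walk(root):
        # Skip the starting folder itself; only nested settings folders count.
        if folder != root and script_name in files:
            found.append(os.path.join(folder, script_name))
    return sorted(found)
```

Because `os.walk` descends to any depth, a nested settings folder such as "alpha3/hx" would be picked up without extra configuration.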

Each "settings" folder contains 4 files to begin: encode.bat; decode.bat; md5.bat and timer.vbs.
  • "encode.bat" encodes the files, creating "encode.txt" and "size.txt". "encode.txt" contains the output from TIMER.EXE, while "size.txt" contains a list of the filesizes for the encoded files.
  • "decode.bat" decodes the files, creating "decode.txt", calling "md5.bat", and then deleting the encoded and decoded files. "decode.txt" contains the output from TIMER.EXE.
  • "md5.bat" is called by "decode.bat" and creates "md5.txt", which contains the return from FSUM for the MD5 hash compare.
  • "timer.vbs" can be used to scrape the timings from the "encode.txt" and "decode.txt" files. Double-click it (you need Windows Scripting Host installed) and you should get the new files "encode.txt.global.txt"; "encode.txt.process.txt"; "decode.txt.global.txt"; "decode.txt.process.txt". "encode.txt.process.txt", for example, will contain a list of the CPU-only times taken to encode. "decode.txt.global.txt" will contain a list of CPU+IO times taken to decode.
Note: "timer.vbs" and "md5.bat" are the same for every folder. If you create new folders you can just copy those to your heart's content. If you create a new WavPack setting you can copy all four files over, and just change "encode.bat" to reflect the setting. "decode.bat" will only need to be amended for a different codec.  Take a look at an example from the TAK folder and WavPack folder to spot the differences.

Both "encode.bat" and "decode.bat" have two lines that change. These are:

@SET extension=

... and:

@SET commandLine=

Check the example scripts for usage. Note the use of the %input% and %output% variables for commandLine. For "encode.bat" %input% will be the WAVE and %output% the encoded file. For "decode.bat" %input% will be the encoded file and %output% the new WAVE file.

Edit @ 2006-12-19 : You must always use %output% to specify the output file, in both "encode.bat" and "decode.bat".  This will ensure that the encoded and decoded files are created in the settings folder itself, and not the source folder (or anywhere else for that matter).

Testing The Scripts

For testing I suggest that you just double-click on "encodeall.bat" in "/WavPack/4.40", and let that run. Check the setting folders and you should see the encoded files, "size.txt", and "encode.txt".

Double-click "decodeall.bat" to decode the files. Once they have all been decoded you should see the new files "decode.txt", and "md5.txt".

You can then double-click "timer.vbs" to scrape the times, and check the new files created (e.g.: "encode.txt.process.txt").

All feedback welcome.

Edit: I would like to point out that I have given these scripts to one member earlier today.  He appeared to have no problems getting them to run, or understanding the results, so hopefully it's easier than I've made it sound! (FYI Mike: v1.1 is just a tidy of the scripts, there is no functionality change.  Thanks for your feedback.)

Reply #6
Thanks for sharing your scripts.

I tested again today with the same file (and a couple more), and there is no more bug in size.txt.
I simply cannot reproduce the error. Good!

To calculate wave's length, you can use Text Tools in foobar2000 (File - Preferences - Tools - Text Tools).
I cleared the Header and Footer fields, and for Body I put this:
Code:
%length_seconds_fp%'
'

The results are the track durations in seconds, as floating point numbers. You can easily export them:
1. select the tracks in the correct order
2. right click - Text tools - Copy Text
3. paste into a text file or Excel

Reply #7
@Synthetic Soul

Thanks for your scripts... I intend to make full use of them while testing TAK. One question though, what about the final step (collecting the data from all the txt files and putting them into an excel file)? Is there any automated method or is it just manual labor?

Reply #8
Sorry, just manual labour.  However, it is simply a case of cut'n'paste, and if you've set the spreadsheet up beforehand the process should take two minutes.

As my main testing (for TAK) uses a script to go into my database I've never considered automating input to Excel.

Reply #9
Whoops! Thanks again Soul. I'm still testing them out myself!


Reply #11
@Synthetic Soul

Thanks for your reply. Could you please elaborate on your method (using database/script)?

(the thought of manually entering so much data is scary)

Reply #12
I have a VBS script that I use to scrape the relevant info (encode times, decode times and encoded filesize) and insert the information into the Access database that is used in my comparison.

The script uses ADO/SQL to communicate with Access.

Once again, I have one file per "settings" folder.  The only difference between files being the ID used to specify the encoder/setting.

I first create the necessary encoder and settings records in my database, and then, manually,  amend the ID used in each script to the ID of the records I've just created.  I then double-click the VBS file(s) and 1-2 seconds later the database has been updated.

I have passed this info to Thundik81; however, as I have pointed out to him, the database I currently use is a little simplistic, and I wouldn't really recommend it to others.  I have designed an improved database schema, that further normalises the data to make it easier to filter results, which my current system sorely lacks.  Unfortunately I have no time to update my system to use this database, or re-run the additional test that it would require.  I'd like to move it to PHP and MySQL/SQLite, but hey.
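As a rough idea of what a further-normalised layout on SQLite might look like, here is a Python sketch. It is not the schema described above: the table names, column names and sample rows are all invented for illustration, and the sizes and timings are made-up numbers rather than test results.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a hypothetical normalised schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE encoder (id INTEGER PRIMARY KEY, name TEXT, version TEXT);
        CREATE TABLE setting (id INTEGER PRIMARY KEY,
                              encoder_id INTEGER REFERENCES encoder(id),
                              switches TEXT);
        CREATE TABLE result  (setting_id INTEGER REFERENCES setting(id),
                              filename TEXT, encoded_bytes INTEGER,
                              encode_seconds REAL, decode_seconds REAL);
    """)
    # Made-up rows, present only so the query below has something to return.
    conn.execute("INSERT INTO encoder VALUES (1, 'WavPack', '4.40')")
    conn.execute("INSERT INTO encoder VALUES (2, 'FLAC', '1.1.3')")
    conn.execute("INSERT INTO setting VALUES (1, 1, '-h')")
    conn.execute("INSERT INTO setting VALUES (2, 2, '-5')")
    conn.executemany(
        "INSERT INTO result VALUES (?, ?, ?, ?, ?)",
        [
            (1, "a.wav", 100, 1.5, 1.0),
            (1, "b.wav", 200, 2.5, 1.0),
            (2, "a.wav", 110, 1.0, 0.5),
        ],
    )
    return conn


def totals_for(conn, encoder_name):
    """Total encoded size and encode time per setting for one encoder."""
    return conn.execute(
        """
        SELECT e.version, s.switches, SUM(r.encoded_bytes), SUM(r.encode_seconds)
        FROM result r
        JOIN setting s ON s.id = r.setting_id
        JOIN encoder e ON e.id = s.encoder_id
        WHERE e.name = ?
        GROUP BY s.id
        ORDER BY s.id
        """,
        (encoder_name,),
    ).fetchall()
```

With encoder and setting held in their own tables, filtering by encoder becomes a single WHERE clause, which is the kind of comparison a flat table makes awkward.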

The VBS script can be found here.  I can provide you with a link to my database but I'm embarrassed to share it right now with the duff WavPack 4.40 data it currently contains.

As I say, I'd rather people didn't just use my system anyway.  I'd like to see people using my scripts as a method to get the data, and then use new and exciting methods to use it.  I'm a little concerned that I seem to have hijacked this thread, and don't really want to set myself up as an expert, as I simply am not.

If I can be of help though, I will try.

Reply #13
@kanak

I see from your post in the TAK testing thread that you have managed to get my scripts to run.

I was just wondering if you had any suggestions, comments, etc.?  Was it terrible creating your spreadsheet?

Also, one suggestion:  I have seen the PDF (very tidy BTW) and you are reporting compression rate to 4 decimal places.  As was pointed out to me, this may be a little misleading.  The test method, although as accurate as I know how, is not so accurate that we can reproduce results to an accuracy of 4 decimal places.  Accordingly I now report my rates as integer values.  Your call, obviously.

Anyway, I was just digging for some feedback. 

Edit: Ah, seeing as I'm here, if you do want to see my Access database, I have replaced the bad WavPack data, so feel free to download the live version.*

* I wouldn't normally have a database in a web-accessible folder, but it was always my intention to let people access the data.  (One reason I am currently looking at GData.)

Reply #14
@Synthetic Soul

Thank you so much for your scripts. I doubt I'd have tried out that many codecs/settings if I didn't have your scripts to automate the tasks.

However, I still can't get my OptimFROG settings to work well. The output is created in the "source" directory. Could you please post one of your "encode.bat" files for OptimFROG?

Regarding the 4 decimal places: I agree with you. Having that many decimals does imply a precision that may not be present in the measurement. It's just that in this particular test, using lower precision meant that some codec settings would appear to have the same values. But I don't have many qualms about the lack of "scientific" method in this test, especially because the sample was a single file (and less than 2 minutes long). In my future tests, I intend to use 1-2 decimal places.

Also, thanks for providing your Access database. I'm going to tool around and see if I can get it to work. I sure hope I can.


Oh one more question, do you use Process Time or Global Time?

PS: I love how you present your test results. The data is sortable and all that. Have to figure out some way to do that

Reply #15
Quote
Thank you so much for your scripts. I doubt I'd have tried out that many codecs/settings if I didn't have your scripts to automate the tasks.
Thanks for the reply.  No problems, I'm glad they are providing some help.

Quote
However, I still can't get my OptimFROG settings to work well. The output is created in the "source" directory. Could you please post one of your "encode.bat" files for OptimFROG?
I don't have an example here (at work), but it sounds like you are not explicitly setting an output.  I would use:

OFR.EXE --encode %input% --output %output%

OFR.EXE --decode %input% --output %output%


Quote
Oh one more question, do you use Process Time or Global Time?
I use Global (CPU+IO) Time, but if I could start afresh I would use Process (CPU-only) Time.

When I first started testing TAK I was using Microsoft's TimeThis, which will only report "Global" time. When I discovered TIMER, after the ever-helpful Josef Pohm alerted me to the issue of my hard drive throttling, I felt I still had to report Global time, so that Thomas could compare results with previous results.

I would point you again to the graph at the top of this thread.

In conclusion: If I were you I would use Process time. If you choose to use database.vbs then you would need to amend line 78 from:

objRegExp.Pattern = "Global Time += +([\d\.]+) = "

... to:

objRegExp.Pattern = "Process Time += +([\d\.]+) = "
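The same two patterns can be tried outside VBScript. A small Python sketch follows; `scrape_times` is a hypothetical helper, and the sample lines are invented purely so that the patterns have something to match. Real TIMER output may be laid out differently.

```python
import re

# The two patterns quoted above, unchanged.
PROCESS_TIME = re.compile(r"Process Time += +([\d\.]+) = ")
GLOBAL_TIME = re.compile(r"Global Time += +([\d\.]+) = ")


def scrape_times(text, pattern=PROCESS_TIME):
    """Return every time matched by pattern, as floats, in order of appearance."""
    return [float(value) for value in pattern.findall(text)]


# Hypothetical lines shaped only to match the patterns; not captured from TIMER.
sample = (
    "Process Time     =     1.250 = 00:00:01.250\n"
    "Global Time      =     2.500 = 00:00:02.500\n"
)

print(scrape_times(sample))               # prints [1.25]
print(scrape_times(sample, GLOBAL_TIME))  # prints [2.5]
```

Switching between CPU-only and CPU+IO figures is then just a matter of which pattern is passed in.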

Quote
PS: I love how you present your test results. The data is sortable and all that. Have to figure out some way to do that.
Thanks.  I was pleased with the system, but I'm finding it increasingly frustrating that it provides no method to filter the results, so that you could easily compare Flake vs FLAC, FLAC 1.1.2 vs 1.1.3, Wavpack default vs FLAC default, etc.  This requires a new database structure and code rewrite, which I just don't have time for.  I do find the sorting ability helpful though, and I also use the CSV download - if I want to filter the table to compare just a few settings.

Reply #16
FYI: I have updated my VBS file "duration.vbs".  It now reads the wave header to calculate the file duration.  This means that it will correctly deal with files that are not 16 bit 44100Hz stereo.  Unfortunately it will still fail if files have additional RIFF chunks (which most don't).

This can be used as a method to create the list of file durations - just put the file in the "source" folder and double-click.

Code:
Dim objFSO, objFolder, objFile, objOutput

Set objFSO = CreateObject("Scripting.FileSystemObject")

Set objFolder = objFSO.GetFolder("./")

Set objOutput = objFSO.OpenTextFile("duration.txt.tmp", 2, True)

For Each objFile in objFolder.Files
  If GetExtension(objFile.Name) = ".wav" Then
    objOutput.WriteLine(GetWaveDuration(objFile.Name))
  End If
Next

objOutput.Close
Set objOutput = Nothing
Set objFile = Nothing

If objFSO.FileExists("duration.txt") Then objFSO.DeleteFile "duration.txt"
objFSO.MoveFile "duration.txt.tmp", "duration.txt"

Set objFSO = Nothing

Function GetExtension(ByVal strFile)
  GetExtension = LCase(Mid(strFile, InstrRev(strFile, ".")))
End Function

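Function GetWaveDuration(ByVal strFile)
  ' Assumes a canonical 44-byte header: "fmt " chunk first, "data" chunk straight after
  Dim objSource, objBytes
  Set objSource = CreateObject("ADODB.Stream")
  objSource.Type = 1 ' adTypeBinary
  objSource.Open
  objSource.LoadFromFile strFile
  objBytes = objSource.Read(44)
  objSource.Close
  Set objSource = Nothing
  ' Bytes 41-44: data chunk size in bytes; bytes 29-32: byte rate (bytes per second)
  GetWaveDuration = Round(Binary2Integer(objBytes, 41, 44)/Binary2Integer(objBytes, 29, 32), 7)
End Function

Function Binary2Integer(Binary, intStart, intEnd)
  ' Reads a little-endian unsigned integer from 1-based byte positions intStart..intEnd
  Dim i
  Binary2Integer = 0
  For i = intStart To intEnd
    Binary2Integer = Binary2Integer + AscB(MidB(Binary, i, 1)) * (2 ^ ((i - intStart) * 8))
  Next
End Function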
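For comparison, a Python sketch of the same calculation using the standard `wave` module. This is an alternative, not a port of the script above: the module walks the RIFF chunk list rather than assuming a fixed 44-byte header, so additional chunks ahead of the audio data should not throw it off. As far as I know it only reads uncompressed PCM files.

```python
import wave


def wave_duration(path):
    """Duration in seconds: frame count divided by sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Frames divided by sample rate gives the same figure as data size divided by byte rate, since both numerator and denominator differ by the same bytes-per-frame factor.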

Reply #17
Maybe I'm missing something, but how do I transform all this data about individual files into a single compression ratio and speed for each preset? Or how should I use the database?

You are talking to someone totally illiterate in Excel and database things... who still wants to create a presentable comparison... at least in .txt form.

Thanks, and sorry for the inconvenience.

Reply #18
The easiest way is in a spreadsheet.  You need a minimum of 8 columns:
  • Wave filesize (in bytes)
  • File duration (in seconds)
  • Encoded filesize (in bytes)
  • Compression (=Column 3/Column 1)
  • Encode time (in seconds)
  • Encode rate (=Column 2/Column 5)
  • Decode time (in seconds)
  • Decode rate (=Column 2/Column 7)
All data columns (1, 2, 3, 5, and 7) are populated using the data in the text files created by the scripts:

1. /source/size.txt
2. /source/duration.txt
3. /setting folder/size.txt
5. /setting folder/encode.txt.process.txt
7. /setting folder/decode.txt.process.txt

The idea is that you set up the spreadsheet headings and calculations, and then just copy and paste from the text files.  Each line in the text file becomes a row in the spreadsheet. 

Sometimes I will use one sheet for various settings (so columns 3 to 8 are repeated for each setting), Other times I set up a sheet with the eight columns, get the formatting and calculations all set up, and then duplicate the sheet for the number of settings I am reporting - one sheet per setting.

You can achieve the simple column calculations, and the column totals, using very basic spreadsheet formulae.  Tip, use the dollar sign to stop a cell shifting - e.g.: in column 6 use "=$B2/E2".  When copied to column 8 it will correctly transfer as "=$B2/G2" (not "=D2/G2", as it would without the "$").
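For anyone who would rather skip the spreadsheet, the same column totals can be worked out with a short Python sketch. It assumes the file names listed above, one value per line in each file, and that the overall figures are taken from the column totals; `read_column` and `summarise` are hypothetical names, not part of the scripts.

```python
from pathlib import Path


def read_column(path):
    """Read a text file holding one number per line into a list of floats."""
    lines = Path(path).read_text().splitlines()
    return [float(line) for line in lines if line.strip()]


def summarise(source_dir, setting_dir):
    """Overall compression and rates for one setting, from the column totals."""
    source_dir, setting_dir = Path(source_dir), Path(setting_dir)
    duration = sum(read_column(source_dir / "duration.txt"))
    encode_time = sum(read_column(setting_dir / "encode.txt.process.txt"))
    decode_time = sum(read_column(setting_dir / "decode.txt.process.txt"))
    return {
        # Total encoded bytes over total source bytes.
        "compression": sum(read_column(setting_dir / "size.txt"))
        / sum(read_column(source_dir / "size.txt")),
        # Total audio duration over total CPU-only time.
        "encode_rate": duration / encode_time,
        "decode_rate": duration / decode_time,
    }
```

Calling `summarise` once per settings folder would give one row per setting for a plain-text table.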

Have a look at this Google Docs example (some Flake testing I did):

http://spreadsheets.google.com/pub?key=pwA..._urKzEsJW438tFg

Edit: Doesn't look like you can see the formulae in that, so I've uploaded a recent XL spreadsheet (FLAC IC9sseW testing).  Hopefully, even as an inexperienced XL user, you can see how simple it is from that.

Edit 2:  It doesn't make sense to try to do this purely in Notepad. Using XL for the calculations I could create a report with the script output from scratch in less than five minutes.  It is easy to copy and paste from XL - column boundaries simply convert to tabs, which I then convert to four spaces, as the beginnings of a plain text table - if that is how I am reporting my findings (example).

Reply #19
Ohhh! Thanks, I got it.

But now I have a problem: in my locale the decimal separator is a comma (,), not a dot (.). TIMER.EXE gives the times with a dot, and then the spreadsheet doesn't work. Is there a quick fix for that, or must I change my locale settings?

Another question: does the "process time" count only the time actually used to encode the TAK file? Can I do other things on my computer while it encodes that way? And I assume that Global is just the time delta from start to end, including everything, right?

Reply #20
Quote
But now I have a problem: in my locale the decimal separator is a comma (,), not a dot (.). TIMER.EXE gives the times with a dot, and then the spreadsheet doesn't work. Is there a quick fix for that, or must I change my locale settings?
I can only suggest doing a find'n'replace on the text files.  It's a real shame that this extra step needs to be introduced.
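If the find'n'replace becomes tedious it could be scripted. A minimal Python sketch, assuming the file contains nothing but numbers, one per line (`dots_to_commas` is a hypothetical helper):

```python
from pathlib import Path


def dots_to_commas(path):
    """Rewrite a one-value-per-line text file so decimals use a comma."""
    target = Path(path)
    # Blind replacement: only safe when the file holds nothing but numbers.
    target.write_text(target.read_text().replace(".", ","))
```

It overwrites the file in place, so it should only be pointed at the scraped timing and duration files, never at the raw TIMER output.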

 
Quote
Another question: does the "process time" count only the time actually used to encode the TAK file? Can I do other things on my computer while it encodes that way? And I assume that Global is just the time delta from start to end, including everything, right?
I always set the scripts going, with no other apps running (except normal services), and leave the PC well alone.  However, in theory, the process time should only report the CPU time required by the enc/decoding process.  As you say, global time is the time from start to finish (including I/O time and other processes).

Reply #21
Quote
I can only suggest doing a find'n'replace on the text files. It's a real shame that this extra step needs to be introduced.

Now I got EditPad Lite and it is only one more click per file. Not bad.
Quote
I always set the scripts going, with no other apps running (except normal services), and leave the PC well alone. However, in theory, the process time should only report the CPU time required by the enc/decoding process. As you say, global time is the time from start to finish (including I/O time and other processes).

I made some tests to see whether that is true. The results below are from Flake (decoded by FLAC), timed by process time, alone or in a multitasking environment. Two tests for each situation:

Code:
                                    Rate-1                                 Rate-2    
                Comp %      Encoding   Decoding      Comp %      Encoding   Decoding
0-alone        64,298%        68x        53x        64,298%        68x        54x
5-alone        58,744%        49x        53x        58,744%        48x        53x
0-multi        64,298%        59x        47x        64,298%        48x        36x
5-multi        58,744%        44x        39x        58,744%        35x        39x

Hardware: Celeron 1.7GHz (L2 cache of 128KB)
          512mb DDR-133
          Seagate 7200.7 80GB IDE


Well, there is an impact: not as big as with global time (not shown here), but considerable and not constant. It is better to run tests leaving the PC alone.

And as you can see, I managed to get your scripts to run, and Excel to work.

Reply #22
Thank you for the test.  I guess my understanding of the process time is wrong; I suppose it must be all processing during that period (?).

I'm glad you got to grips with Excel.  A spreadsheet is very useful in this situation.

Reply #23
Quote
Well, there is an impact: not as big as with global time (not shown here), but considerable and not constant. It is better to run tests leaving the PC alone.

Possibly it's a memory caching issue. Flake loads its data into the cache, then the other process interrupts and replaces the content of the cache with its own data. Then it's back to Flake, which has to reload its data, which is very time consuming.

Reply #24
Quote
Possibly it's a memory caching issue. Flake loads its data into the cache, then the other process interrupts and replaces the content of the cache with its own data. Then it's back to Flake, which has to reload its data, which is very time consuming.

Yeah, probably. I was playing back and seeking in video and audio files, browsing the internet with 60+ tabs, opening heavy applications, etc. The global time was 2~3 times higher than the alone global time. And if process time covered all processing in this period, then it would also be two times higher.

 