I've always been curious about the average compression rate across my whole FLAC library.
My library is just shy of 36,000 files, with a makeup of about 40% rock, 40% jazz, and the remainder a mixture of bluegrass, country, classical, folk, rap, electronic, etc. It's all encoded at the default -5 level, and the tagging consists of basic text fields, with no embedded images. I compared the total file sizes in bytes to the calculated expanded music size.
It worked out to 41.9% compression on the entire library.
Total FLAC files size: 925.092 GB (993,310,377,824 bytes)
Total uncompressed size: 1590.926 GB (1,708,243,555,030 bytes)
Ratio: 0.581
Compression: 41.9%
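For anyone who wants to redo this with their own totals, here is a minimal Python sketch of the arithmetic. The function name is mine, not from any tool; it only assumes you already have the two byte totals.

```python
def compression_stats(compressed_bytes, uncompressed_bytes):
    """Return (ratio, percent_saved) for a pair of byte totals."""
    ratio = compressed_bytes / uncompressed_bytes
    return ratio, (1.0 - ratio) * 100.0


# Byte totals quoted in the post above.
ratio, saved = compression_stats(993_310_377_824, 1_708_243_555_030)
print(f"Ratio: {ratio:.3f}")
print(f"Compression: {saved:.1f}%")
```

"Ratio" here is the fraction of the original size that remains, and "compression" is the fraction saved, which is the convention used in this post.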
All encoded with -8, figures reported by foobar2000, which should mean no tags:
My “non-classical” section (performer-oriented genres, for the larger part metal): 916 kb/s, that is a ratio of .65.
My “classical” section (composer-oriented genres): 614 kb/s, which is about the square of the non-classical ratio.
So only 35% and 38.6%, respectively. Interesting.
I limited results to: %codec% IS flac AND %__bitspersample% IS 16 AND %__samplerate% IS 44100
avg. bitrate is 960 kbps
That's mostly metal and some rock.
Man, that's only 30.3%.
A sample size of three doesn't mean much, but I'm surprised at the range - from 30.3% to 41.9% compression. I'm guessing that library content must be the big difference maker here.
You didn't calculate Porcus's values correctly:
916/1411 = 64.9% of original size, 35.1% compression
614/1411 = 43.5% of original size, 56.5% compression
nor xnor's:
960/1411 = 68.0% of original size, 32.0% compression
Thanks. The 614 kbps calculation was wrong, but assuming 1k = 1024:
kbps x 1024 / (16 x 2 x 44100)
or
kbps / 1378.125
Is that wrong for how foobar2000 calculates bitrates?
The files I selected above have a total duration of about 6 days. Dynamic range compression, clipping etc. - which is not uncommon to metal - does not compress very well.
What is wrong is the assumption that kilobits per second (kbps) are in powers of 2. They are not.
Also, your calculation would be like doing the opposite (as if 16 x 2 x 44100 was in powers of 2). Actually, that was correct (you're dividing by the sample rate; I had read it backwards).
Afaik kbit/s means kilo bits per second = 10^3 bit/s.
As long as the denominator of 1411.2 k has the same “k”, you would in any case get the right percentage
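To make the k = 1000 point concrete: the bitrate of uncompressed PCM is sample rate x bits per sample x channels, which for CD audio is 44100 x 16 x 2 = 1,411,200 bit/s. Below is a small Python sketch (function names are mine, and it assumes the reported average bitrate uses decimal kilobits, as stated above) that converts an average bitrate into both "remaining" and "reduction" percentages, with the 1024-based divisor shown only for comparison.

```python
def pcm_kbps(sample_rate=44100, bits_per_sample=16, channels=2, kilo=1000):
    """Bitrate of uncompressed PCM in kbit/s (kilo=1000 is the SI meaning)."""
    return sample_rate * bits_per_sample * channels / kilo


def remaining_and_reduction(avg_kbps, reference_kbps):
    """Return (percent of original size, percent saved)."""
    remaining = avg_kbps / reference_kbps * 100.0
    return remaining, 100.0 - remaining


cd = pcm_kbps()                # 1411.2
wrong = pcm_kbps(kilo=1024)    # 1378.125, the divisor questioned above

for kbps in (916, 614, 960):
    remaining, reduction = remaining_and_reduction(kbps, cd)
    _, wrong_reduction = remaining_and_reduction(kbps, wrong)
    print(f"{kbps} kbps: {remaining:.1f}% of original, {reduction:.1f}% saved "
          f"(with k=1024: {wrong_reduction:.1f}% saved)")
```

Printing both percentages side by side also sidesteps the "reduction or remaining?" ambiguity that comes up in threads like this.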
I posted my extreme CD rips here: http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=95670&view=findpost&p=800823
252 kb/s and 1345 kb/s using flac -8. Notice how much TAK beats WavPack on the Piaf and how much WavPack beats TAK on the Merzbow.
By the way, meandering off topic, I think such percentage figures slow down the reading of postings, as you always have to check for every poster whether it is stated as reduction or remaining. Stating bitrate instead (and censoring the sample down to CD format) is ... 'unambiguous more quickly'.
FWIW my lossless library is "only" 3,800 tracks, 110 GB, and 99.4% FLAC (pretty much all at -5). Average bitrate is 824 kbps for a ratio of 0.584. Music is mostly electronica (techno, ambient, idm/glitch, jungle, etc.) with parts jazz, fusion, funk, hiphop, dub, post-rock.
Duration : 1wk 6d 18:13:27.801 (52 426 424 003 samples)
Sample rate : 44100 Hz
Channels : 2
Bits per sample : 16
Avg. bitrate : 847 kbps
Tool : reference libFLAC 1.2.1 20070917 (99.0%); reference libFLAC 1.1.3 20061120 (0.7%); reference libFLAC 1.1.2 20050205 (0.1%)
So, 60.03% of original file size
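As a sanity check on the properties above, the duration follows from the sample count and the sample rate. A minimal Python sketch (the helper name is hypothetical, and it assumes foobar2000's "1wk" means seven days):

```python
def duration_parts(total_samples, sample_rate):
    """Split a sample count into (weeks, days, hours, minutes, seconds)."""
    whole_seconds, leftover = divmod(total_samples, sample_rate)
    minutes, seconds = divmod(whole_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    weeks, days = divmod(days, 7)
    # Add the fractional second back from the leftover samples.
    return weeks, days, hours, minutes, seconds + leftover / sample_rate


w, d, h, m, s = duration_parts(52_426_424_003, 44100)
print(f"{w}wk {d}d {h}:{m:02d}:{s:06.3f}")
```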
mine is
           GB        MB            Bytes              % of WAV size
WAV        293.36    300,400.54    314,992,792,464    100.00%
FLAC -0    212.42    217,521.94    228,088,285,396     72.41%
FLAC -1    207.16    212,131.17    222,435,657,072     70.62%
FLAC -2    206.64    211,595.27    221,873,717,728     70.44%
FLAC -3    202.06    206,911.12    216,962,038,256     68.88%
FLAC -4    196.14    200,851.75    210,608,326,464     66.86%
FLAC -5    195.73    200,430.36    210,166,468,830     66.72%
FLAC -6    195.72    200,421.93    210,157,620,477     66.72%
FLAC -7    195.47    200,160.16    209,883,137,584     66.63%
FLAC -8    195.06    199,746.01    209,448,876,032     66.49%
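The percentage column can be reproduced from the byte counts. A short Python sketch (the dictionary simply restates three rows of the table above; the function name is mine):

```python
WAV_BYTES = 314_992_792_464

# Byte totals copied from three rows of the table above.
flac_bytes = {
    "-0": 228_088_285_396,
    "-5": 210_166_468_830,
    "-8": 209_448_876_032,
}


def percent_of_wav(size_bytes, wav_bytes=WAV_BYTES):
    """Size of an encode as a percentage of the uncompressed WAV size."""
    return size_bytes / wav_bytes * 100.0


for level, size in flac_bytes.items():
    print(f"FLAC {level}: {percent_of_wav(size):.2f}% of WAV")

# How much -8 gains over the default -5, in percentage points.
gain = percent_of_wav(flac_bytes["-5"]) - percent_of_wav(flac_bytes["-8"])
print(f"-8 saves a further {gain:.2f} points over -5")
```

On this particular library the step from -5 to -8 is worth well under half a percentage point, which is the main thing the table shows.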
Mainly rock music
I'm doing a lossless encoder compression test at the moment with FLAC, ALAC, WavPack, LA, Monkey's Audio, OptimFROG, Shorten, Takc and TTA. It's taking forever to complete but I'll start a new thread with the results.
I only have FLAC -8 files, and filtering via "%codec% IS FLAC AND %__bitspersample% IS 16 AND %__samplerate% IS 44100 AND %__channels% IS 2" gives me about 15570 tracks of various genres, with 795kbps on average, which is 56.34% of the original size, or a compression ratio of 43.65%.
[...]
Tool : reference libFLAC 1.2.1 20070917 (99.0%); reference libFLAC 1.1.3 20061120 (0.7%); reference libFLAC 1.1.2 20050205 (0.1%)
Sidenote: I have a quite narrow window for displaying properties, so I got so tired at having to scroll over the “reference libFLAC” entries in order to read the LAME's, that I re-encoded my 1.1.*'s. Took a couple of days (copying backup, converting and bit-verifying ... I don't trust the --force) and likely gave room for another 50 Cent of music, pun intended.
I'm doing a lossless encoder compression test at the moment with FLAC, ALAC, WavPack, LA, Monkey's Audio, OptimFROG, Shorten, Takc and TTA. It's taking forever to complete but I'll start a new thread with the results.
Is it environment-friendly, balancing the energy use against your having to skip a step in the upgrade cycle waiting for the job to finish, or did you have to set aside one computer for this task?
Ha
The encode script I'm running is only chewing away at 1 core so I can still use my PC as normal, I just can't turn it off.
LA is currently the winner out of FLAC, ALAC and Wavpack by about 3%. Still many more encodes to go though.
Not counting the metadata blocks, it works out to:
Compressed: 688.5GB (739,319,931,688 bytes)
Uncompressed: 1243.7GB (1,335,454,659,871 bytes)
Ratio: 0.5536
The metadata blocks take up an additional 6.2GB, which is mostly embedded cover art.
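To see how much the metadata blocks move the ratio, the figures above can be combined. A rough Python sketch, assuming "GB" here means 2^30 bytes (the quoted byte counts are consistent with that) and using the rounded 6.2 GB figure, so the second and third results are approximate:

```python
GIB = 2 ** 30  # assumption: "GB" in the post means binary gigabytes

frames_bytes = 739_319_931_688     # compressed audio frames only
pcm_bytes = 1_335_454_659_871      # uncompressed size
metadata_bytes = 6.2 * GIB         # rounded figure from the post


def ratio(compressed, uncompressed):
    """Fraction of the uncompressed size that remains."""
    return compressed / uncompressed


with_meta = frames_bytes + metadata_bytes
print(f"frames only    : {ratio(frames_bytes, pcm_bytes):.4f}")
print(f"with metadata  : {ratio(with_meta, pcm_bytes):.4f}")
print(f"metadata share : {metadata_bytes / with_meta:.1%}")
```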
My library is about 95% Classical music.
Average bitrate (according to foobar2000) is 618 kbps, that is 43,8 % of original size.
Compression ratio: 56,2 %
Tuffy, how did you determine the amount of space taken by your metadata blocks? Thanks!
I wrote a little Python script to do it. FLAC's metadata blocks are pretty easy to walk through so it didn't take long.
To channel a kid/young adult in a cheesy commercial for a product I can't recall who repeatedly whines "....and?"...
It would be great if you could post your admittedly clever Python script, thanks!
Here goes. It's called with a single directory as the first argument which it searches recursively. It assumes your FLAC files end in ".flac" and match the spec (without any ID3 junk at the beginning, an initial STREAMINFO block and so on). There's no progress indicator either, but you're welcome to add some if you like.
#!/usr/bin/env python3

import os
import struct
import sys


def read_block_header(f):
    """takes FLAC file stream
    returns (is last, block ID, block length)"""
    i = struct.unpack(">I", f.read(4))[0]
    return (i >> 31,            # 1 bit: last-metadata-block flag
            (i >> 24) & 0x7F,   # 7 bits: block type
            i & 0xFFFFFF)       # 24 bits: length of the block body


def parse_streaminfo(s):
    """takes 34 byte STREAMINFO block as bytes
    returns (sample rate, channel count, bits per sample, PCM frames)"""
    # skip 10 bytes of block/frame sizes, read 64 packed bits, skip the MD5
    i = struct.unpack(">10xQ16x", s)[0]
    return (i >> 44,
            ((i >> 41) & 7) + 1,
            ((i >> 36) & 0x1F) + 1,
            i & 0xFFFFFFFFF)


def FLAC_sizes(filename):
    """returns (metadata length, frames length, uncompressed size)
    or raises OSError, ValueError or struct.error if some error occurs"""
    with open(filename, "rb") as f:
        if f.read(4) != b"fLaC":
            raise ValueError("not a FLAC file")
        metadata_length = 4

        (is_last, block_ID, block_length) = read_block_header(f)
        metadata_length += 4
        if block_ID != 0:
            raise ValueError("STREAMINFO not first block in stream")
        (sample_rate,
         channel_count,
         bits_per_sample,
         PCM_frames) = parse_streaminfo(f.read(block_length))
        # whole bytes per sample (integer division)
        uncompressed_size = (channel_count *
                             (bits_per_sample // 8) *
                             PCM_frames)
        metadata_length += block_length

        while is_last != 1:
            (is_last, block_ID, block_length) = read_block_header(f)
            metadata_length += 4
            f.seek(block_length, os.SEEK_CUR)  # skip the block body
            metadata_length += block_length

    return (metadata_length,
            os.path.getsize(filename) - metadata_length,
            uncompressed_size)


if __name__ == "__main__":
    total_metadata_length = 0
    total_frames_length = 0
    total_uncompressed_length = 0

    for (d, ds, fs) in os.walk(sys.argv[1]):
        for f in fs:
            if f.lower().endswith(".flac"):
                path = os.path.join(d, f)
                try:
                    (metadata_length,
                     frames_length,
                     uncompressed_length) = FLAC_sizes(path)
                except (ValueError, OSError, struct.error) as err:
                    print("*** %s: %s" % (path, err))
                    continue
                total_metadata_length += metadata_length
                total_frames_length += frames_length
                total_uncompressed_length += uncompressed_length

    print("    metadata length : %s" % total_metadata_length)
    print("total frames length : %s" % total_frames_length)
    print("uncompressed length : %s" % total_uncompressed_length)
    if total_uncompressed_length > 0:
        print("        compression : %2.2f%%" %
              (total_frames_length * 100 / total_uncompressed_length))
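Since most of the metadata in the library above turned out to be embedded cover art, a per-block-type breakdown can be handy. The following is a separate Python 3 sketch, not part of the script above: it walks the same block headers but tallies bytes by block type. The function names and the in-memory demo stream are made up for illustration; the block type numbers are the ones defined by the FLAC format. Like the script, it assumes no ID3 data in front of the "fLaC" marker.

```python
import io
import struct
from collections import Counter

BLOCK_NAMES = {0: "STREAMINFO", 1: "PADDING", 2: "APPLICATION",
               3: "SEEKTABLE", 4: "VORBIS_COMMENT", 5: "CUESHEET",
               6: "PICTURE"}


def metadata_by_type(stream):
    """Return a Counter mapping block type name -> bytes used,
    counting each block's 4-byte header along with its body."""
    if stream.read(4) != b"fLaC":
        raise ValueError("not a FLAC stream")
    sizes = Counter()
    is_last = 0
    while not is_last:
        header = stream.read(4)
        if len(header) < 4:
            raise ValueError("truncated metadata block header")
        (word,) = struct.unpack(">I", header)
        is_last = word >> 31
        block_type = (word >> 24) & 0x7F
        length = word & 0xFFFFFF
        stream.seek(length, io.SEEK_CUR)  # skip the block body
        name = BLOCK_NAMES.get(block_type, "type %d" % block_type)
        sizes[name] += 4 + length
    return sizes


def fake_block(block_type, body, last=False):
    """Build one metadata block for the demo stream below."""
    word = (int(last) << 31) | (block_type << 24) | len(body)
    return struct.pack(">I", word) + body


# Synthetic stream: STREAMINFO (34 bytes), a comment block, a "picture".
demo = io.BytesIO(b"fLaC"
                  + fake_block(0, bytes(34))
                  + fake_block(4, bytes(100))
                  + fake_block(6, bytes(5000), last=True))
for name, size in metadata_by_type(demo).items():
    print("%-15s %d bytes" % (name, size))
```

For real files, open each one with `open(path, "rb")` and pass the file object to `metadata_by_type`, then add the counters together across the library.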
My FLAC library has an average of 930 kbps, which works out to 65,9 % of the original size (34,1 % compression).
it contains 90% metal
tuffy, thank you for posting the script - it works great! I discovered my metadata adds about 1% to my library size.
Using tuffy's script and not counting metadata blocks, my ratio of compressed size over uncompressed size was 52.87%.
Using foobar2000's reported average bitrate of 746 kbps over 1411.2 kbps, my ratio is also 52.87%.