Wavpack (hhx3) is very slow on quad core cpu

When I'm encoding four WAVs at once using foobar2000, the encoders utilize 25% CPU each at the beginning (~65x realtime), but the load soon drops to only ~7% per instance on average (~20x realtime).

And when using 4 command shells to encode the WAVs at the same time, there's this speed drop, too, although not as severe.

What's happening here? What's the bottleneck? My CPU's shared cache? My harddrive?

wavpack v4.5 using -hh -x3
foobar2000 v0.9.5.4
cpu Core 2 Quad Q6700 (4x2.66GHz)

PS: when not using the -x option the CPU load drops even more

PPS: Ah! When using -x6 every instance of Wavpack now utilizes 25% of the overall CPU time! Interesting...

I'm quite sure that my HDD is the bottleneck here... would it be possible to increase the speed by pre-caching bigger amounts of the WAV files? In order to decrease the amount of disk accesses... my HD is quite busy with random seeks during the test. A different more radical angle would be to pre-cache the whole WAVs into RAM, but I find that a bit extreme.

Reply #1
Quote
I'm quite sure that my HDD is the bottleneck here... would it be possible to increase the speed by pre-caching bigger amounts of the WAV files?

Try defragmenting your HDD first. Maybe it'll improve encoding speed.

Added:
And try setting the number of threads to 2 (advanced settings in foobar2000). That can reduce the number of "random seeks".
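
For anyone running the command line encoder from a script instead of foobar2000, the same idea (fewer encoders hitting the disk at once) could be sketched roughly like this. It assumes a `wavpack` executable on the PATH; `wavpack_command` and `encode_all` are made-up helper names, and only the `-hh -x3` options come from the first post.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess


def wavpack_command(wav_path):
    # -hh and -x3 are the options quoted in the first post.
    return ["wavpack", "-hh", "-x3", wav_path]


def encode_all(wav_paths, max_jobs=2, run=subprocess.run):
    # Each worker thread blocks on one encoder process, so max_jobs caps
    # how many files are being read from the disk at the same time.
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(lambda p: run(wavpack_command(p)), wav_paths))


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    encode_all(["track01.wav", "track02.wav", "track03.wav"], max_jobs=2)
```

The `run` parameter is only there so the launcher can be swapped out when trying the script without the encoder installed.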

Reply #2
To be honest, I'm a bit surprised that with today's HDD speeds, apps still write small chunks of data at a time. It's well known that when several tasks write large amounts of data at once, seeking is THE bottleneck. With optical drives, that's noticeable even while reading data: speed penalties of 90% or more, just because of seek saturation. And it doesn't get better with drives that read and write faster but still seek as slowly as ever. To bring this rant to a conclusion: why is this well-known problem so rarely addressed efficiently? Is it something that would be better "fixed" at the driver level of the OS?

Reply #3
It does sound like it is blocking on I/O.
Perhaps a bigger read buffer for each thread/instance would help alleviate this.
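
Just to illustrate what I mean by a bigger read buffer, here is a minimal sketch. It is not how wavpack.exe or foobar2000 actually read their input; `read_in_chunks` and the 16 MiB default are assumptions picked for the example.

```python
def read_in_chunks(path, chunk_size=16 * 1024 * 1024):
    """Yield a file's bytes in large sequential chunks."""
    # buffering=0 turns off Python's own buffer, so every read() below is
    # handed to the OS as a single request of up to chunk_size bytes.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def copy_to(path, sink, chunk_size=16 * 1024 * 1024):
    # sink is any object with a write() method, e.g. a pipe to an encoder.
    for chunk in read_in_chunks(path, chunk_size):
        sink.write(chunk)
```

With four instances each pulling 16 MiB at a time, the drive would spend most of its time transferring rather than jumping between files.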

Edit: @Lyx: I think some kernels such as Linux do implement read-ahead caching which alleviates this to some extent.
Quote
"Readahead is a technique employed by the kernel in an attempt to improve file reading performance. If the kernel has reason to believe that a particular file is being read sequentially, it will attempt to read blocks from the file into memory before the application requests them. When readahead works, it speeds up the system's throughput, since the reading application does not have to wait for its requests. When readahead fails, instead, it generates useless I/O and occupies memory pages which are needed for some other purpose."

http://kerneltrap.org/node/6642

Reply #4
FYI, the average read and write speeds were 3 MB/s each... I have an "old" SATA I drive, which means no native command queuing etc., so when four processes read and write at the same time, the effective I/O throughput drops tremendously. Unless bigger read/write buffers are used, which is probably not the case with wavpack.exe.
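
A rough back-of-the-envelope model of why small interleaved requests hurt so much: if every chunk costs one seek plus its transfer time, throughput collapses for small chunks. The 12 ms seek and 60 MB/s sequential rate below are assumed figures for illustration, not measurements from my drive.

```python
def effective_throughput(chunk_bytes, seek_seconds, sequential_bytes_per_s):
    """Bytes per second when every chunk costs one seek plus its transfer."""
    transfer_seconds = chunk_bytes / sequential_bytes_per_s
    return chunk_bytes / (seek_seconds + transfer_seconds)


# Assumed figures, for illustration only.
SEEK = 0.012        # seconds per seek, including rotational latency
SEQUENTIAL = 60e6   # bytes per second when reading sequentially

for chunk in (64 * 1024, 1024 * 1024, 16 * 1024 * 1024):
    rate = effective_throughput(chunk, SEEK, SEQUENTIAL)
    print(f"{chunk // 1024:6d} KiB chunks: {rate / 1e6:5.1f} MB/s")
```

With these assumptions, 64 KiB chunks come out at roughly 5 MB/s, the same order of magnitude as what I measured, while multi-megabyte chunks get close to the sequential rate.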

Reply #5
Quote
PPS: Ah! When using -x6 every instance of Wavpack now utilizes 25% of the overall CPU time! Interesting...


That would be consistent with you having an I/O bottleneck, since the higher -x options are more computationally expensive and therefore the CPU rather than I/O becomes the limiting factor.
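
A toy model of that reasoning, assuming each encoder is single-threaded and simply idles while it waits for data. The function name is made up. The 16.25x and 5x inputs are the first post's 65x and 20x split across four instances (treating them as combined rates), and the 4x figure for a slower preset is a pure guess.

```python
def cpu_share_per_instance(cpu_bound_x, io_bound_x, cores=4):
    """Fraction of total CPU used by one single-threaded encoder.

    cpu_bound_x: realtime factor the encoder reaches when never waiting.
    io_bound_x:  realtime factor the disk can feed to that encoder.
    """
    achieved_x = min(cpu_bound_x, io_bound_x)
    busy_fraction = achieved_x / cpu_bound_x  # share of time spent computing
    return busy_fraction / cores


# Fast preset: the disk limits the encoder, so it sits idle most of the time.
print(f"{cpu_share_per_instance(16.25, 5.0):.1%}")  # about 7.7%

# Slower preset (guessed 4x): the CPU is now the limit, one full core each.
print(f"{cpu_share_per_instance(4.0, 5.0):.1%}")    # 25.0%
```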

I don't know how foobar works for I/O, but I'd imagine it was slower when run under foobar because foobar itself will be reading and writing the data to the disk and using the dll for encoding. If this isn't well optimised then you end up with lots of contention for locks inside the C library I/O routines. When running 4 command line encoders you have four separate processes and therefore no contention for C library locks.

In the case of the command line encoder, performance will depend on how the disk I/O is implemented. A naive implementation will cause performance problems when the amount of time required for the processing is of the same order as the amount of time required for the disk I/O.

Without rewriting the command line encoder's I/O (which is possible because it's open source) or Foobar (which isn't because it's closed) the best thing you can probably do is add a second hard disk. If you are encoding lots of files then you'll probably get better performance out of having two sets of files on two disks rather than RAIDing them.

Reply #6
Quote
Without rewriting the command line encoder's I/O (which is possible because it's open source) or Foobar (which isn't because it's closed) the best thing you can probably do is add a second hard disk. If you are encoding lots of files then you'll probably get better performance out of having two sets of files on two disks rather than RAIDing them.

Yeah, I have more than one hard disk in my system and it was this approach you mention that I used to overcome the HD random access time bottleneck in the past. The only difference, now that I've got a new quad core CPU, is that I'll probably have to make use of this trick more often.

I'm curious how much NCQ will improve the performance.

Quote
If this isn't well optimised then you end up with lots of contention for locks inside the C library I/O routines. When running 4 command line encoders you have four separate processes and therefore no contention for C library locks.
Yes, foobar2000's converter by default uses stdout to serve a WAV (or PCM?) stream to the encoders, even when the source files are already in WAV format. So you think the speed increase when using command shells could indicate an unoptimized use of I/O system functions in foobar2000? Or maybe it's the fb2k converter's own parallel conversion from WAV files to a PCM stdio stream...?

Reply #7
Windows is likely caching all your writes, and probably most of your reads anyway, so I doubt disk performance is an issue.

Reply #8
That's basically true, but it only happens if the files have been read right before. Under usual circumstances this is rarely the case, for instance when transcoding or when there was some time or even a reboot between ripping the CD and encoding the WAV.

Reply #9
I would say your HD is the bottleneck.

Here's why:

First, you said it's encoding 4 WAV files at 65x realtime each.

Ok, a WAV file has a bitrate of 1411 kbps, or about 176 kilobytes per second.

Now, encoding it at 65x realtime means that the WAV is being read at about 11.5 megabytes/sec.

Multiply that by 4, because you are doing 4 WAVs, and that means the HD is being asked to transfer about 46 megabytes/sec sustained.

That's just reading.

What about writing and general Windows HD access? That could well push it to almost 50 megabytes/sec total transfer rate, which is often near the upper end of most hard drives' real-world sustained transfer rates, especially if you are doing this on a laptop/notebook computer or an external USB 2.0/Firewire drive.

Try encoding just 3 WAVs simultaneously, instead of 4, and you'll have plenty of "wiggle room" so that the encoding process isn't waiting more than necessary for the HD.
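
If you want to redo the arithmetic for other speeds or thread counts, it fits in a few lines. This assumes CD-format WAVs (44.1 kHz, 16-bit, stereo), which is what a 1411 kbps bitrate corresponds to; the function names are made up for the example.

```python
SAMPLE_RATE = 44_100   # Hz, CD audio
BYTES_PER_SAMPLE = 2   # 16-bit samples
CHANNELS = 2           # stereo


def wav_bytes_per_second():
    return SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS


def required_read_rate(realtime_factor, streams=1):
    """Bytes per second the disk must deliver for the given encode speed."""
    return wav_bytes_per_second() * realtime_factor * streams


print(wav_bytes_per_second())                    # 176400
print(required_read_rate(65, streams=4) / 1e6)   # 45.864 (MB/s, 4 files at 65x each)
print(required_read_rate(65) / 1e6)              # 11.466 (MB/s if 65x is the combined rate)
```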

Reply #10
Quote
That's basically true, but it only happens if the files have been read right before.

Which is the case here, since you call fopen() on the file right away, letting the OS know that it should be cached into RAM on first read. It's not like you do a disk seek for each 256KB (or whatever your chunk size is). You're probably reading in tens of MB at a time and caching the rest, and therefore should be getting very close to peak sequential read speed (barring any Windows or driver bugs, anyway).

Reply #11
Quote
Try encoding just 3 WAVs simultaneously, instead of 4, and you'll have plenty of "wiggle room" so that the encoding process isn't waiting more than necessary for the HD.
That has increased the average overall read and write speeds to 8 MB/s each, and all three threads now utilize 25% of CPU time. Encoding speed is reported to be 45x realtime.

Near the end of the encode, the read speed drops to ~0.x MB/s (fb2k is done reading the WAVs) and the write speed increases to ~30 MB/s.

Using 2 threads results in a drop of the R/W speed to 5.8 MB/s and 30x realtime speed. So using 3 threads is indeed the optimum for these compression settings (-hhx3).

 

Reply #12
Quote
First, you said it's encoding 4 WAV files at 65x realtime each.

Foobar's displayed encoding rate is net, for all tasks combined.