
Video Codecs - their CPU usage

Hi,
    I'm looking at building a series of videos to play back on a machine with a 206 MHz Intel SA-1110 microprocessor (a StrongARM SA-1110 in all but name). The screen itself is only 240 by 320 but, annoyingly, it uses 16 bits per pixel (5:6:5), and the CPU doesn't have the benefit of SIMD operations. I presume I have to decode to 24-bit (a byte each of red, green & blue). Happily, the CPU does have an excellent barrel shifter, so the conversion should be pretty quick if some good, hand-coded assembly language is used. I can organize it so that the data I am working on is already in the L1 cache for reading, and store multiple registers in one go so I can fill an entire cache line in a single instruction while I've already moved on to the next part of the image.
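
For what it's worth, here is a rough C sketch of the 8:8:8 to 5:6:5 packing I have in mind - the function names are just placeholders, and the real version would be unrolled, hand-scheduled ARM assembly using the barrel shifter and STM - but it shows the shift-and-mask arithmetic:

#include <stdint.h>

/* Pack one 8:8:8 pixel into 5:6:5 by dropping the low bits of each
 * channel: red and blue keep their top 5 bits, green its top 6.      */
static inline uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r & 0xF8u) << 8) |   /* red   -> bits 15..11 */
                      ((g & 0xFCu) << 3) |   /* green -> bits 10..5  */
                      ( b           >> 3));  /* blue  -> bits  4..0  */
}

/* Convert one row of 24-bit pixels to 16-bit 5:6:5.  On the SA-1110
 * the inner loop would pack two pixels per 32-bit register and write
 * eight registers with a single STM (32 bytes - assuming a 32-byte
 * cache line).                                                       */
void row_rgb888_to_rgb565(const uint8_t *src, uint16_t *dst, int width)
{
    for (int x = 0; x < width; x++) {
        dst[x] = pack_rgb565(src[0], src[1], src[2]);
        src += 3;
    }
}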

I have tried REALLY hard to get the Dhrystone MIPS or CoreMark figures, or ANY indication of how much CPU time the decoder takes. With H.264 I know it's best, if you can, to set the encoder to maximum complexity (which uses analysis-by-synthesis) so that it takes your mid-range (2 GHz) PC overnight to compress it - the compression time doesn't matter since it isn't live. In fact, the footage is taken from a GY-HM650 camcorder in AVCHD format set to highest quality (you get 10 minutes of footage). It will transcode the output into H.264, HD MPEG-2 (35/25/19 Mbps) and other formats.

The problem is, I cannot find an example where H.264, Ogg Vorbis, Opus and all of the others are tested for CPU usage on a test sequence. Since so many use 8:8:8 final output, it may even be possible to reduce the accuracy of the decoder (to save time), since the bottom 2 or 3 bits of each channel are lost anyway. I have e-mailed everyone connected to these formats and NOBODY wants to answer the question. I think they just assume it will only be used on GHz+ CPUs - note the plural, since 4-core & 8-core machines are out there.
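
Failing published numbers, my fallback plan is to measure it myself with something like the rough C harness below - decode_frame() is purely a hypothetical stand-in for whichever decoder is under test - which turns seconds per frame into an estimated cycle budget at 206 MHz:

#include <stdio.h>
#include <time.h>

/* Hypothetical decoder entry point - substitute a call into the
 * reference decoder of whichever codec is being measured.          */
extern int decode_frame(void *ctx, void *out_buf);

/* Time 'frames' decodes with clock() and report the average CPU
 * time per frame plus an estimated cycle count at 206 MHz.         */
void benchmark_decoder(void *ctx, void *out_buf, int frames)
{
    clock_t start = clock();
    for (int i = 0; i < frames; i++)
        decode_frame(ctx, out_buf);
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

    double per_frame = secs / frames;
    printf("%.4f s/frame, ~%.0f cycles/frame at 206 MHz\n",
           per_frame, per_frame * 206e6);
}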

Audio is simply a stereo 16-bit DAC which deals with double-buffering itself (although with a few lines of code, triple buffering is simple).
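
For reference, the "few lines of code" for triple buffering would look roughly like the sketch below - the buffer size and the dac_queue() hook are made up, since the real hardware interface will differ:

#include <stdint.h>

#define NUM_BUFS  3           /* triple buffering                     */
#define FRAMES    1024        /* stereo frames per buffer (arbitrary) */

static int16_t bufs[NUM_BUFS][FRAMES * 2];   /* interleaved L/R       */
static volatile int play_idx = 0;            /* buffer the DAC reads  */
static int fill_idx = 0;                     /* buffer being filled   */

/* Hypothetical hardware hook: queue a full buffer for the DAC.       */
extern void dac_queue(const int16_t *buf, int frames);

/* DAC-finished interrupt: move the DAC on to the next buffer.        */
void dac_irq_handler(void)
{
    play_idx = (play_idx + 1) % NUM_BUFS;
    dac_queue(bufs[play_idx], FRAMES);
}

/* Decoder side: return the next buffer to fill, or NULL if we have
 * wrapped around onto the buffer the DAC is still playing.           */
int16_t *next_fill_buffer(void)
{
    int next = (fill_idx + 1) % NUM_BUFS;
    if (next == play_idx)
        return 0;
    fill_idx = next;
    return bufs[fill_idx];
}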

Can someone PLEASE help? I promise I'm not one of those people who would ask for an expert's time without having looked very, very hard myself. A test sequence at different sizes would be ideal, but just one size will give me a good enough guesstimate. I've read that some of these CPUs have an extra 2K instruction (or maybe mixed) cache which I believe can also be switched into being a 2K scratchpad. That's what the original PlayStation had: no cache, just a 32K scratchpad for code & data. Never underestimate the power that can bring. The Atari Jaguar had about 12 wait states for a cache miss, so everyone coded their games in 16K lumps that were transferred in as needed. They even managed a reasonable version of Doom - John Carmack is about the best programmer I've ever met. Consider his historic use of a dummy read to fill the destination cache line when he was drawing walls: take that one MOV CL,[EDI] out of the loop and the game runs at half the original speed!
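
Just to illustrate that last trick, here is my own rough C sketch of the dummy-read idea (not Carmack's actual code, and the 32-byte line size is an assumption): touch each destination cache line once before streaming writes into it, so the fetch overlaps the work instead of stalling the first store.

#include <stdint.h>
#include <stddef.h>

#define CACHE_LINE 32   /* assumed cache line size */

/* Copy a span of pixels.  The dummy read pulls each destination
 * line into the cache before we write to it - the same idea as the
 * MOV CL,[EDI] mentioned above.  'volatile' stops the compiler
 * from optimising the read away.                                   */
void copy_span(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i += CACHE_LINE) {
        volatile uint8_t touch = dst[i];     /* dummy read = prefetch */
        (void)touch;

        size_t chunk = (len - i < CACHE_LINE) ? len - i : CACHE_LINE;
        for (size_t j = 0; j < chunk; j++)
            dst[i + j] = src[i + j];
    }
}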

Many thanks in advance,
CC