Video processing benefits from hardware acceleration

A processor with a hardware video accelerator can provide an interesting alternative to a traditional DSP-based multimedia system design, writes Israel Shem-Tov, high-end applications engineer at Future Electronics (Israel).


Recent years have seen the proliferation of a new type of general-purpose processor containing a hardware accelerator video processing unit (VPU).

This architecture provides an alternative to the traditional DSP-based approach that implements video processing in software. While a DSP system can support a wide variety of video formats, the requirement for ever faster computation of video data streams strains many DSPs to the limit.

Hardware accelerator VPUs are therefore finding favour, in part because they offer lower CPU overhead and lower power consumption than a DSP-based system.

For designers of embedded systems, however, there is another strong reason to evaluate a VPU: both the implementation of video functions in hardware, and the software and tools available from the device manufacturer, serve to ease the design of sophisticated, modern user interfaces.

For engineers who are new to the world of embedded graphics, evaluating a device such as the i.MX53 application processor from Freescale can provide a route to implementation of high-quality video.

Hardware accelerator VPU architecture

The hardware accelerator VPU in the i.MX53 processor is typical of the breed, in that it provides support for a wide range of video formats. It can also decode multiple streams simultaneously and support full-duplex, multi-party calls.

Like most hardware accelerators, the i.MX53 VPU has a rich set of integrated video functions. 

Configuring decoding/encoding and the host interface, for instance, is performed via firmware in an embedded programmable DSP named the ‘BIT processor’ (see Figure 1). This provides for an unusual level of programmability and flexibility.

Moving video data and commands into and out of the VPU is also easy, because it operates over two straightforward interfaces: the 64-bit AMBA3 AXI (Advanced eXtensible Interface) for data throughput and the 32-bit AMBA3 APB (Advanced Peripheral Bus) for system control.

For control of the VPU by the host processor, the VPU provides host interface registers. Most commands and responses between the host processor and the VPU are transmitted through these registers.

A set of application programming interface (API) functions is also provided, covering all of the operations required on the host processor side to control the VPU.

Nearly all control functions, including rate control, Flexible Macroblock Order, Arbitrary Slice Ordering, video CODEC control and error resilience, are implemented in the BIT processor. This means that the resources required for the host CPU to control the VPU are small, typically requiring no more than 1MIPS of processing throughput.


Elements of the i.MX53 VPU

The i.MX53 VPU consists of two main components: a video CODEC and a VPU gasket. The video CODEC is the heart of the video accelerator. It comprises the embedded 16-bit BIT processor, video CODEC hardware, and a bus arbiter/interface.

The VPU gasket converts the AMBA3 APB bus to the IP Sky Blue bus.

This VPU can handle a maximum of four processes simultaneously. Each process can have a different format – it can, for instance, handle MPEG-4, MPEG-2, H.264 and VC-1 bitstreams side-by-side.

Each decoding process consists of three functional elements:

– Create a process: the software creates and configures processes.

– Run a process: at the correct time, the software will begin a specific process. The correct time is when decoding is in the Idle state and the bitstream to be decoded is available in the external memory.

– Quit a process: the software quits a specific process.

If more than one process is ready to run, a different process ID must be assigned to each one in a range from 0 to 3, via a function called RunIndex. The ID is assigned based on the order of creation.

No process takes priority over another. After creating all processes at the initialisation step, the host enables the BIT processor to execute the processes defined in the RunIndex. All processes are executed in a time-division-like mechanism.
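The create/run/quit life cycle and the time-division execution described above can be modelled in a few lines of Python. This is a minimal sketch, not the Freescale API: the class VpuScheduler, its method names and the notion of a single "step" per turn are assumptions made for illustration. Only the four-process limit, the IDs 0 to 3 assigned in order of creation, and the absence of priorities come from the text.

```python
from collections import deque

MAX_PROCESSES = 4  # the VPU handles at most four processes at once


class VpuScheduler:
    """Toy model (hypothetical) of create / run / quit with no priorities."""

    def __init__(self):
        self._free_ids = deque(range(MAX_PROCESSES))  # IDs 0..3, in order
        self._formats = {}         # process ID -> video format name
        self._run_queue = deque()  # IDs in the order they will run next

    def create(self, video_format):
        """Create a process and return the ID assigned to it."""
        if not self._free_ids:
            raise RuntimeError("all four process slots are in use")
        pid = self._free_ids.popleft()
        self._formats[pid] = video_format
        self._run_queue.append(pid)
        return pid

    def run_next(self):
        """Give the next process one turn, then move it to the back."""
        if not self._run_queue:
            return None
        pid = self._run_queue.popleft()
        self._run_queue.append(pid)  # round robin: every process gets a turn
        return pid, self._formats[pid]

    def quit(self, pid):
        """Remove a process and release its ID for reuse."""
        del self._formats[pid]
        self._run_queue.remove(pid)
        self._free_ids.append(pid)


if __name__ == "__main__":
    sched = VpuScheduler()
    for fmt in ("MPEG-4", "MPEG-2", "H.264", "VC-1"):
        sched.create(fmt)
    print([sched.run_next()[0] for _ in range(6)])  # [0, 1, 2, 3, 0, 1]
```

In this model a process that quits simply drops out of the rotation, and the remaining processes continue to take turns in the same order.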

Critical role of memory management

Memory management in the VPU is important, since tasks such as 1080i/p decoding need extremely high memory bandwidth. Unfortunately, memory bandwidth is sometimes constrained, especially in systems running modern operating systems that are optimised for multi-tasking.

If insufficient memory bandwidth is available to the VPU, the result will be discontinuous video display or even decoding errors. Careful management of memory bandwidth can help to prevent this problem.
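To get a feel for the numbers, the short calculation below estimates the memory traffic needed just to write decoded 1080p frames to the frame buffer. It assumes 4:2:0 chroma sampling at 8 bits per sample (1.5 bytes per pixel) and a 30 frames-per-second stream. Both are illustrative assumptions rather than figures from the i.MX53 documentation, and the function names are hypothetical. Reads of reference frames, bitstream fetches and display refresh all come on top of this figure.

```python
def frame_bytes(width, height):
    """Size of one 8-bit YUV 4:2:0 frame: a full-resolution luma plane
    plus two quarter-resolution chroma planes (1.5 bytes per pixel)."""
    return width * height * 3 // 2


def write_bandwidth(width, height, fps):
    """Bytes per second needed to store every decoded frame once."""
    return frame_bytes(width, height) * fps


if __name__ == "__main__":
    bw = write_bandwidth(1920, 1080, 30)  # assumed 1080p at 30 frames/s
    print(f"{bw / 1e6:.1f} MB/s")  # 93.3 MB/s
```

Even this lower bound of roughly 93MB/s must be shared with every other bus master in the system, which is why bandwidth planning matters.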

The VPU has full access to the entire external memory, which is used to load or store image frames, bitstreams, programs and data for the BIT processor. The buffer size requirement is dependent on the video standard and the target application. Each standard also requires a different temporary memory size when it processes de-blocking or overlap-smoothing filtering.

The VPU uses six kinds of buffer:
1) Frame buffer, for storing image frames
2) BIT processor program memory, for boot code and firmware
3) Working buffer, for intermediate data from the BIT processor and the video decoding hardware
4) Bitstream buffer, for loading bitstream data
5) Parameter buffer, for BIT processor command execution arguments and return data
6) Search RAM, used by the memory module to reduce SDRAM bus loading

The host processor has to assign buffers for bitstreams on a per-instance basis. If the VPU handles n bitstreams simultaneously in an application, the host should assign n bitstream buffers, and specify the base address and size.
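The per-instance assignment can be pictured as carving one region of external memory into n bitstream buffers, each described by a base address and a size. The sketch below is an illustration only: the function name, the 4KB alignment, the example base address and the buffer size are assumptions, not values taken from the i.MX53 documentation.

```python
from typing import NamedTuple


class BitstreamBuffer(NamedTuple):
    base: int  # start address in external memory
    size: int  # length in bytes


def assign_bitstream_buffers(region_base, n, size, align=4096):
    """Lay out n equally sized bitstream buffers back to back.

    The requested size is rounded up to a multiple of `align`, so every
    base address stays aligned as long as `region_base` is aligned.
    """
    if region_base % align:
        raise ValueError("region_base must be a multiple of align")
    stride = (size + align - 1) // align * align
    return [BitstreamBuffer(region_base + i * stride, stride)
            for i in range(n)]


if __name__ == "__main__":
    # Hypothetical example: four instances, about 1MB requested for each.
    for buf in assign_bitstream_buffers(0x9000_0000, 4, 1_000_000):
        print(hex(buf.base), buf.size)
```

Keeping the buffers contiguous and equally sized is a simplification. A real design would size each buffer according to the video standard and bit rate of the stream it serves.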

One of the biggest advantages of using a processor with an embedded VPU is that ready-made video decoding and encoding functions are implemented in hardware.

The design engineer is helped by having simple APIs, fully verified functions and stable real-time performance.

Providers of operating systems and board support packages, such as Bsquare, can also help designers. Bsquare supports products such as Microsoft Silverlight and HTML5, which abstract the programmer from the hardware. This simplifies the design of a GUI and helps embedded designs with a contemporary user interface get to market more quickly.

