High-definition surveillance cameras need to encode images directly with a high-quality H.264 encoder so that the video can realistically be transmitted over a standard Ethernet connection.
A main profile H.264 encoder that can encode an HD video stream in real time can be implemented in an FPGA.
This design includes a camera sensor front-end module, a video compression module, an Ethernet MAC module, an embedded processor, and a multi-port frame buffer that provides memory storage for all the other modules.
The multi-port frame buffer serves as a hub: every other module writes its data to the frame buffer and reads its data from it.
The camera sensor front-end module processes the video data and stores it in the frame buffer. The H.264 encoder reads that video data from the frame buffer and encodes it.
The compressed bit stream is stored in the frame buffer, from which the Ethernet MAC module reads it.
The H.264 encoder used in this design is an IP core available from EyeLytics.
The ‘Raster to Block’ module reads images from the frame buffer in raster scan order and rearranges them into macroblock format, which is sent to the motion estimation engine (MEE) and the spatial estimation engine.
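As a rough software illustration of the raster-to-block step, the sketch below splits a raster-ordered luma plane into 16×16 macroblocks. The function name and the flat-list frame layout are assumptions made for this example, not details of the IP core, and the sketch assumes the frame dimensions are multiples of 16 (a 1080-line frame would first need padding to 1088 lines).

```python
MB = 16  # an H.264 macroblock covers 16x16 luma samples


def raster_to_macroblocks(frame, width, height):
    """Split a raster-ordered luma plane into 16x16 macroblocks.

    `frame` is a flat list of width*height samples in raster scan order
    (a hypothetical layout chosen for this sketch). The result lists the
    macroblocks left to right, top to bottom; each macroblock is a list
    of 16 rows of 16 samples.
    """
    if width % MB or height % MB:
        raise ValueError("width and height must be multiples of 16")
    blocks = []
    for mb_y in range(0, height, MB):
        for mb_x in range(0, width, MB):
            block = []
            for row in range(MB):
                start = (mb_y + row) * width + mb_x
                block.append(frame[start:start + MB])
            blocks.append(block)
    return blocks


# Usage: a 32x16 test frame holds two macroblocks side by side.
demo = raster_to_macroblocks(list(range(32 * 16)), 32, 16)
print(len(demo), demo[1][0][0])  # prints "2 16"
```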
The MEE reads a reference image from the frame buffer and finds the motion vector of the current macroblock by searching the reference image. It also determines the best partition used for each macroblock.
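The search performed by a motion estimation engine can be sketched as an exhaustive block match. The example below uses a full search with the sum of absolute differences (SAD) as the cost; both choices are assumptions for illustration, since the article does not say which search strategy or cost metric the MEE uses.

```python
def sad(cur, ref, rx, ry, size):
    """Sum of absolute differences between `cur` and the block of `ref`
    whose top-left corner is at (rx, ry)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[y][x] - ref[ry + y][rx + x])
    return total


def full_search(cur, ref, bx, by, size=16, search_range=4):
    """Return ((dx, dy), cost) with the lowest SAD.

    `cur` is the current block (size x size), located at (bx, by) in the
    current image; `ref` is the reference image as a list of rows.
    Candidates that fall outside the reference image are skipped.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, rx, ry, size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


# Usage: the current block is a copy of the reference at offset (+2, -1).
ref = [[x * 7 + y * 13 for x in range(32)] for y in range(32)]
cur = [row[10:18] for row in ref[7:15]]
print(full_search(cur, ref, 8, 8, size=8))  # prints "((2, -1), 0)"
```

A hardware engine would evaluate many candidates in parallel, but the cost comparison follows the same idea.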
The H.264 specification allows four different inter-prediction macroblock partitions and four different sub-macroblock partitions.
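For reference, the partition shapes can be written out explicitly. The short sketch below lists the four macroblock partitions and the four sub-macroblock partitions and counts how many blocks (and therefore motion vectors) each one produces; the helper names are made up for this illustration.

```python
# (width, height) of each partition shape, in luma samples.
MB_PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8)]
SUB_MB_PARTITIONS = [(8, 8), (8, 4), (4, 8), (4, 4)]


def blocks_in(region, part):
    """Number of `part`-sized blocks that tile a `region`-sized area."""
    return (region[0] // part[0]) * (region[1] // part[1])


def max_motion_vectors():
    """Worst case: four 8x8 partitions, each split into 4x4 blocks."""
    return blocks_in((16, 16), (8, 8)) * blocks_in((8, 8), (4, 4))


for part in MB_PARTITIONS:
    print(part, blocks_in((16, 16), part))
print("worst case:", max_motion_vectors())  # prints "worst case: 16"
```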
The best motion vector and the best partition, together with the corresponding macroblock coding cost, are sent to the mode decision module and to the motion compensation engine.
The spatial estimation engine (SEE) finds the best estimation of the current macroblock by using neighbouring pixel values within the same image.
There are a total of nine intra 4×4 luma modes, four intra 16×16 luma modes and four intra chroma modes defined in the H.264 specification. The SEE determines the best luma and chroma modes to be used for the current macroblock.
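To make the idea of choosing an intra mode concrete, the sketch below implements three of the nine intra 4×4 luma modes (vertical, horizontal and DC) and picks the one with the lowest SAD. The SAD cost and the function names are assumptions for illustration; the article does not describe how the SEE computes its costs, and the sketch assumes both the row above and the column to the left are available.

```python
def predict_4x4(mode, above, left):
    """Build a 4x4 prediction from the 4 samples above and 4 to the left."""
    if mode == "vertical":      # copy the row above downwards
        return [list(above) for _ in range(4)]
    if mode == "horizontal":    # copy the left column across
        return [[left[y]] * 4 for y in range(4)]
    if mode == "dc":            # rounded mean of the 8 neighbours
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")


def best_intra_4x4(block, above, left,
                   modes=("vertical", "horizontal", "dc")):
    """Return (mode, cost) for the candidate with the lowest SAD."""
    best_mode, best_cost = None, None
    for mode in modes:
        pred = predict_4x4(mode, above, left)
        cost = sum(abs(block[y][x] - pred[y][x])
                   for y in range(4) for x in range(4))
        if best_cost is None or cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost


# Usage: a block made of vertical stripes is predicted exactly by "vertical".
above = [10, 20, 30, 40]
left = [10, 10, 10, 10]
block = [list(above) for _ in range(4)]
print(best_intra_4x4(block, above, left))  # prints "('vertical', 0)"
```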
The chosen modes and their coding costs are sent to the mode decision module, which compares the macroblock coding costs and determines the best prediction to use. This prediction can be either inter or intra prediction.
The best intra mode is sent to the intra prediction engine, which uses neighbouring pixel values to generate the intra prediction for the current macroblock.
Based on the mode decision result, the transform and quantisation engine subtracts the prediction values from the current macroblock and generates the quantised coefficients.
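The residual, transform and quantisation steps can be sketched for a single 4×4 block as follows. The matrix is the 4×4 forward core transform used by H.264; the quantiser, however, is a deliberately simplified uniform divider with a hypothetical `step` parameter, whereas a real H.264 quantiser applies position-dependent scaling factors selected by the quantisation parameter.

```python
# H.264 4x4 forward core transform matrix.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def residual(cur, pred):
    """Subtract the prediction from the current block."""
    return [[cur[y][x] - pred[y][x] for x in range(4)] for y in range(4)]


def forward_transform(res):
    """Core transform Y = CF * X * CF^T (no scaling applied)."""
    return matmul(matmul(CF, res), transpose(CF))


def quantise(coeffs, step):
    """Simplified uniform quantiser (an assumption, not the H.264 rule):
    divide the magnitude by `step` and keep the sign."""
    def q(c):
        mag = abs(c) // step
        return mag if c >= 0 else -mag
    return [[q(c) for c in row] for row in coeffs]


# Usage: a flat residual of 3 leaves only the DC coefficient non-zero.
cur = [[13] * 4 for _ in range(4)]
pred = [[10] * 4 for _ in range(4)]
coeffs = forward_transform(residual(cur, pred))
print(coeffs[0][0], quantise(coeffs, 8)[0][0])  # prints "48 6"
```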
The collected information is sent to the CABAC (context-adaptive binary arithmetic coding) module, which generates the final bit stream and writes it to the frame buffer.
Camera sensor front-end module
The camera sensor used in this design is an MT9P031, a 5-megapixel sensor from Aptina. This camera sensor is programmed to send 1920×1080 video images to the FPGA at 30 frames per second.
The video data format is a 12-bit RGB Bayer pattern. The data transfer clock is set to 100 MHz.
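A quick calculation from these figures shows how much raw data the sensor produces, and why it has to be compressed before it goes onto Ethernet. The sketch assumes one 12-bit Bayer sample per pixel and one sample transferred per clock cycle; the second assumption is not stated in the article.

```python
WIDTH, HEIGHT = 1920, 1080
FPS = 30
BITS_PER_SAMPLE = 12           # 12-bit Bayer: one colour sample per pixel
PIXEL_CLOCK_HZ = 100_000_000   # 100 MHz data transfer clock


def active_pixel_rate(width, height, fps):
    """Active pixels per second, ignoring blanking intervals."""
    return width * height * fps


def raw_bit_rate(width, height, fps, bits_per_sample):
    """Uncompressed sensor data rate in bits per second."""
    return active_pixel_rate(width, height, fps) * bits_per_sample


pixel_rate = active_pixel_rate(WIDTH, HEIGHT, FPS)
bit_rate = raw_bit_rate(WIDTH, HEIGHT, FPS, BITS_PER_SAMPLE)

print(f"raw sensor data: {bit_rate / 1e6:.1f} Mbit/s")
# prints "raw sensor data: 746.5 Mbit/s"

# Assumes one sample per clock; the remainder is available for blanking.
print(f"active pixels use {pixel_rate / PIXEL_CLOCK_HZ:.1%} of the clock")
# prints "active pixels use 62.2% of the clock"
```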
An embedded processor programs the various registers within the different modules and runs the TCP/IP stack used to stream the compressed video.
Working with the Ethernet MAC module, the embedded processor runs a lightweight implementation of the TCP/IP stack, a streaming application, and a web server application.
The frame buffer connects to two DDR2-SDRAM chips over a 32-bit data bus and runs at 150 MHz.
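The memory figures give a rough upper bound on frame buffer bandwidth. The sketch below assumes that 150 MHz is the DDR2 clock frequency, so that data moves on both clock edges; the article does not say whether the figure is the clock or the transfer rate, and sustained bandwidth will be lower than this peak because of refresh and access overheads.

```python
BUS_WIDTH_BITS = 32
CLOCK_HZ = 150_000_000
TRANSFERS_PER_CLOCK = 2  # assumption: 150 MHz is the clock, double data rate


def peak_bandwidth_bytes(bus_bits, clock_hz, transfers_per_clock):
    """Theoretical peak bandwidth in bytes per second."""
    return (bus_bits // 8) * clock_hz * transfers_per_clock


peak = peak_bandwidth_bytes(BUS_WIDTH_BITS, CLOCK_HZ, TRANSFERS_PER_CLOCK)
print(f"peak frame buffer bandwidth: {peak / 1e9:.1f} GB/s")
# prints "peak frame buffer bandwidth: 1.2 GB/s"
```

This peak is shared by every port on the frame buffer: the sensor front end, the encoder, the Ethernet MAC and the processor.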
The FPGA design takes about 50,000 logic elements (LEs), with the encoder module using approximately 30,000 LEs, when implementing main profile H.264 encoding for a 720p/30 video stream.
The authors are Suhel Dhanani, senior manager – DSP at Altera, and Mankit Lo, CEO of EyeLytics.