Electronics Weekly Magazine

Globalpress Summit: Tensilica launches enhanced cores

David Manners
Monday 28 March 2011 18:58

Tensilica has announced new Xtensa cores, designated LX4, for compute-intensive dataplane and DSP functions such as imaging, video, networking and wired/wireless baseband communications. The company says the cores quadruple local data memory bandwidth, reduce power and shorten development times.

The LX4 DPU (Dataplane Processor Unit) supports local data memory bandwidth of up to 1024 bits per cycle, wider VLIW instructions up to 128 bits for increased parallel processing, and a cache memory prefetch option that boosts overall performance for systems with long off-chip memory latency.

"Tensilica’s DPUs combine control and DSP functions in cores that can be optimized to provide 10x to 100x performance improvement compared to a standard RISC or DSP core," says Steve Roddy, Tensilica’s marketing vp, " Xtensa LX4 cores range from an ultra-small programmable DPU as exemplified by a 1GigaMAC DSP in 0.01mm2 (in 28 nm process technology) up to the ConnX BBE 64-128 with over 100 GigaMAC per second performance."

The Xtensa LX4 DPU has four times the local data memory bandwidth of the Xtensa LX3 DPU, with up to two 512-bit load/store operations per cycle. Designers can now create wide SIMD (single instruction multiple data) DSPs that pump more data into more MAC (multiply accumulate) units each clock cycle for extremely fast performance. This makes Xtensa LX4 DPUs ideal for wired and wireless baseband processing, video pre- and post-processing, image signal processing, and various network packet processing functions.
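The bandwidth figures above can be checked with a little arithmetic. The following is a minimal sketch, not anything supplied by Tensilica: the function names are hypothetical, and the clock rate passed to `peak_bytes_per_second` is an assumption chosen by the caller, since the article quotes no clock speed for a wide-SIMD configuration.

```c
#include <stdint.h>

/* Bits moved per cycle by `ops` load/store operations of `width_bits` each.
   The article's figure is two 512-bit operations per cycle. */
uint32_t bits_per_cycle(uint32_t ops, uint32_t width_bits)
{
    return ops * width_bits;
}

/* How many operands of `operand_bits` each fit into `bits` bits. */
uint32_t operands_per_cycle(uint32_t bits, uint32_t operand_bits)
{
    return bits / operand_bits;
}

/* Peak bytes per second for a given per-cycle width.
   clock_hz is a caller-supplied assumption, not a quoted specification. */
uint64_t peak_bytes_per_second(uint32_t bits, uint64_t clock_hz)
{
    return (uint64_t)(bits / 8u) * clock_hz;
}
```

Two 512-bit operations give 1024 bits per cycle, which is room for 64 16-bit operands each cycle. If the LX4 figure is four times that of the LX3, the LX3 figure works out to 256 bits per cycle.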

This enhanced local memory bandwidth is in addition to Tensilica’s existing customizable local port and queue interfaces that provide unlimited point-to-point data and control signal bandwidth.

Tensilica now offers both the Port/Queue interfaces, which allow connections between Xtensa DPUs and other system blocks just like traditional RTL block interconnection, and the new ultra-high bandwidth local memory connections.

Xtensa LX4 cores double the allowable width of their Flexible Length Instruction eXtensions (FLIX) instructions from 64 to 128 bits. This allows the execution of twice the number of independent operations per clock cycle. Every wide FLIX instruction is seamlessly intermixed with the shorter base Xtensa instruction set, so there is no mode switch penalty when using FLIX.

With FLIX, the Xtensa LX4 DPU can deliver the performance characteristics of a specialty VLIW processor with smaller code size than competing VLIW DSPs. Tensilica’s Xtensa C/C++ compiler automatically extracts parallelism from source code and bundles multiple operations into single FLIX instructions.
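As an illustration of the kind of source code a parallelizing compiler works with, here is a minimal sketch of a multiply-accumulate loop in plain C. The function name `dot_i16` is hypothetical, and nothing here is specific to Tensilica's compiler; how the loop is actually bundled into FLIX instructions depends on the DPU configuration and is not described in the article.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply-accumulate over two arrays of 16-bit samples.
   Each iteration performs two loads, one multiply and one add. The
   multiplies are independent across iterations, which is the kind of
   parallelism a compiler can look for when packing several operations
   into one wide instruction. */
int64_t dot_i16(const int16_t *x, const int16_t *h, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Widen before multiplying so the product cannot overflow. */
        acc += (int32_t)x[i] * (int32_t)h[i];
    }
    return acc;
}
```

The loop is written with no intrinsics or target-specific constructs, so the same source compiles on any C99 toolchain.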

An Xtensa LX4 DPU with wide FLIX instructions running parallel operations at a low clock frequency can often match the performance of larger, higher-clocked non-VLIW cores while consuming far less energy to complete the same task.

The new data prefetch option reduces cycle counts in long-latency designs by fetching data from system memory ahead of its use. This way, the data is ready and waiting when the application code needs it, reducing wasted cycles when the DPU would have to wait for data. The benefits are seen most when streaming data from contiguous memory locations. It’s a much simpler alternative for memory access optimization than adding a separate DMA (Direct Memory Access) engine, which requires additional software programming and application code tuning.
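The effect of fetching ahead on streaming data can be sketched with a toy cycle-count model. This is an illustrative assumption, not a description of how the LX4 prefetch hardware works: the function names, the one-line-ahead policy, and every parameter (`lines`, `compute`, `latency`) are hypothetical.

```c
#include <stdint.h>

/* Toy model, no prefetch: for every cache line the core first waits
   `latency` cycles for the data, then spends `compute` cycles on it. */
uint64_t cycles_without_prefetch(uint64_t lines, uint64_t compute,
                                 uint64_t latency)
{
    return lines * (latency + compute);
}

/* Toy model, one line fetched ahead: the fetch of line i+1 starts when
   processing of line i starts, so each step costs the larger of
   `compute` and `latency`. Only the first line pays the full latency. */
uint64_t cycles_with_prefetch(uint64_t lines, uint64_t compute,
                              uint64_t latency)
{
    if (lines == 0) {
        return 0;
    }
    uint64_t step = compute > latency ? compute : latency;
    return latency + (lines - 1) * step + compute;
}
```

With 100 lines, 80 compute cycles per line and a 50-cycle memory latency, the model gives 13,000 cycles without prefetch and 8,050 with it. When latency exceeds compute time the core still stalls, but only for the difference.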

Tensilica’s tools automate not only the creation of the DPU hardware but also the creation of the matching comprehensive software development tool set. Because the underlying base Xtensa instruction set is never changed, designers can access Tensilica’s ecosystem of third party applications software and development tools even after heavily customizing the Xtensa DPU.

Customizable Xtensa DPUs are compatible with major operating systems, debug probes and ICE (in-circuit emulator) solutions, and come with an automatically generated, complete software development toolchain including an advanced integrated development environment based on the Eclipse framework, a world-class compiler, a cycle-accurate SystemC-compatible instruction set simulator, and the industry standard GNU toolchain.

Tensilica has also introduced its Vectorization Assistant tool, which suggests ways developers can improve compiler vectorization of their C code when running on SIMD DSPs. The Vectorization Assistant explains what is preventing further vectorization so the software developer can improve the C source to take advantage of the DPU’s parallel execution units.
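The article does not say what the Vectorization Assistant reports, so the following is only a general illustration of one commonly cited obstacle to compiler vectorization in C: possible pointer aliasing. The function names are hypothetical; `restrict` is the standard C99 qualifier.

```c
#include <stddef.h>

/* Without restrict, the compiler must allow for `out` overlapping `a`
   or `b`, so a store to out[i] might change a later a[j] or b[j]. That
   can block vectorization or force a runtime overlap check. */
void scale_add(float *out, const float *a, const float *b,
               float k, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] + k * b[i];
    }
}

/* With restrict, the programmer promises the arrays do not overlap,
   so the compiler is free to process several elements per step. */
void scale_add_restrict(float *restrict out, const float *restrict a,
                        const float *restrict b, float k, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] + k * b[i];
    }
}
```

Both functions compute the same result for non-overlapping arrays. Calling the `restrict` version with overlapping arrays is undefined behaviour, so the qualifier belongs only where the promise actually holds.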

The Xtensa LX4 DPU is available now from Tensilica. The base Xtensa LX4 DPU can reach speeds of over 1 GHz in 45 nm process technology (45GS) with an area of 0.044 mm².

 

 
