Comparison between DSP and ordinary MCU

Consider an example of digital signal processing, such as a finite impulse response filter (FIR). In mathematical terms, the FIR filter is a series of dot products. Take an input quantity and an ordinal vector, multiply the coefficient and the sliding window of the input sample, and then add all the products together to form an output sample.

Similar operations are repeated in large numbers during digital signal processing, so that devices designed for this purpose must provide specialized support, facilitating the shunting of DSP devices and general purpose processors (GPPs):

1 Support for dense multiplication operations

GPP is not designed to do dense multiplication tasks. Even some modern GPPs require multiple instruction cycles to do a multiplication. The DSP processor uses specialized hardware to implement single-cycle multiplication. The DSP processor also adds an accumulator register to handle the sum of multiple products. The accumulator registers are typically wider than the other registers, adding extra bits called result bits to avoid overflow. At the same time, in order to fully embody the benefits of specialized multiply-accumulate hardware, almost all DSP instruction sets contain explicit MAC instructions.

2 memory structure

Traditionally, GPP uses the von Neumann memory structure. In this configuration, only one memory space is connected to the processor core through a set of buses (an address bus and a data bus). Typically, doing a multiplication will result in 4 memory accesses, using at least four instruction cycles.

Most DSPs use a Harvard architecture that divides the memory space into two, storing programs and data separately. They have two sets of buses connected to the processor core, allowing them to be accessed at the same time. This arrangement doubles the bandwidth of the processor memory and, more importantly, provides data and instructions to the processor core at the same time. In this layout, the DSP is able to implement a single-cycle MAC instruction.

A further problem is that typical high-performance GPPs now contain two on-chip caches, one for data and one for instructions, which are directly connected to the processor core to speed up access at runtime. Physically, the on-chip dual memory and bus architecture is almost identical to the Harvard architecture. However, logically, there are still important differences between the two.

GPP uses control logic to determine which data and instruction words are stored in the on-chip cache, which programmers do not specify (or may not know at all). In contrast, DSPs use multiple on-chip memories and multiple sets of buses to ensure multiple accesses to memory per instruction cycle. When using a DSP, the programmer has to explicitly control which data and instructions are to be stored in on-chip memory. When programmers write programs, they must ensure that the processor can effectively use its dual bus.

In addition, the DSP processor has almost no data cache. This is because the typical data of the DSP is the data stream. That is to say, after the DSP processor calculates each data sample, it discards it and almost no longer uses it.

3 zero overhead loop

If you understand a common feature of DSP algorithms, that is, most of the processing time is spent on executing smaller loops, it is easy to understand why most DSPs have dedicated hardware for zero-overhead loops. The so-called zero-overhead loop means that the processor does not take the time to check the value of the loop counter when the loop is executed, the condition is shifted to the top of the loop, and the loop counter is decremented by one.

In contrast, the GPP cycle is implemented using software. Some high-performance GPPs use branch-prediction hardware to achieve almost the same effect as hardware-supported zero-overhead loops.

4 fixed point calculation

Most DSPs use fixed-point calculations instead of floating point. Although DSP applications must pay great attention to the accuracy of the numbers, it should be much easier to do with floating point, but for DSP, cheap is also very important. Fixed-point machines are cheaper (and faster) than the corresponding floating-point machines. In order to ensure the accuracy of numbers without using floating-point machines, DSP processors support saturation calculations, rounding and shifting in both instruction set and hardware.

LED Light Module

Shenzhen Isource lighting Co., Ltd , http://www.isourceled.com