# Design of High Speed and Area Efficient FIR Filter Architecture using modified Adder and Multiplier

M.Jayashree

Department of ECE, RMK Engineering College, Chennai, India.

# Abstract:

The finite impulse response (FIR) filter is one of the most important components in DSP and communication systems. A filter architecture contains many components, of which the two main ones are the adder and the multiplier. Many types of adders and multipliers are available in digital circuits, but an efficient filter needs an efficient adder and multiplier design. The existing design uses a ripple carry adder and a Wallace tree multiplier, both of which incur large area and delay. To reduce these drawbacks, partial product generation and reduction are implemented with a new adder and multiplier, namely the reduced square root carry select adder (SQRT CSLA) and the Bi-Recoder multiplier, respectively. The design is described in Verilog HDL and implemented using Modelsim 6.3c and Xilinx ISE version 12.4. The modified adder and multiplier are compared with the existing adder and multiplier, and their performance is analyzed. Finally, the modified design is applied to a direct form FIR filter to obtain an efficient FIR filter.

*Keywords* — FIR Filter, High speed, area efficient, partial products, Reduced square root carry select Adder, Bi-Recoder Multiplier.

## I. INTRODUCTION

FIR filtering is one of the most extensively used functions in DSP. FIR filters are used in many applications to attain high spectral suppression and noise reduction. Many prior efforts to decrease the power consumption of FIR filters focus on minimizing the filter coefficients while maintaining a fixed filter order. FIR filter structures are simplified by minimizing the number of additions, subtractions, and add-and-shift operations. However, once the filter architecture is determined, the coefficients cannot be altered; consequently, such structures are not appropriate for FIR filters with programmable coefficients.

The output of a DSP processor depends on the FIR filter, so an efficient FIR filter design is needed to achieve an efficient output. A filter architecture contains many components, the main ones being the adder and the multiplier. Many types of adders and multipliers are available in digital circuits, but efficient filters need an efficient adder and multiplier design. In the existing design, a Wallace tree multiplier was designed and implemented in Verilog HDL. In this phase, the existing ripple carry adder is replaced by a square root carry select adder and the performance of the two is compared; because the modified adder performs better and reduces the drawbacks of the existing system, it is used here. It is one of the best adders in digital circuit design. The multiplier is designed in Verilog HDL; the Wallace tree multiplier and SQRT CSLA are then implemented in the FIR filter and the result is analysed. The design is implemented using Modelsim 6.3c and Xilinx ISE version 12.1. Finally, the designed adder and multiplier are applied to the FIR filter to obtain the best filter. An FIR filter is used to filter noise or unwanted signals over finite impulse durations.

The Multiplication and Accumulation (MAC) unit estimates the duration of periodic impulses. Therefore, high-performance multiplication and accumulation architectures are required to improve the performance of a digital FIR filter. In this paper, a novel, reduced complexity SQRT CSLA based Wallace tree multiplier is incorporated into the multiplication stage of a direct form FIR filter, with the aim of improving performance over existing FIR filters. DSP applications include audio and speech signal processing, sonar, radar and other sensor array processing, spectral estimation, statistical signal processing, digital image processing, signal processing for telecommunications, control of systems, biomedical engineering, and seismic data processing, among others.





The Ripple Carry Adder (RCA) is one of the basic VLSI adders and is strongly affected by Carry Propagation Delay (CPD). To reduce the CPD, the Carry Select Adder (CSLA) was developed. In a CSLA circuit, N-bit data is divided into  $\sqrt{N}$  groups to provide parallelism; hence, this circuit is named SQRT CSLA. All groups can operate at the same time. However, the RCA circuits of the SQRT CSLA limit its speed. Hence, one set of RCA circuits is replaced by Binary to Excess-1 Conversion (BEC) circuits, which have the same functionality with fewer gates, to increase the speed of the adder significantly. The circuit diagram of the 16-bit BEC based SQRT CSLA is illustrated in Figure 2.





Every group structure contains RCA, BEC, and multiplexer circuits; hence, the essential components for designing the group structures of the SQRT CSLA are Full Adders (FAs), Half Adders (HAs), logic gates (AND, EX-OR, and NOT), and multiplexers. For instance, the group-2 and group-3 structures of the 16-bit SQRT CSLA circuit are illustrated. Finally, multiplexers provide the final sum outputs. The carry input (Cin) is given to the selection input of the first group of multiplexers; the remaining groups get their carry inputs from the previous groups. Hence, the final stage of the SQRT CSLA causes less CPD than the traditional RCA circuit, as shown in Figure 3.
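The group structure described above can be illustrated with a small behavioral model. This is only a sketch in Python, not the Verilog used in this work: the function names and the 16-bit group widths `(2, 2, 3, 4, 5)` are assumptions chosen for illustration. Each group computes its RCA result for a carry-in of 0, derives the carry-in of 1 result with a BEC (add one), and a multiplexer selects between them using the carry from the previous group.

```python
def ripple_carry_add(a, b, cin, width):
    """Bit-by-bit ripple carry adder; returns (sum, carry_out)."""
    total, carry = 0, cin
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, carry


def bec(value, width):
    """Binary to Excess-1 converter on a (width + 1)-bit value."""
    return (value + 1) & ((1 << (width + 1)) - 1)


def sqrt_csla(a, b, cin=0, groups=(2, 2, 3, 4, 5)):
    """Behavioral BEC based SQRT CSLA; group widths are an assumption."""
    result, carry, shift = 0, cin, 0
    for width in groups:
        mask = (1 << width) - 1
        ga, gb = (a >> shift) & mask, (b >> shift) & mask
        s0, c0 = ripple_carry_add(ga, gb, 0, width)  # RCA with carry-in 0
        packed0 = (c0 << width) | s0                 # carry and sum together
        packed1 = bec(packed0, width)                # same result for carry-in 1
        chosen = packed1 if carry else packed0       # multiplexer
        result |= (chosen & mask) << shift
        carry = chosen >> width                      # select line of next group
        shift += width
    return result, carry
```

For example, `sqrt_csla(12345, 23456)` returns the 16-bit sum together with the final carry-out, and all groups evaluate their two candidate results without waiting for the incoming carry.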



Fig 3: Proposed Group2 structure for proposed SQRT CSLA

In this work, the BEC and multiplexer circuits are analysed and reconstructed to improve performance in terms of silicon area and power consumption. Redundant logic functions in each group structure are identified and eliminated to reduce the hardware complexity. Hence, the developed adder circuit is named "Reduced Complexity SQRT CSLA". The circuit diagram of the reduced complexity SQRT CSLA is given for 4-bit addition; similarly, the circuit can be extended and compressed for the group-5, group-2, and group-3 structures. Compared with the traditional group structures of the BEC based SQRT CSLA, the developed group structures reduce the gate count significantly. Theoretically, the gate count of the reduced complexity SQRT CSLA is 38% lower than that of the traditional SQRT CSLA. Further, the performance of the reduced complexity SQRT CSLA is compared with that of compressor based adder circuits, as shown in Figure 4.



Both the compressor based digital adder and the reduced complexity SQRT CSLA are incorporated independently into the addition part of the Bi-Recoder multiplier. The reduced complexity SQRT CSLA based Bi-Recoder performs better than the compressor adder based Bi-Recoder because of the lower hardware complexity of the reduced complexity SQRT CSLA. Therefore, the resultant partial products of the Bi-Recoder multiplier are added using the SQRT CSLA. SQRT CSLAs are among the fastest adders. The carry output of one stage is given as the carry input to the next stage.

#### ADVANTAGES OF SQRT CSLA

- 1. Hardware implementation is easily achieved.
- 2. Power consumption is reduced by 17%.
- 3. Area (LUTs and slices) is reduced by 9.7%.
- 4. The delay of the system is reduced by 6%.

## III. BASIC THEORY OF BI-RECODER MULTIPLIER

Partial product generation is the first step of any multiplier. The proposed structure, which follows the array multiplier, is shown in Figure 5. Conventionally, AND gates are used to generate the partial products. Alternatively, 2:1 multiplexers can generate the partial products: the multiplicand value is given directly to one input of the 2:1 multiplexer and N zero bits are given to the other input.



Fig 5: Bi-Recoder Multiplier

For instance, an 8-bit multiplier requires 8 multiplexers to produce the partial products. In every stage, a single bit of the multiplier is used as the selection input of the multiplexer. If it is zero, the multiplexer passes '0' to the output; if it is one, the multiplexer passes the multiplicand value to the output. This multiplexer based partial product generation technique is the basis for designing the novel Bi-Recoder multiplier.

In every stage of the Bi-Recoder multiplier, two bits of the multiplier value are used as the selection input. If they are '00', the multiplexer passes '0' to the output; if '01', it passes the multiplicand value; if '10', it passes the multiplicand left-shifted by 1 bit; and if '11', it passes the sum of the multiplicand and its 1-bit left-shifted value. In this way, the Bi-Recoder multiplier produces the partial products effectively. For instance, an 8-bit multiplier requires only 4 multiplexers for partial product generation, so the hardware complexity of the Bi-Recoder multiplier is reduced.

Consider the partial product generation of the Bi-Recoder multiplier for an 8-bit example, where 'a' is the 8-bit multiplicand and 'b' is the 8-bit multiplier. In each stage, a 9-bit partial product output is generated according to the selection inputs. Further, the Wallace tree reduction method is used to reduce the 4 rows of partial products into 2 rows. Hence, the partial product generation circuit of the Bi-Recoder reduces the hardware complexity, delay, and power consumption of the multiplier. However, the best performance cannot be achieved with the partial product generator circuit alone, because the performance of any multiplier depends on both the partial product generation and the type of adder used for adding the partial products.
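The two selection schemes described above can be compared with a short behavioral sketch. It is written in Python for illustration only and is not the Verilog implementation of this work; the function names are hypothetical. The first function selects one partial product per multiplier bit (2:1 multiplexer), the second selects one per pair of multiplier bits ('00', '01', '10', '11').

```python
def mux_partial_products(a, b, width=8):
    """One partial product per multiplier bit: the mux passes 0 or a."""
    return [(a if (b >> i) & 1 else 0) << i for i in range(width)]


def bi_recoder_partial_products(a, b, width=8):
    """One partial product per pair of multiplier bits."""
    # Selection '00' -> 0, '01' -> a, '10' -> a << 1, '11' -> a + (a << 1)
    choices = (0, a, a << 1, a + (a << 1))
    return [choices[(b >> i) & 0b11] << i for i in range(0, width, 2)]


a, b = 0xB7, 0x6D  # arbitrary 8-bit operands for illustration
rows_mux = mux_partial_products(a, b)
rows_bi = bi_recoder_partial_products(a, b)
```

Both lists sum to the product `a * b`, but the Bi-Recoder version has 4 rows instead of 8, which is the source of the reduced number of rows entering the Wallace tree reduction.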

## IV. BI-RECODER BASED FIR FILTER

Digital Signal Processing (DSP) operations are widely used in wireless communication technologies to control and guide signal flows. Convolution, correlation, frequency transformation, and filtering are important DSP operations. In this work, the Finite Impulse Response (FIR) filter is considered for improving the performance of digital filtering in wireless communication technology. Considerable effort has been devoted to the direct form digital FIR filter to improve its speed and throughput. The input-output relationship of a Linear Time Invariant (LTI) system is represented by the equation

 $y_{out}(n) = \sum_{p=0}^{N-1} Coeff_p \, x_{in}(n - p)$ 

where $x_{in}(n)$ represents the input samples of the FIR filter, $y_{out}(n)$ represents the output samples of the FIR filter, N is the length of the filter (the number of taps), and $Coeff_p$ denotes the filter coefficients.
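The direct form relationship above can be sketched as a multiply-accumulate loop over a delay line. The following Python model is a minimal illustration, assuming integer samples and a zero initial state; the function name and arguments are hypothetical and are not taken from the Verilog design.

```python
def fir_direct_form(coeffs, samples):
    """Direct form FIR: each output is the sum of coeff[p] * x(n - p)."""
    delay_line = [0] * len(coeffs)       # x(n), x(n-1), ..., x(n-N+1)
    outputs = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]   # shift in the new sample
        acc = 0
        for c, d in zip(coeffs, delay_line):
            acc += c * d                     # multiply and accumulate (MAC)
        outputs.append(acc)
    return outputs
```

Feeding a unit impulse, for example `fir_direct_form([1, 2, 3], [1, 0, 0, 0])`, returns the coefficients followed by zeros, which shows that the impulse response is finite. Each inner-loop step corresponds to one multiplier and one adder in hardware, which is why the adder and multiplier dominate the filter cost.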

The impulse response of an FIR filter must be finite; therefore, periodic multiplication and accumulation structures are used to keep it finite. The Square Root Carry Select Adder (SQRT CSLA) is one of the best VLSI adders because it combines low hardware complexity with high speed. The combination of the Ripple Carry Adder (RCA) and the Binary to Excess-1 Conversion (BEC) unit reduces the propagation delay of the addition. In the SQRT CSLA, N-bit data is divided into  $\sqrt{N}$  groups for parallel addition. A reduced complexity Wallace multiplier is developed for the design of the digital FIR filter. The triangular matrix of partial product outputs is divided into groups of three rows. Full Adders (FAs) add three bits, while a single bit or a group of two bits is moved directly to the next stage. The final stage of the Wallace tree multiplier requires an N-bit binary adder to perform the accumulation. An efficient CSLA circuit is used for the addition part of the reduced complexity Wallace multiplier; a parallel prefix Han-Carlson adder can also be used for this addition part.
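The row grouping described above can be sketched at the word level. This Python model is an illustrative assumption, not the gate-level design: each group of three rows is compressed by a row of full adders into a sum row and a shifted carry row, and leftover rows (one or two) move to the next stage unchanged. The function names are hypothetical.

```python
def full_adder_row(x, y, z):
    """3:2 compression of three rows into a sum row and a carry row."""
    sum_row = x ^ y ^ z
    carry_row = ((x & y) | (y & z) | (x & z)) << 1
    return sum_row, carry_row


def wallace_reduce(rows):
    """Reduce partial product rows until at most two remain."""
    rows = list(rows)
    stages = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3      # rows that form groups of three
        next_rows = []
        for i in range(0, full, 3):
            next_rows.extend(full_adder_row(*rows[i:i + 3]))
        next_rows.extend(rows[full:])         # leftovers pass through directly
        rows = next_rows
        stages += 1
    return rows, stages
```

The two remaining rows are then summed by the final carry-propagating adder, which is where the SQRT CSLA would be used. With four input rows, as in the 8-bit Bi-Recoder case, this sketch needs two reduction stages.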

#### SIMULATION RESULTS:

#### 1. FIR FILTER RESULTS WITH BI-RECODER MULTIPLIER

#### AREA

Device Utilization Summary

| Logic Utilization                              | Used | Available | Utilization | Note(s) |
|------------------------------------------------|------|-----------|-------------|---------|
| Number of Slice Flip Flops                     | 35   | 1,536     | 2%          |         |
| Number of 4 input LUTs                         | 67   | 1,536     | 4%          |         |
| Number of occupied Slices                      | 46   | 768       | 5%          |         |
| Number of Slices containing only related logic | 46   | 46        | 100%        |         |
| Number of Slices containing unrelated logic    | 0    | 46        | 0%          |         |
| Total Number of 4 input LUTs                   | 68   | 1,536     | 4%          |         |
| Number used as logic                           | 67   |           |             |         |
| Number used as a route-thru                    | 1    |           |             |         |
| Number of bonded IOBs                          | 26   | 124       | 20%         |         |
| Number of BUFGMUXs                             | 1    | 8         | 12%         |         |
| Average Fanout of Non-Clock Nets               | 3.28 |           |             |         |



#### DELAY

Timing Summary:

Speed Grade: -5

- Minimum period: 6.360ns (Maximum Frequency: 157.227MHz)
- Minimum input arrival time before clock: 4.357ns
- Maximum output required time after clock: 10.881ns
- Maximum combinational path delay: No path found

Timing Detail: All values displayed in nanoseconds (ns)

FIR FILTER OUTPUT


#### 2. SQRT CSLA OUTPUT


#### 3. WALLACE TREE MULTIPLIER OUTPUT

#### WALLACE TREE MULTIPLIER AREA

| Logic Utilization                              | Used | Available | Utilization | Note(s) |
|------------------------------------------------|------|-----------|-------------|---------|
| Number of Slice Flip Flops                     | 45   | 3,840     | 1%          |         |
| Number of 4 input LUTs                         | 144  | 3,840     | 3%          |         |
| Logic Distribution                             |      |           |             |         |
| Number of occupied Slices                      | 94   | 1,920     | 4%          |         |
| Number of Slices containing only related logic | 94   | 94        | 100%        |         |
| Number of Slices containing unrelated logic    | 0    | 94        | 0%          |         |
| Total Number of 4 input LUTs                   | 144  | 3,840     | 3%          |         |
| Number of bonded IOBs                          | 34   | 141       | 24%         |         |
| Number of BUFGMUXs                             | 1    | 8         | 12%         |         |

#### WALLACE TREE MULTIPLIER DELAY

Timing Summary:

Speed Grade: -5

- Minimum period: 1.483ns (Maximum Frequency: 674.377MHz)
- Minimum input arrival time before clock: 21.440ns
- Maximum output required time after clock: 6.216ns
- Maximum combinational path delay: No path found

Timing Detail: All values displayed in nanoseconds (ns)
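The maximum frequency in the timing summaries is the reciprocal of the minimum clock period. The short Python check below illustrates the relationship using the two reported periods; the helper name is hypothetical, and the small differences from the reported frequencies are assumed to come from rounding of the printed period.

```python
def max_frequency_mhz(min_period_ns):
    """Maximum clock frequency in MHz for a minimum period given in ns."""
    return 1000.0 / min_period_ns


fir_bi_recoder_mhz = max_frequency_mhz(6.360)   # FIR filter with Bi-Recoder
wallace_tree_mhz = max_frequency_mhz(1.483)     # Wallace tree multiplier
```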

#### COMPARISON TABLE:

TABLE 1. Comparison of RCA and Reduced Complexity SQRT CSLA

| No. of bits | Method                       | Delay (ns) | % reduction |
|-------------|------------------------------|------------|-------------|
| 8           | Ripple Carry Adder (RCA)     | 17.585     | 31.2%       |
|             | Reduced Complexity SQRT CSLA | 12.083     |             |
| 16          | Ripple Carry Adder (RCA)     | 28.741     | 48.75%      |
|             | Reduced Complexity SQRT CSLA | 14.729     |             |
| 32          | Ripple Carry Adder (RCA)     | 51.052     | 60.9%       |
|             | Reduced Complexity SQRT CSLA | 19.958     |             |
| 64          | Ripple Carry Adder (RCA)     | 95.675     | 70.06%      |
|             | Reduced Complexity SQRT CSLA | 28.642     |             |
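The percentage reduction column can be reproduced from the delay values, assuming it is defined as the delay saving relative to the RCA delay. The following Python sketch uses only the delays listed in Table 1; the function and variable names are hypothetical.

```python
def percent_reduction(baseline, improved):
    """Relative saving of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


# (RCA delay, reduced complexity SQRT CSLA delay) in ns, from Table 1
delays_ns = {
    8: (17.585, 12.083),
    16: (28.741, 14.729),
    32: (51.052, 19.958),
    64: (95.675, 28.642),
}

reductions = {
    bits: percent_reduction(rca, csla) for bits, (rca, csla) in delays_ns.items()
}
```

The computed values agree with the tabulated percentages to within rounding, and the saving grows with the word length because the RCA delay grows with the number of bits while the SQRT CSLA delay grows more slowly.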

TABLE 2. Comparison of Wallace Tree Multiplier and Bi-Recoder Multiplier

| Parameters | FIR Filter using Wallace tree multiplier | FIR Filter using Bi-Recoder multiplier |
|------------|------------------------------------------|----------------------------------------|
| LUTs       | 89                                       | 67                                     |
| Slices     | 87                                       | 46                                     |
| Delay (ns) | 4.644                                    | 4.357                                  |
| Power (W)  | 0.252                                    | 0.219                                  |

#### GRAPHICAL REPRESENTATION FOR TWO MULTIPLIERS COMPARISON

[Charts: POWER and DELAY comparison of the two multipliers]



## CONCLUSION

The proposed work presents the design of a digital FIR filter in Verilog Hardware Description Language, implemented using Xilinx ISE 12.4. A multiplier with SQRT CSLA is introduced to increase the performance of the MAC unit of the digital FIR filter. Redundant logic functions in both traditional multipliers and adders are identified and removed to increase the performance of the MAC unit. The reduced complexity SQRT CSLA based Wallace tree multiplier offers a 6.4% reduction in silicon area, a 36.58% reduction in delay, and a 29.95% reduction in power consumption compared with the ripple carry adder based filter structure. Likewise, the reductions in area, delay, and power are identified for the reduced complexity SQRT CSLA based Bi-Recoder multiplier, and the FIR filter outputs using the two multipliers are compared. Thus, the best adder and multiplier combination is selected for the FIR filter design. The graphical representation and comparison tables above show which multiplier and adder are more efficient.

#### ACKNOWLEDGEMENT

The author is grateful to the lab facility at Electronics and Communication Engineering department of RMK Engineering College, Chennai which provided excellent support to finish the work successfully.
