

*International Journal of* Intelligent Engineering & Systems

http://www.inass.org/

# Low Area VLSI implementation of CSLA for FIR Filter Design

Bommalingaiah Nanjappa Mohan Kumar<sup>1</sup>\*

Rangaraju Hulivangala Gangappa<sup>1</sup>

<sup>1</sup>Government Sri Krishnarajendra Silver Jubilee Technological Institute, India

\* Corresponding author's Email: mohankumarbn1@gmail.com

**Abstract:** The Finite Impulse Response (FIR) filter is a major building block in Digital Signal Processing (DSP) systems. Conventional FIR filters are implemented with a normal adder, which consumes more area, and the complexity of the FIR filter is generally dominated by the adder. Minimizing the hardware utilization of the adder is therefore a significant challenge in FIR filter design. In this work, an efficient FIR filter is designed with a Carry Select Adder (CSLA) to reduce the area and hardware complexity of the accumulation block. The proposed method is named the Low Area-CSLA-FIR (LA-CSLA-FIR) filter. The LA-CSLA-FIR filter was implemented in Verilog on Xilinx Field Programmable Gate Array (FPGA) devices: Virtex-4, Virtex-5 and Virtex-6. In the experimental results, the LA-CSLA-FIR filter design reduced the average FPGA device utilization on Virtex-5 by 15.38% for LUTs, 8% for flip-flops and 8% for slices compared to existing filter designs.

**Keywords:** Carry select adder, Digital signal processing, Finite impulse response, Field programmable gate array, Xilinx tool.

# 1. Introduction

Digital filters are widely used in communication and DSP systems, for example in matched filtering, pulse shaping and channel equalization, because of their guaranteed linear phase and stability [1, 2]. Generally, two techniques are used to design FIR filters with low power consumption and low hardware cost: sparse FIR filters and multiplierless FIR filters [3-5]. An FIR filter in the transposed form can be divided into the algorithmic delays, the Structural Adders (SAs) and the multiplier block [6]. The arithmetic operators require a large area, which is the main limitation in designing a digital filter for a DSP system [7]. The FIR filter is always stable because it has no feedback, and it can provide a linear phase response [8]. Traditionally, fixed-point coefficients are designed and represented in binary or integer form. The fixed-point coefficients are analysed by Common Sub-Expression Sharing (CSS) to reduce the hardware complexity: a sub-expression common to several coefficients is identified and computed only once, so the number of adders is reduced [9].

Digital filters play an important role in removing unwanted noise. They are classified into two types: FIR and Infinite Impulse Response (IIR) filters [10]. FIR filters have certain advantages over their IIR counterparts; in particular, they are stable, which makes them suitable for different applications [11]. FIR filters are designed and implemented using MATLAB and FPGA platforms [12, 13].

Power-based optimization methods have been used for linear-phase FIR filter design. These designs achieve low power consumption, but they need more area in hardware [14]. A pipelined architecture for an adaptive FIR filter has been implemented based on Distributed Arithmetic (DA). This design uses a fast bit clock for Carry Save Accumulation (CSA) and a slower clock for the remaining operations in order to reduce the power consumption [15]. The major contributions of the LA-CSLA-FIR filter are as follows. The CSLA is used in the accumulator circuit for the addition operation, which reduces the area of the filter architecture. The advantage of the proposed method is that it consumes less power and area, so it is well suited to digital and medical image applications, speech enhancement, noise reduction and so on. The LA-CSLA-FIR filter is implemented on an FPGA platform using Verilog code. The FPGA is a suitable platform for digital Very Large Scale Integrated (VLSI) design because of its reusability, flexibility and low power.

International Journal of Intelligent Engineering and Systems, Vol.12, No.4, 2019

This paper is organized as follows: Section 2 presents a literature survey of recent work on FIR filters. Section 3 briefly explains the proposed FIR filter design. Section 4 gives the comparative experimental results of the proposed LA-CSLA-FIR filter and conventional methods. The conclusion is given in Section 5.

# 2. Literature survey

The researchers have suggested several methods for FIR filter design. In this section, a brief evaluation of a few significant contributions on the FIR filter design is presented.

S.Y. Park and P.K. Meher [16] presented a DA-based reconfigurable FIR filter design, implemented on FPGA and Application Specific Integrated Circuit (ASIC) platforms, whose filter coefficients change during run-time. In this design, a shared LUT was implemented to compute the DA. Instead of employing separate registers to store the possible outcomes of the Partial Inner Products (PIP) for processing DA at various bit locations, the registers are shared by the DA units for bit slices of different weight. The FIR filter was designed with complex arithmetic operations, so it occupied more area in the FIR filter architecture.

Lou et al. [17] presented a Pre-Structural Adder (PSA) for FIR filter design, in which half of the long word-length SAs in the filter structure were replaced by adders with a shorter word length. The filter coefficients were carefully grouped to exploit the symmetric impulse response of linear-phase FIR filters. The area-delay and power-delay performance improved in this work. The irregular structure of the Wallace Tree Adder (WTA) is the major limitation of this method.

Jiajia Chen et al. [18] proposed an FIR filter implemented using a Genetic Algorithm (GA), which solves the integer programming problem at quadratic computational complexity by refining the search space to find an optimized solution that meets the frequency response specifications. A number of structural adders and registers are required to sum the delayed partial sums and filter coefficients in the cascaded Tap Delay Accumulate (TDA) structure, which increases the area of the filter design.

Mittal et al. [19] designed a low-power, high-speed 16-order FIR filter. This method reduced the filter's area, power and delay by using adder, shifter, WTM and Vedic multiplier algorithms. To reduce the computational complexity of the filter, the coefficients were represented in Canonical Signed Digit (CSD) form. The FIR filter with the Han Carlson Adder (HCA) occupied less area, but the delay increased relative to the other adders considered: Ripple Carry Adder (RCA), Kogge Stone Adder (KSA), Brent Kung Adder (BKA) and Ladner Fischer Adder (LFA).

Chen et al. [20] proposed an FIR filter design implemented using a Sensitivity Driven (SA) algorithm, which required less area and power to design the FIR filter. Its logic synthesis and power simulation outputs were compared with other FIR filter coefficient synthesis techniques. In the experimental results, the method reduced area and power consumption by 33.9% to 54.8%. However, the method is not well suited to real-time applications because the area reduction is not efficient.

The LA-CSLA-FIR methodology is implemented to improve the performance of the FIR filter design and to overcome the above-mentioned problems.

# 3. Low Area–CSLA–FIR filter design methodology

The conventional FIR filter consists of a number of LUTs, a bit-shift register and a scalable accumulator, which includes an adder, a subtractor and registers. The FIR filter design requires coefficient adders and multipliers, which increase the size of the filter. The CSLA approach used in the LA-CSLA-FIR filter design reduces the area and hardware utilization and increases the system speed. The principle of the LA-CSLA-FIR method is described in the following sections.

#### 3.1 Proposed finite impulse response architecture

This section briefly explains the basic structure and hardware characteristics of the FIR digital filter. An N-tap FIR filter can be expressed in the general form given in Eq. (1).

$$u(n) = \sum_{i=0}^{N-1} h(i)\,v(n-i), \quad n = 0, 1, 2, 3, \dots, \infty \tag{1}$$

Here, v(n) is an infinite-length input sequence, h(i) represents the coefficients of the length-N FIR filter and u(n) represents the filter output. N-1 adders and N multipliers are required to implement an N-tap FIR filter. Fig. 1 shows the block diagram of the LA-CSLA-FIR filter architecture. It consists of the Random Access Memory (RAM), Read Only Memory (ROM), control unit, Processing Element (PE), address generator, accumulator, CSLA and register. The working principle of the efficient FIR filter design is described below.

Figure. 1 Block diagram of the LA-CSLA-FIR filter

Figure. 2 Structure of the carry select adder
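As a minimal sketch of Eq. (1), the following Python function evaluates the sum directly for a finite block of input samples, assuming v(k) = 0 for k < 0. The function name `fir_filter` and the example coefficients are illustrative assumptions and are not taken from the authors' Verilog code.

```python
def fir_filter(h, v):
    """Direct evaluation of u(n) = sum_{i=0}^{N-1} h(i) * v(n - i).

    h: list of N filter coefficients h(0) .. h(N-1)
    v: finite block of input samples; samples before v(0) are taken as zero
    """
    n_taps = len(h)
    u = []
    for n in range(len(v)):
        acc = 0
        for i in range(n_taps):
            if n - i >= 0:              # v(n - i) is zero before the first sample
                acc += h[i] * v[n - i]  # one multiplication per tap
        u.append(acc)
    return u


# Illustrative 3-tap example (values are assumptions, not from the paper).
print(fir_filter([1, 2, 3], [1, 0, 0, 4]))  # prints [1, 2, 3, 4]
```

Each output sample uses one multiplication per tap and one addition between neighbouring products, which is where the count of N multipliers and N-1 adders comes from.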

The control unit module generates the clock and reset signals. The input data reader block provides the input data. The coefficient ROM module stores the coefficients and the data RAM module stores the data. The filter module computes the filter output. The address generator generates the addresses used to read the filter coefficients from the ROM and the input data from the RAM. For each new input sample from the input data reader, the operations are carried out as follows. In the first stage, the memory address of the new sample is computed, the data RAM module is enabled and the sample is stored in the RAM at that address. In the second stage, the coefficient values are read from the ROM module one by one, and the information needed to obtain the filter result is prepared. In the third stage, the data RAM module is enabled and the corresponding data are read from the data RAM according to this information. The PE is one of the major blocks in the FIR filter design, and its output is given to the accumulator. In this research, the accumulator is designed based on the CSLA. The register holds the value zero at the initial stage. In the first clock cycle, the input data is multiplied by the coefficient, and the result is given to the input of the accumulator. In the second clock cycle, the register holds the result of the first clock cycle. This stored value is given to the input of the CSLA, which adds it to the new product to generate the filter output, and the result is stored in the register.
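The three stages above can be sketched as a small behavioural model in Python. This is only an illustration under stated assumptions: the class name `SequentialFir`, the circular-buffer addressing and the 16-bit accumulator width are choices made for the sketch, not details confirmed by the paper's Verilog design.

```python
class SequentialFir:
    """Behavioural sketch of a RAM/ROM based multiply-accumulate FIR filter."""

    def __init__(self, coeffs, acc_bits=16):
        self.rom = list(coeffs)          # coefficient ROM
        self.ram = [0] * len(coeffs)     # data RAM, assumed to be a circular buffer
        self.wr_addr = 0                 # write address kept by the address generator
        self.mask = (1 << acc_bits) - 1  # accumulator register width (assumed 16 bits)

    def step(self, sample):
        n = len(self.rom)
        # Stage 1: compute the address of the new sample and store it in the RAM.
        self.ram[self.wr_addr] = sample
        acc = 0  # the register holds zero at the initial stage
        for i in range(n):
            coeff = self.rom[i]                       # Stage 2: read coefficients one by one
            data = self.ram[(self.wr_addr - i) % n]   # Stage 3: read the matching sample
            product = coeff * data                    # processing element (multiplier)
            acc = (acc + product) & self.mask         # accumulator: register + adder
        self.wr_addr = (self.wr_addr + 1) % n
        return acc


fir = SequentialFir([1, 2, 3])
print([fir.step(x) for x in [1, 0, 0, 4]])  # prints [1, 2, 3, 4]
```

In the proposed architecture the addition inside the loop is the operation carried out by the CSLA; here it is modelled with ordinary integer addition.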



Figure. 3 Block diagram of the CSLA operation

#### 3.2 Carry select adder

The block diagram of the CSLA is shown in Fig. 2. The main advantages of the CSLA are its low propagation delay and the small area it occupies in the FIR filter architecture. The CSLA is realized as parallel stages built from pairs of Ripple Carry Adders (RCAs). The RCAs of each pair generate temporary sums and carries for the CSLA architecture by assuming the carry input to be zero and one, respectively. In Fig. 2, each *n*-bit RCA consists of *n* single-bit Full Adders (FAs), whose outputs are given by Eqs. (2) and (3).

$$S = a \oplus b \oplus C_{in} \tag{2}$$

$$C_{out} = a \cdot b + b \cdot C_{in} + a \cdot C_{in} \tag{3}$$

Here, *S* represents the sum, $C_{out}$ is the carry output, $C_{in}$ is the carry input, and *a* and *b* are the input bits. The CSLA uses fewer logic gates by employing Binary-to-Excess-1 Converter (BEC) logic instead of an *n*-bit chain of full adders for the $C_{in} = 1$ case. The CSLA architecture therefore employs the BEC together with an RCA with $C_{in} = 0$. The CSLA operation is described in the following section.
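Eqs. (2) and (3) can be checked with a short bit-level sketch in Python. The function names `full_adder` and `ripple_carry_add`, and the LSB-first bit ordering, are assumptions made for this illustration.

```python
def full_adder(a, b, c_in):
    """Single-bit full adder: Eq. (2) for the sum, Eq. (3) for the carry."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (b & c_in) | (a & c_in)
    return s, c_out


def ripple_carry_add(a_bits, b_bits, c_in=0):
    """n-bit RCA built from n full adders; bit lists are LSB first."""
    sum_bits = []
    carry = c_in
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # carry ripples to the next stage
        sum_bits.append(s)
    return sum_bits, carry


# 3 + 1 = 4 on two bits: the sum bits are 00 and the carry out is 1.
print(ripple_carry_add([1, 1], [1, 0]))  # prints ([0, 0], 1)
```

Because each stage waits for the carry of the previous stage, the delay of an RCA grows with the word length, which is the motivation for splitting the adder into selectable groups in the CSLA.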

#### 3.2.1. Carry select adder operation

The CSLA operation is illustrated in Fig. 3. The design consists of four Full Adders (FAs) arranged as two 2-bit RCAs, a Binary-to-Excess-1 Converter (BEC) and a MUX. Consider the inputs a = 0100 and b = 0101. In the first stage, A0 = 0, B0 = 1, A1 = 0 and B1 = 0 are given to the inputs of an RCA with two FAs and Cin = 0. The first FA produces a sum of one and a carry of zero, and this carry is given to the second FA. The second FA produces a sum of zero and a carry of zero, so the RCA output is 01; the carry output is given to the select input of the MUX. In the second stage, A2 = 1 and B2 = 1 are given to the first FA, which produces a sum of zero and a carry of one. A3 = 0, B3 = 0 and the carry of one from the first FA are given to the second FA, which produces a sum of one and a carry of zero. These FA outputs, 10, are given to the input of the BEC, whose main operation is a one-bit increment.

If the input of the BEC is 10, its output is 11. The BEC output and the second-stage FA output are given to the inputs of the MUX. If the selection line is zero, the MUX outputs 10; if the selection line is one, the MUX outputs 11. Here the selection line is zero, so the MUX outputs 10. Finally, concatenating the MUX output with the first-stage output gives the result 1001.
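The worked example can be reproduced with a bit-level Python sketch of a BEC-based CSLA. This is a behavioural illustration under assumptions: the helper names (`rca`, `bec`, `csla_add`), the LSB-first bit lists and the fixed group size of two bits are choices made here, and the word length must be a multiple of the group size.

```python
def full_adder(a, b, c_in):
    return a ^ b ^ c_in, (a & b) | (b & c_in) | (a & c_in)


def rca(a_bits, b_bits, c_in=0):
    """Ripple carry adder on LSB-first bit lists."""
    out, carry = [], c_in
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out, carry


def bec(bits):
    """Binary-to-Excess-1 converter: increments an LSB-first bit list by one.

    Bit i is inverted exactly when all lower bits are one.
    """
    out, all_ones_below = [], 1
    for b in bits:
        out.append(b ^ all_ones_below)
        all_ones_below &= b
    return out


def to_bits(x, width):
    return [(x >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))


def csla_add(a, b, width=4, group=2):
    """Carry select addition; returns (sum, carry_out)."""
    a_bits, b_bits = to_bits(a, width), to_bits(b, width)
    # First group: plain RCA with Cin = 0.
    result, carry = rca(a_bits[:group], b_bits[:group], 0)
    for lo in range(group, width, group):
        # Upper groups: RCA with Cin = 0, plus a BEC giving the Cin = 1 result.
        s0, c0 = rca(a_bits[lo:lo + group], b_bits[lo:lo + group], 0)
        with_cin0 = s0 + [c0]
        with_cin1 = bec(with_cin0)
        chosen = with_cin1 if carry else with_cin0  # MUX selected by the incoming carry
        result += chosen[:-1]
        carry = chosen[-1]
    return from_bits(result), carry


total, carry_out = csla_add(0b0100, 0b0101)
print(format(total, "04b"), carry_out)  # prints 1001 0
```

For a = 0100 and b = 0101 the upper group produces 10 with Cin = 0, the BEC produces 11, and the zero carry from the lower group selects 10, which matches the result 1001 in the text.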

# 4. Results and discussions

The proposed work covers the design and simulation of the FIR filter. The LA-CSLA-FIR filter design is implemented in Verilog on Xilinx FPGA devices: Virtex-4, Virtex-5 and Virtex-6. The Verilog design is verified through simulation, timing analysis and logic synthesis.

In the FPGA implementation, the LUTs, slices, flip-flops, frequency and power consumption are analysed for the digital filter designs. The FPGA provides a configurable design through an array of configurable logic modules that are interconnected by programmable routing resources and surrounded by input and output blocks. The LA-CSLA-FIR filter design is verified in the Modelsim tool. The experimental results of the existing and proposed methods are tabulated in Table 1. In this research, both the existing and the proposed methods are implemented on FPGA Virtex devices for comparative analysis. The existing FIR filter in [21] was designed based on the ABC algorithm, which is mainly used to reduce power consumption in the filter design. The trade-off between Pass Band Ripple (PBR), Stop Band Ripple (SBR) and the power consumption rules out the use of classical single-objective optimization algorithms. The performance of the LA-CSLA-FIR filter is analysed for different lengths: 8 taps, 16 taps and 32 taps. The LA-CSLA-FIR filter design is targeted to Virtex-4 xc4vfx12, Virtex-5 xc5vlx20T and Virtex-6 xc6vcx75t. The Virtex-5 xc5vlx20T is considered a high-configuration device, which provided better performance compared to Virtex-4 and Virtex-6. In PSA-FIR [17], the Wallace Tree Adder (WTA) has an irregular structure that requires more execution time. The hardware utilization (slices and four-input LUTs) is higher in the FIR-ABC algorithm [21]. To overcome this problem, the proposed FIR filter is designed based on an efficient CSLA. In this research, the CSLA is used in the accumulator to add the processing element outputs stage by stage, so it occupies

Table 1. Performance comparison of PSA-FIR [17], FIR-ABC algorithm [21] and the proposed LA-CSLA-FIR design on different FPGA devices

| Target FPGA | Filter design | Taps | LUT | Flip-Flop | Slice | Frequency (MHz) | Power (W) |
|---|---|---|---|---|---|---|---|
| Virtex 4 xc4vfx12 | PSA-FIR [17] | 8-tap | 168/10944 | 99/10944 | 144/5472 | 240.5 | 0.174 |
| Virtex 4 xc4vfx12 | PSA-FIR [17] | 16-tap | 180/10944 | 100/10944 | 126/5472 | 240.5 | 0.174 |
| Virtex 4 xc4vfx12 | PSA-FIR [17] | 32-tap | 193/10944 | 101/10944 | 153/5472 | 240.5 | 0.175 |
| Virtex 4 xc4vfx12 | FIR-ABC Algorithm [21] | 8-tap | 170/10944 | 104/10944 | 148/5472 | 239.33 | 0.174 |
| Virtex 4 xc4vfx12 | FIR-ABC Algorithm [21] | 16-tap | 178/10944 | 95/10944 | 122/5472 | 239.33 | 0.174 |
| Virtex 4 xc4vfx12 | FIR-ABC Algorithm [21] | 32-tap | 195/10944 | 99/10944 | 147/5472 | 239.33 | 0.174 |
| Virtex 4 xc4vfx12 | LA-CSLA-FIR | 8-tap | 143/10944 | 79/10944 | 101/5472 | 241.09 | 0.174 |
| Virtex 4 xc4vfx12 | LA-CSLA-FIR | 16-tap | 154/10944 | 80/10944 | 123/5472 | 241.09 | 0.174 |
| Virtex 4 xc4vfx12 | LA-CSLA-FIR | 32-tap | 167/10944 | 81/10944 | 121/5472 | 241.09 | 0.175 |
| Virtex 5 xc5vlx20T | PSA-FIR [17] | 8-tap | 69/12480 | 21/12480 | 27/3120 | 174.99 | 0.322 |
| Virtex 5 xc5vlx20T | PSA-FIR [17] | 16-tap | 71/12480 | 22/12480 | 25/3120 | 174.99 | 0.322 |
| Virtex 5 xc5vlx20T | PSA-FIR [17] | 32-tap | 74/12480 | 23/12480 | 23/3120 | 174.99 | 0.322 |
| Virtex 5 xc5vlx20T | FIR-ABC Algorithm [21] | 8-tap | 72/12480 | 30/12480 | 25/3120 | 174.99 | 0.322 |
| Virtex 5 xc5vlx20T | FIR-ABC Algorithm [21] | 16-tap | 72/12480 | 28/12480 | 24/3120 | 174.99 | 0.322 |
| Virtex 5 xc5vlx20T | FIR-ABC Algorithm [21] | 32-tap | 78/12480 | 25/12480 | 23/3120 | 174.99 | 0.322 |
| Virtex 5 xc5vlx20T | LA-CSLA-FIR | 8-tap | 62/12480 | 21/12480 | 24/3120 | 166.09 | 0.322 |
| Virtex 5 xc5vlx20T | LA-CSLA-FIR | 16-tap | 64/12480 | 22/12480 | 22/3120 | 166.09 | 0.322 |
| Virtex 5 xc5vlx20T | LA-CSLA-FIR | 32-tap | 66/12480 | 23/12480 | 21/3120 | 166.09 | 0.322 |
| Virtex 6 xc6vcx75t | PSA-FIR [17] | 8-tap | 104/46560 | 80/93120 | 57/11640 | 247.39 | 1.293 |
| Virtex 6 xc6vcx75t | PSA-FIR [17] | 16-tap | 108/46560 | 81/93120 | 54/11640 | 247.39 | 1.293 |
| Virtex 6 xc6vcx75t | PSA-FIR [17] | 32-tap | 111/46560 | 83/93120 | 42/11640 | 247.39 | 1.293 |
| Virtex 6 xc6vcx75t | FIR-ABC Algorithm [21] | 8-tap | 103/46560 | 82/93120 | 60/11640 | 247.39 | 1.293 |
| Virtex 6 xc6vcx75t | FIR-ABC Algorithm [21] | 16-tap | 108/46560 | 84/93120 | 45/11640 | 247.39 | 1.293 |
| Virtex 6 xc6vcx75t | FIR-ABC Algorithm [21] | 32-tap | 115/46560 | 82/93120 | 61/11640 | 247.39 | 1.293 |
| Virtex 6 xc6vcx75t | LA-CSLA-FIR | 8-tap | 100/46560 | 84/93120 | 42/11640 | 247.68 | 1.293 |
| Virtex 6 xc6vcx75t | LA-CSLA-FIR | 16-tap | 102/46560 | 80/93120 | 52/11640 | 247.68 | 1.293 |
| Virtex 6 xc6vcx75t | LA-CSLA-FIR | 32-tap | 104/46560 | 79/93120 | 40/11640 | 247.68 | 1.293 |




Figure. 4 Comparison graph of LUTs utilization in Virtex-5 for existing and LA-CSLA-FIR method



Figure. 5 Comparison graph of Flip flops utilization in Virtex-5 for existing and LA-CSLA-FIR method



Figure. 6 Comparison graph of slices utilization in Virtex-5 for existing and LA-CSLA-FIR

less area than the normal adder design. Because the proposed FIR design achieves lower device utilization than the existing FIR filter designs, it is well suited to noise removal in the medical area. On Virtex-5, the proposed method reduced utilization by 15.38% for LUTs, 8% for flip-flops and 8% for slices. Figs. 4, 5 and 6 compare the Virtex-5 FPGA device utilization of the existing and LA-CSLA-FIR filter designs. It is evident that the numbers of LUTs, flip-flops and slices on Virtex-5 are reduced by using the LA-CSLA approach to design the FIR filter, compared to the PSA-FIR [17] and FIR-ABC algorithm [21].
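The quoted percentages can be related to Table 1 with a short calculation. The sketch below assumes that the reduction is measured against the FIR-ABC algorithm [21] for the 32-tap filter on Virtex-5; this choice of rows is an assumption that reproduces the LUT and flip-flop figures, and the paper does not state which rows were used.

```python
def reduction_percent(baseline, proposed):
    """Relative saving of the proposed design with respect to a baseline count."""
    return 100.0 * (baseline - proposed) / baseline


# Virtex-5, 32-tap rows of Table 1: FIR-ABC [21] versus LA-CSLA-FIR.
luts = reduction_percent(78, 66)
flip_flops = reduction_percent(25, 23)
slices = reduction_percent(23, 21)
print(f"LUT {luts:.2f}%  FF {flip_flops:.2f}%  slice {slices:.2f}%")
# prints: LUT 15.38%  FF 8.00%  slice 8.70%
```

Under this reading the LUT and flip-flop values match the reported 15.38% and 8%, while the slice value comes out slightly above 8%.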

Fig. 7 shows the output waveform of the complete FIR design in Modelsim, which validates the correctness of the FIR filter design. In Fig. 7, red represents the filter input and coefficient, and violet represents the filter output and the accumulator results. Here, the FIR filter input data 36 is multiplied by the coefficient 1 and the result is stored in y. The input and coefficient are 8-bit values and the output is a 16-bit value. In the first stage, the Acc (accumulator) holds zero, which is added to y; the result is stored in the next clock cycle, and the Acc gives the FIR output. The power performance of the LA-CSLA-FIR filter design on Virtex-4 is taken from the Xilinx tool. The Virtex-4 device consumes less power (0.174 W) than the Virtex-5 (0.322 W) and Virtex-6 (1.293 W).
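The accumulator behaviour described for Fig. 7 can be replayed in a few lines of Python, using the `data_out`, coefficient, `y` and `Acc` values visible in the waveform listing. The function name `mac_trace` is an assumption for this sketch; the 16-bit masking follows the output width stated in the text.

```python
def mac_trace(data, coeffs, acc_bits=16):
    """Replay the multiply-accumulate sequence: y = data * coeff, Acc += y."""
    mask = (1 << acc_bits) - 1
    acc = 0                      # the accumulator register starts at zero
    products, acc_values = [], [acc]
    for x, c in zip(data, coeffs):
        y = (x * c) & mask       # 8-bit by 8-bit product, held in 16 bits
        acc = (acc + y) & mask   # value stored in the register on the next clock
        products.append(y)
        acc_values.append(acc)
    return products, acc_values


data = [36, 129, 9, 99, 13, 141, 101, 18]   # data_out values from the waveform
coeffs = [1, 2, 3, 4, 5, 6, 7, 8]           # coefficient values from the waveform
y, acc = mac_trace(data, coeffs)
print(y)    # prints [36, 258, 27, 396, 65, 846, 707, 144]
print(acc)  # prints [0, 36, 294, 321, 717, 782, 1628, 2335, 2479]
```

The products match the `y` trace and the running sums match the `Acc` trace, with the accumulator lagging the product by one clock cycle.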

Fig. 8 shows the power report of the LA-CSLA-FIR design on Virtex-4, which is taken from the Xilinx tool. The following steps are used to obtain the

International Journal of Intelligent Engineering and Systems, Vol.12, No.4, 2019 DOI: 10.22266/ijies2019.0831.10

Received: March 13, 2019

| Wave - Default | Msgs | | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 1'h0 | | | | | | | | | | | | | | | |
| /top_ext_tb/u0/en | 1'h1 | | | | | | | | | | | | | | | |
| /top_ext_tb/u0/rst_cnt | 1'h0 | | | | | | | | | | | | | | | |
| /top_ext_tb/u0/addr | 4'd2 | 4'd6 | 4'd7 | 4'd8 | 4'd0 | 4'd1 | 4'd2 | 4'd3 | 4'd4 | 4'd5 | 4'd6 | 4'd7 | 4'd8 | 4'd0 | 4'd1 | 4'd2 |
| /top_ext_tb/u0/data_out | 8'd36 | 8'd0 | | | | | 8'd36 | 8'd129 | 8'd9 | 8'd99 | 8'd13 | 8'd141 | 8'd101 | 8'd18 | | |
| | 8'd1 | 8'd5 | 8'd6 | 8'd7 | 8'd8 | 8'd0 | 8'd1 | 8'd2 | 8'd3 | 8'd4 | 8'd5 | 8'd6 | 8'd7 | 8'd8 | 8'd0 | 8'd1 |
| /top_ext_tb/u0/y | 16'd36 | 16'd0 | | | | | 16'd36 | 16'd258 | 16'd27 | 16'd396 | 16'd65 | 16'd846 | 16'd707 | 16'd144 | 16'd0 | |
| /top_ext_tb/u0/Acc | 16'd0 | | | | | | 16'd0 | 16'd36 | 16'd294 | 16'd321 | 16'd717 | 16'd782 | 16'd1628 | 16'd2335 | 16'd2479 | |
| | 8'd118 | -8'd115 | 8'd101 | 8'd18 | 8'd1 | 8'd13 | 8'd118 | 8'd61 | -8'd19 | -8'd116 | -8'd7 | -8'd58 | -8'd59 | -8'd86 | -8'd27 | |
| /top_ext_tb/u0/rst | 1'h0 | | | | | | | | | | | | | | | |
| /top_ext_tb/u0/r_w | 1'h0 | | | | | | | | | | | | | | | |
| /top_ext_tb/u0/u3/RAM | 8'h00 8'h24 8'h81 | | | | | | 8'h00 8'h24 8'h81 8'h09 8'h63 8'h0d 8'h8d 8'h65 8'h12 | | | | | | | | | |



| Device setting | Value |
|---|---|
| Family | Virtex4 |
| Part | xc4vfx12 |
| Package | sf363 |
| Temp Grade | Commercial |
| Process | Typical |
| Speed Grade | -12 |
| Ambient Temp (C) | 50.0 |
| Use custom TJA? | No |
| Custom TJA (C/W) | NA |
| Airflow (LFM) | 250 |
| Effective TJA (C/W) | 14.7 |
| Max Ambient (C) | 82.4 |
| Junction Temp (C) | 52.6 |
| Characterization | PRODUCTION v1.0,02-02-08 |

| On-Chip | Power (W) | Used | Available | Utilization (%) |
|---|---|---|---|---|
| Clocks | 0.007 | 1 | | |
| Logic | 0.000 | 142 | 10944 | 1 |
| Signals | 0.000 | 244 | | |
| DSPs | 0.000 | 1 | 32 | 3 |
| DCMs | 0.000 | 0 | 4 | 0 |
| IOs | 0.000 | 26 | 240 | 11 |
| Leakage | 0.167 | | | |
| Total | 0.174 | | | |

| Supply source | Voltage (V) | Total Current (A) | Dynamic Current (A) | Quiescent Current (A) |
|---|---|---|---|---|
| Vccint | 1.200 | 0.078 | 0.006 | 0.072 |
| Vccaux | 2.500 | 0.031 | 0.000 | 0.031 |
| Vcco25 | 2.500 | 0.001 | 0.000 | 0.001 |

| | Total | Dynamic | Quiescent |
|---|---|---|---|
| Supply Power (W) | 0.174 | 0.007 | 0.167 |

Figure. 8 Performance of Power in Virtex-4 device for LA-CSLA- FIR Design



Figure. 9 RTL schematic of the LA-CSLA-FIR Design



Figure. 10 RTL Schematic of CSLA Design



Figure. 12 An internal block of the Top module for LA-CSLA-FIR filter



Figure. 11 Top module of LA-CSLA-FIR filter

power results: synthesize → implement → place and route → analyze the power distribution in the power analyzer. The RTL schematic of the LA-CSLA-FIR filter design is shown in Fig. 9 and the RTL schematic of the CSLA architecture is shown in Fig. 10. The schematics are generated from the Verilog code using the Synplify Pro tool. There is separate code for each block: ROM, RAM, address generator, PE, CSLA and decoder. Generally, the input value is multiplied by a coefficient, and the product is accumulated in the Acc to produce the output. The RTL schematic is used to verify the proposed FIR filter design.

The 8-bit coefficient values are generated in the addr\_gen block, as shown in Fig. 9. The 8-bit input values are randomly generated in the RAM module. The 8-bit input and coefficient are given to the inputs of the PE, which produces a 16-bit output. In the accumulator (Acc), the 16-bit PE output is added to the initial value and the 16-bit result is stored in the register module, which is fed back to the Acc. In the final clock cycle, the output of the PE is added in the Acc circuit by the CSLA, which provides the filter output. The correct operation of the CSLA design is verified, as shown in Fig. 10. The top module of the LA-CSLA-FIR design is shown in Fig. 11. An internal block of the top module, taken from the Xilinx tool, is shown in Fig. 12. The proposed architecture shown in Fig. 12 is verified without any design errors.

# 5. Conclusion

In this paper, a low-area FIR filter is designed by using an efficient CSLA. The CSLA is used in the accumulator to add the processing element outputs stage by stage, which reduces the area of the proposed FIR design compared to conventional FIR designs. The proposed LA-CSLA-FIR filter design is coded in Verilog and implemented on FPGA with the Xilinx tool, targeting Virtex-4, Virtex-5 and Virtex-6 devices. In this research, the Virtex-5 xc5vlx20T is considered a high-configuration device, so it provided better performance compared to Virtex-4 and Virtex-6. On Virtex-5, the proposed FIR filter reduced utilization by 15.38% for LUTs, 8% for flip-flops and 8% for slices. Hence, the overall area and power performance of the proposed FIR filter implementation is superior to that of existing filter designs. In future work, the FIR filter design will be based on an optimal multiplier to further reduce FPGA utilization.

# References

- C.Y. Yao and C.L. Sha, "Fixed-point FIR filter design and implementation in the expanding subexpression space", In: *Proc. of IEEE International Symposium on Circuits and Systems*, pp.185-188, 2010
- [2] S. Bhattacharjee, S. Sil, and A. Chakrabarti, "Evaluation of power efficient fir filter for FPGA based DSP applications", *Procedia Technology*, Vol.10, pp.856-865, 2013.
- [3] W.B. Ye and Y.J. Yu, "Bit-level multiplierless FIR filter optimization incorporating sparse filter technique", *IEEE Transactions on Circuits and Systems I: Regular Papers*, Vol.61, No.11, pp.3206-3215, 2014.
- [4] A. Jiang, H.K. Kwan, and Y. Zhu, "Peak-error-constrained sparse FIR filter design using iterative SOCP", *IEEE Transactions on Signal Processing*, Vol.60, No.8, pp.4035-4044, 2012.
- [5] K.H. Dangra and G.S. Gawande, "Efficient design and implementation of multiplierless FIR filter", In: *Proc. of IEEE International Conference on Computing Communication Control and automation*, pp.1-5, 2016.
- [6] M. Faust, M. Kumm, C. H. Chang, and P. Zipf, "Efficient structural adder pipelining in transposed form FIR filters", *IEEE International Conference on Digital Signal Processing*, pp.311-314, 2015.
- [7] R. Lehto, T. Taurén, and O. Vainio, "Recursive FIR filter structures on FPGA", *Microprocessors and Microsystems*, Vol.35, No.7, pp.595-602, 2011.
- [8] E. Ozpolat, B. Karakaya, T. Kaya, and A. Gulten, "FPGA-based digital Filter Design for Biomedical Signal", In: *Proc. of XII International Conference on Perspective Technologies and Methods in MEMS Design*, pp.70-73, 2016.
- [9] C.Y. Yao, W.C. Hsia, and Y.H. Ho, "Designing Hardware-Efficient Fixed-Point FIR Filters in an Expanding Subexpression Space", *IEEE Transactions on Circuits and Systems*, Vol.61, No.1, pp.202-212, 2014.
- [10] S.S. Rajput and S.S. Bhadauria, "Implementation of FIR filter using adjustable window function and its application in speech signal processing", *International Journal of Advances in Electrical and Electronics Engineering*, Vol.1, pp.158-164, 2012.

- [11] J.P. Digvijay and P.C. Bhaskar, "FPGA Based FIR Filter Design for Enhancement of ECG Signal by Minimizing Base-line Drift Interference", *International Journal of Current Engineering and Technology*, Vol.3, No.5, pp.1775-1778, 2013.
- [12] C. Renuka and A. M. Guna Sekhar, "Design and Implementation of Parallel Micro-programmed FIR Filter Using Efficient Multipliers on FPGA", *International Journal of Scientific Research in Science and Technology*, Vol.4, No.2, pp.234-238, 2018.
- [13] C.P. Thanh, B.X. Hoang, Q.T.C.L. Duc Tran, and A. Ho, "Implementation of a short word length ternary FIR filter in both FPGA and ASIC", In: *Proc. of IEEE International conference on recent advances in signal processing, telecommunication and computing*, 2018.
- [14] M. Alawad and M. Lin, "Fir filter based on stochastic computing with reconfigurable digital fabric", *IEEE Annual International Symposium on Field-Programmable Custom Computing Machines*, pp.92-95, 2015.
- [15] B. Doss, K. Soundararajan, and Y. Narasimha Murthy, "Low-Power and Low-Area Adaptive FIR Filter Based on DA Using FPGA", *International Journal of Scientific Research and Management*, Vol.3, No.1, pp.2010-2014, 2015.
- [16] S.Y. Park and P.K. Meher, "Efficient FPGA and ASIC realizations of a DA-based reconfigurable FIR digital filter", *IEEE Transactions on Circuits and Systems II: Express Briefs*, Vol.61, No.7, pp.511-515, 2014.
- [17] X. Lou, P.K. Meher, Y. J. Yu, and W. Ye, "Novel Structure for Area-Efficient Implementation of FIR Filters", *IEEE Transactions on Circuits and Systems II: Express Briefs*, Vol.64, No.10, pp.1212-1216, 2016.
- [18] C. Jiajia, C.H. Chang, J. Ding, R. Qiao, and M. Faust, "Tap Delay-and-Accumulate Cost Aware Coefficient Synthesis Algorithm for the Design of Area-Power Efficient FIR Filters", *IEEE Transactions on Circuits and Systems Part1 Regular Papers*, Vol.65, No.2, pp.712-722, 2018.
- [19] A. Mittal, A. Nandi, and D. Yadav, "Comparative study of 16-order FIR filter design using different multiplication techniques", *IET Circuits, Devices & Systems*, Vol.11, No.3, pp.196-200, 2017.
- [20] J. Chen, J. Tan, C.H. Chang, and F. Feng, "A new cost-aware sensitivity-driven algorithm for the design of FIR filters", *IEEE Trans Circuits Systems I*, Vol.99, pp.1-11, 2016.

- [21] A.K. Dwivedi, S. Ghosh, and N.D. Londhe, "Low power FIR filter design using modified multi-objective artificial bee colony algorithm", *Engineering Applications of Artificial Intelligence*, Vol.55, pp.58-69, 2016.

DOI: 10.22266/ijies2019.0831.10

Received: March 13, 2019