European Journal of Advances in Engineering and Technology, 2018, 5(5): 344-349



**Research Article** 

ISSN: 2394 - 658X

# Survey on Approximate Multipliers for Image Processing

O Vignesh<sup>1</sup>, H Mangalam<sup>2</sup> and K AnjuBala<sup>3</sup>

<sup>1</sup>Teaching Fellow, Department of Electronics Engineering, MIT Campus, Anna University Chennai, India–600 044 <sup>2</sup>Professor, Department of ECE, Sri Ramakrishna Institute of Technology, Coimbatore, India–641010 <sup>3</sup>PG Scholar, Department of Electronics Engineering, MIT Campus, Anna University, Chennai, India–600 044 vicky6058@gmail.com,

### ABSTRACT

Approximate computing has become ravishing approach for designing high performance and low energy consumption with limited loss in accuracy for many digital logic designs. Since there is a trade-off between speed/power with accuracy shown through previous research works. The demand of high speed and power efficiency as well as the feature of error tolerant applications has driven the development of approximate arithmetic circuits. The most important arithmetic modules in a processor are adder and multipliers to determine the performance for many computing tasks in today's digital systems. A comparative study of efficient algorithm and architecture for various approximate multipliers are presented in this survey. Approximate multiplier reduces the transistor count, power consumption, delay and it provides high speed output. The result shows that the proposed Approximate multiplier design accomplish significant reductions in power dissipation, delay and transistor count compared to an exact accurate multiplier. An application in image processing is performed, where peak signal to noise ratio of the image is analyzed. The image quality of the approximate multiplier shows satisfactory result compared to an accurate multiplier.

Key words: Approximate multiplier, Accuracy, High speed, Low power, Error tolerance.

### **INTRODUCTION**

Arithmetic computing [1] is very important in the design of digital processors and application oriented systems. Arithmetic unit such as adder, multiplier and divider are the basic key components in digital circuits. Multipliers are the core components of arithmetic circuits in Digital Signal Processing (DSP) [2] and it has extremely high demand on its performance and low power consumption. It is difficult for the multipliers to improve performance and also to reduce power consumption with full accuracy. So we need to propose many error tolerant application such as multimedia and image processing application by approximate. Approximate computing can improve performance and energy efficiency with reduced design complexity for many error tolerant applications in arithmetic operation. Arithmetic circuits are the building blocks for the error resilience system and approximate design in recent research, without affecting the performance of the end user. The critical requirement in digital signal processing are minimizing delay, power consumption and significantly improve the performance of the processor. There are different levels of design hierarchy for reducing energy consumption and area with better accuracy using approximate digital circuits design.



Fig. 1 4x4 bit-array multiplier structure

Designing multiplier is one of the most significant arithmetic units used in digital systems such as digital signal processing, FIR filters and portable devices. The design circuit logic is get tested and it should be fault free, if the error occurs means redundant circuitry are kept on-chip which replace the faulty parts [3]. Fig.1 shows the structure of 4 bit array multiplier. Array multiplication is performed by shift and add algorithm. Sequential multipliers have less silicon area, low speed and low power consumption compared to the parallel multipliers. The main disadvantage of multiplier are more silicon area, low speed and high power consumption, which negatively affect the performance of the digital systems. To eliminate the disadvantages of the design, Digital designers have proposed a new technique for optimization. Since there is trade off among area, speed and power consumption; a new idea proposed is approximate multiplier method. This will help in optimization of these three parameters. Approximation technique is the main resources of the error resilience [4] application by improving speed and power with limited loss in its accuracy. Approximation process eliminates some parts of arithmetic circuits by changing mid-calculation using different algorithm to increase the power consumption and performance, reduce timing delay and implementation area using image processing algorithm [5] to tolerate the errors.

# APPROXIMATE MULTIPLIERS

Precise multipliers take more delay and high power consumption to get accurate result whereas the approximate multiplier has less logical elements compared to the existing multiplier and also it consumes less power consumption with limited loss of accuracy. The approximate multiplier circuit generates the exact output on most input combination and incurs approximate error on the rest resulting in variable circuit latency. Approximate multiplier focus mainly for accumulation of partial products, which is crucial for power consumption.

### **Power and Area Efficient Approximate Multipliers**

The partial product is the result of multiplying the multiplicand to the multiplier. Approximation adders such as half adder, full adder, 4:2 compressor adder [6] are applied to these altered partial products which reduces design complexity. The generated partial products of the multiplier [7] are altered by varying the probability terms to reduce logic complexity of the circuit. Implementation of the approximate multiplier has three steps: generation of partial products, reduction tree method and a vector merge addition. Power consumption, circuit complexity are dominated by partial product reduction method. Hence, approximation is applied in reduction tree method to reduce power consumption.

|                  |                  |                  |                         |                  |                  | X <sub>7</sub>          | X <sub>6</sub>   | X5               |                  | ( <sub>4</sub> ) | X <sub>3</sub>   | <b>X</b> <sub>2</sub> | $X_1$            | $X_0$            |
|------------------|------------------|------------------|-------------------------|------------------|------------------|-------------------------|------------------|------------------|------------------|------------------|------------------|-----------------------|------------------|------------------|
|                  |                  |                  |                         |                  |                  | <b>Y</b> <sub>7</sub>   | Y <sub>6</sub>   | Y <sub>5</sub>   | Y                | 4                | Y <sub>3</sub>   | Y <sub>2</sub>        | Y <sub>1</sub>   | Y <sub>0</sub>   |
| A <sub>7,7</sub> | A <sub>7,6</sub> | A <sub>7,5</sub> | A <sub>7,4</sub>        | A <sub>7,3</sub> | A <sub>7,2</sub> | A <sub>7,1</sub>        | A <sub>7,0</sub> | A <sub>6,0</sub> | A <sub>5,0</sub> | A <sub>4,0</sub> | A <sub>3,0</sub> | A <sub>2,0</sub>      | A <sub>1,0</sub> | A <sub>0,0</sub> |
|                  | A <sub>6,7</sub> | A <sub>5,7</sub> | A <sub>4,7</sub>        | A <sub>3,7</sub> | A <sub>2,7</sub> | A <sub>1,7</sub>        | A <sub>0,7</sub> | A <sub>0,6</sub> | A <sub>0,5</sub> | A <sub>0,4</sub> | A <sub>0,3</sub> | A <sub>0,2</sub>      | A <sub>0,1</sub> |                  |
|                  |                  | A <sub>6,6</sub> | A <sub>6,5</sub>        | A <sub>6,4</sub> | A <sub>6,3</sub> | A <sub>6,2</sub>        | A <sub>6,1</sub> | A <sub>5,1</sub> | A <sub>4,1</sub> | A <sub>3,1</sub> | A <sub>2,1</sub> | A <sub>1,1</sub>      |                  |                  |
|                  |                  |                  | A <sub>5,6</sub>        | A <sub>4,6</sub> | A <sub>3,6</sub> | A <sub>2,6</sub>        | A <sub>1,6</sub> | A <sub>1,5</sub> | A <sub>1,4</sub> | A <sub>1,3</sub> | A <sub>1,2</sub> |                       |                  |                  |
|                  |                  |                  |                         | A <sub>5,5</sub> | A <sub>5,4</sub> | A <sub>5,3</sub>        | A <sub>5,2</sub> | A <sub>4,2</sub> | A <sub>3,2</sub> | A <sub>2,2</sub> |                  |                       |                  |                  |
|                  |                  |                  |                         |                  | A <sub>4,5</sub> | A <sub>3,5</sub>        | A <sub>2,5</sub> | A <sub>2,4</sub> | A <sub>2,3</sub> |                  |                  |                       |                  |                  |
|                  |                  |                  |                         |                  |                  | $A_{4,4}$               | A <sub>4,3</sub> | A <sub>3,3</sub> |                  |                  |                  |                       |                  |                  |
|                  |                  |                  |                         |                  |                  |                         | A <sub>3,4</sub> |                  |                  |                  |                  |                       |                  |                  |
| A <sub>7,7</sub> | A <sub>7,6</sub> | A <sub>7,5</sub> | P <sub>7,4</sub>        | P <sub>7,3</sub> | P <sub>7,2</sub> | P <sub>7,1</sub>        | P <sub>7,0</sub> | P <sub>6,0</sub> | P <sub>5,0</sub> | P <sub>4,0</sub> | P <sub>3,0</sub> | A <sub>2,0</sub>      | A <sub>1,0</sub> | A <sub>0,0</sub> |
|                  | A <sub>6,7</sub> | A <sub>5,7</sub> | P <sub>6,5</sub>        | P <sub>6,4</sub> | P <sub>6,3</sub> | P <sub>6,2</sub>        | P <sub>6,1</sub> | P <sub>5,1</sub> | P <sub>4,1</sub> | P <sub>3,1</sub> | P <sub>2,1</sub> | A <sub>0,2</sub>      | A <sub>0,1</sub> |                  |
|                  |                  | A <sub>6,6</sub> | G <sub>7,4</sub>        | A <sub>5,5</sub> | P <sub>5,4</sub> | P <sub>5,3</sub>        | P <sub>5,2</sub> | P <sub>4,2</sub> | P <sub>3,2</sub> | A <sub>2,2</sub> | G <sub>3,0</sub> | A <sub>1,1</sub>      |                  |                  |
|                  |                  |                  | <b>G</b> <sub>6,5</sub> | G <sub>7,3</sub> | G <sub>7,2</sub> | A <sub>4,4</sub>        | P <sub>4,3</sub> | A <sub>3,3</sub> | G <sub>5,0</sub> | G <sub>4,0</sub> | G <sub>2,1</sub> |                       |                  |                  |
|                  |                  |                  |                         | G <sub>6,4</sub> | G <sub>6,3</sub> | G <sub>7,1</sub>        | G <sub>7,0</sub> | G <sub>6,0</sub> | G <sub>4,1</sub> | G <sub>3,1</sub> |                  |                       |                  |                  |
|                  |                  |                  |                         |                  | G <sub>5,4</sub> | G <sub>6,2</sub>        | G <sub>6,1</sub> | G <sub>5,1</sub> | G <sub>3,2</sub> |                  |                  |                       |                  |                  |
|                  |                  |                  |                         |                  |                  | <b>G</b> <sub>5,3</sub> | G <sub>5,2</sub> | G <sub>4,2</sub> |                  |                  |                  |                       |                  |                  |
|                  |                  |                  |                         |                  |                  |                         | G <sub>4,3</sub> |                  |                  |                  |                  |                       |                  |                  |

Fig. 2 Transformation of generated partial products into altered partial products

Geometric mean filter is widely used in image processing to reduce Gaussian noise. Peak Signal to Noise Ratio (PSNR) is used as to assess the quality of approximate multipliers and it is based on mean-square error found between resulting image of exact multiplier and approximate multipliers. The proposed multipliers outperforms the accurate multiplier by all means of area, power, and delay, achieves better PSNR values in image processing application.

### **Dual-Quality 4:2** Compressors in Dynamic Accuracy Configurable Multipliers

This approximate multiplier uses the dual-quality 4:2 compressors [8] which provide the ability of switching between the accurate and approximate operating modes during the runtime. This provide high speed and low power consumption at the cost of less accuracy. The compressors [6] may be utilized in the architectures of dynamic

quality configurable for the parallel multipliers. The efficiencies of these compressors can be evaluated by a 45-nm standard CMOS technology in a 8-bit Dadda multiplier shown in Fig.3.



### Fig. 3 Reduction circuitry of a 8-bit Dadda mutiplier.

The basic structure of proposed dual quality 4:2 compressor is shown in Fig.4. It consists of two modes, they are approximate and supplementary. In the approximate mode, only the approximate part is active while the supplementary part is power gated, whereas in the exact operating mode, the supplementary part along with some components of the approximate part are get utilized. The power gating technique is used to turn OFF the unused components of the approximate part.



Fig. 4 Block diagram of the proposed approximate dual quality 4:2 compressors

This approximate multiplier assess the effectiveness of the proposed compressors in real applications, they are utilized in three image processing applications [5] such as image sharpening, smoothening and image multiplication. It is necessary to convert the each pixel value into n bits. All input image pixel values are mapped from range (0-255) to (0-65535). Quality of the image is measured based on peak signal to noise ratio (PSNR) value of the image.

# **Energy Efficient Approximate Multiplication Circuits through Partial Product Perforation**

The approximate multipliers such as Array, Wallace multipliers are designed by the partial product perforation technique [9] is process of perforating any two rows from the original partial products generated by the normal multipliers. The partial product perforation entirely different from the truncation [10] technique and the optimized design solution are mainly focus on power, area and error trade off. Truncation eliminates the circuit that produces least significant bits (LSBs) of the accumulation tree, while the perforation technique skips the generation of partial products and thus reduces the accumulation of the operands. The array multiplier well known of its simple structure is shown in Fig.6(a). The multiplication operation generally produce partial product. The generated partial products are get shifted and then added based on their bit orders. Carry propagate adder has N-1 adders, where N is the multiplier structure. The partial product perforation technique in approximate array multiplier is shown in Fig. 6(b). Second, Accurate and Approximate wallace multiplier is shown in Fig.6(c) and 6(d) respectively, combination of three points represents the full adder and two points represents the half adder operations. By analysing the results of multipliers, approximate multiplier produces the better results in terms of delay, area and efficiency in comparison with accurate multipliers.

The Efficiency of proposed multiplier can be evaluated by using Canny edge detection and geometric mean filters from the image processing application. Canny edge detection is considered to be an optimal edge detector filter [11] and it tracks the edges by Hysteresis technology. The peak signal–noise ratio (PSNR) is used to evaluate the accuracy of the output images. Geometric mean filter removes noise from images, produces better results than the arithmetic mean filter for Gaussian-type noise. The proposed multipliers evaluate that the accurate multiplier achieves better peak signal to noise ratio (PSNR) values in image processing application.



Fig. 5 Partial product reduction process of 8 bit (a) Accurate array, (b) Approximate array, (c) Accurate Wallace multiplier (d) Approximate Wallace multiplier

# Approximate Hybrid High Radix Encoding for Energy-Efficient Inexact Multipliers

An approximate hybrid high radix encoding [11] for generating the partial products in signed multiplications that encodes the most significant bits with the accurate radix-4 encoding and the least significant bits with an approximate higher radix encoding. The approximations are performed by rounding the high radix values to their nearest power of two, hence the number of the partial products decreases significantly and simpler tree architectures are used for their accumulation to reduce the multiplier's energy consumption, area, and critical path delay. The effectiveness of the approximate hybrid radix encoding technique is explored with its application using 16-bit Wallace tree multiplier to implement the partial product's sum, whereas the two outputs produced by the Wallace tree are added using a prefix adder. Signed numbers for k=6, 8, 10 are encoded using radix-64, radix-256, and radix-1024, respectively shown in fig.6. The trees also include the encoding's correction term with constant terms and sign factors. The proposed approximate multipliers are named RAD2k, e.g., RAD64, RAD256, and RAD1024 showing the selected approximate high radix encoding. This technique are configured to achieve the desired energy and accuracy. Compared with the accurate radix-4 multiplier, the proposed multipliers deliver up to 56% energy and 55% area.



Fig. 6 Hybrid encoding of accurate radix-4 and approximate (a) radix-64, (b) radix-256, and (c) radix-1024 The evaluation of the proposed multipliers can be done in image processing application [1] using Sobel edge detector. The Sobel detector is linearly applied to the image for the process of computing an approximation of the gradient of the image intensity function. An input image of 16-bit pixels is used for the performance evaluation of edge detection with  $16 \times 16$  approximate multipliers. The percentage of the edges detection is a quality metrics using the approximate multiplier are compared with the accurate radix-4.

### Low-power and high-speed shift-based multiplier

The novel algorithm and architecture of the decimal multiplication [12] are proposed to reduce the area in its hardware realization and keeping the latency at reasonable rate. In the sequential multiplier architecture provides better performance at area cost while comparing simpler combinational multiplier logarithmic shifter [13] is considered to be a positive approximation term used for shifting process by means of multiplication. This provides the truncation portion of shifter without the loss in its value. The proposed approximate multiplier uses the new

technique called the shiftply [14], by using shift registers. The main idea behind our proposed scheme is to approximate the first operand of a multiplier to the nearest exponent power of  $2(2^n)$  and then it get multiplied by the second operand. A High-level 8-Bit sequential "Shiftply" design schematic is shown in Fig.7. The most significant "10" or "11" pattern of the Active is located, which then gets rounded to the nearest power of 2. Whenever the shift is passive, fill it sequentially with active bits from the MSB. Instead of full-fledged multiplication the proposed approximate multiplication scheme is end up with a shift method. Pass transistor logic was implemented with the design for targeting low energy circuits [15]. The performance of this proposed approximate multiplier can be simulated using HSPICE using 90 nm technology and the results get compared with the accurate multiplier.



Fig. 7 High-level 8-Bit sequential "Shiftply" design schematic

The feasibility of the Shiftply scheme is used in error tolerant multimedia applications by using JPEG encoding technique. The shift-based sequential multiplication reaches very high speed with high clock frequencies and it takes the less transistor count. The evaluation of the proposed multipliers can be done in image processing application using JPEG compression encoding technique. When compared with the original image, the compressed images of approximate multiplier exhibit close peak signal to noise ratio in the shiftply scheme by using JPEG encoding scheme.

# COMPARISON OF APPROXIMATE MULTIPLIER

The area, power improvements, delay and the image processing techniques used in various approximate multipliers is compared with the accurate array multiplier. The below given Table-1 shows the clear description of various multipliers results with its parameters.

| Multiplier                | Power  | Area  | Delay | Image processing techniques |
|---------------------------|--------|-------|-------|-----------------------------|
| Vasileios Leon [2]        | 56%    | 55%   | 8.38% | Sobel edge detection        |
| Suganthi Venkatachalam[4] | 72%    | 38%   | -     | Geometric mean filter       |
| Omid Akbari[7]            | 68%    | 42%   | 46%   | Smoothening, Sharpening     |
| Sami Malek [8]            | 64%    | 69.6% | -     | JPEG compression encoding   |
| Georgios Zervakis[11]     | 50%    | 45%   | 35%   | Canny edge detection        |
| Bharat Garg [9]           | 57.37% | 50.2% | 21%   | Gaussian smoothing filter   |

Table -1 Comparison of various approximate multiplier

### CONCLUSIONS

This survey explains various proposed approximate multiplier by using the Image processing techniques. This approximation technique achieves significant improvement in parameters like high speed, area, delay and more power savings with limited loss in accuracy. An image processing application of approximate multiplier is performed, where peak signal to noise ratio of the image is analyzed. The image quality of the approximate multiplier shows satisfactory results compared to an accurate multiplier.

### REFERENCES

- [1]. Qiang Xu, Nam Sung Kim and Todd Mytkowicz, Approximate Computing: A Survey, *IEEE Design & Test*, **2016**, 33(1), 8 22.
- [2]. S Narayanamoorthy, H A Moghaddam, Z Liu, T Park and N S Kim, Energy-efficient approximate multiplication for digital signal processing and classification applications, *IEEE Transaction on Very Large Scale Integration (VLSI) Systems*, **2015**, 2(6), 1180–1184.
- [3]. Vignesh, A Built-In Self ReAir Analyer For Word-Oriented Memories, *Global Journal Of Engineering Science and Researches*, **2015**, 2(5), 149-156.
- [4]. Bharat Garg and G K Sharma, ACM: An Energy-Efficient Accuracy Configurable Multiplier for Error-Resilient Applications, *J. Electron Test*, **2017**, 33,479-489.
- [5]. Vignesh, H Mangalam and S Gayathri, FPGA architecture for text extraction from images, *Cluster Computing*, **2018**, https://doi.org/10.1007/s10586-017-1567-z, 1-10.

- [6]. A Momeni, J Han, P Montuschi, and F Lombardi, Design and analysis of approximate compressors for multiplication, *IEEE Transaction on computing*, 2015, 64(4), 984 – 994.
- [7]. Suganthi Venkatachalam and Seok-Bum Ko, Design of Power and Area Efficient Approximate Multipliers, *IEEE Transaction on Very Large Scale Integration (VLSI) Systems*, **2017**, 15(5), 1782-1786.
- [8]. Omid Akbari, Mehdi Kamal, Ali Afzali-Kusha and Massoud Pedram, Dual-Quality 4:2 Compressors for Utilizing in Dynamic Accuracy Configurable Multipliers, *IEEE Transaction on Very Large Scale* Integration (VLSI) Systems, 2017, 25(4), 1352 - 1361.
- [9]. Georgios Zervakis, Kostas Tsoumanis, Sotirios Xydis, Dimitrios Soudris and Kiamal Pekmestzi, Design-Efficient Approximate Multiplication Circuits Through Partial Product Perforation, *IEEE Transaction on Very Large Scale Integration (VLSI) Systems*, **2016**, 24(10), 3105 – 3117.
- [10]. M Seyed Mohammad, Tabei and Hooman Nikmehr, An Unsigned Truncated Sequential Multiplier with Variable Error Compensation, *Microprocessors and Microsystems*, **2017**, 49, 9 17.
- [11]. Vasileios Leon, Georgios Zervakis, Dimitrios Soudris, and Kiamal Pekmestzi, Approximate Hybrid High Radix Encoding for Energy-Efficient Inexact Multipliers, *IEEE Transaction on Very Large Scale Integration (VLSI) Systems*, 2018, 12(5), 421 - 430.
- [12]. O Vignesh, FPGA Implementation of Hardware Efficient Sequential Decimal Fixed Point Multipler, *Global Journal Of Engineering Science and Researches*, **2015**, 2(5), 139-148.
- [13]. Syed Ershad Ahmed and M. B. Srinivas, An Improved Logarithmic Multiplier for Media Processing, Journal of Signal Processing Systems, **2018**, https://doi.org/10.1007/s12265-018-1350-2,1-14.
- [14]. Sami Malek, Sarah Abdallah, Ali Chehab, Imad H Elhajj and Ayman Kayssi, Low-power and high-speed shift-based multiplier for error tolerant applications, *Microprocessors and Microsystems*, 2017, 52, 566-574.
- [15]. O Vignesh and H Mangalam, An efficient 1bit full subtractor circuit using hybrid CMOS logic, Asian journals of Information technology, 2016, 15(16), 2948 – 2953.