RESEARCH ARTICLE OPEN ACCESS

# Design of Radix-4 Signed Digit Encoding for Pre-Encoded Multipliers Using Verilog

V.Indumathi<sup>1</sup>, P.Nagaraju<sup>2</sup>

<sup>1</sup>M-Tech, Dept. of ECE, Kakinada Institute of Engineering and Technology, Korangi.
<sup>2</sup>Assoc. Prof, Dept. of ECE, Kakinada Institute of Engineering and Technology, Korangi.

## **Abstract:**

In this paper, we introduce an architecture of pre-encoded multipliers for digital signal processing applications based on off-line encoding of coefficients. To this end, the Non-Redundant radix-4 Signed-Digit (NR4SD) encoding technique, which uses the digit values $\{-1, 0, +1, +2\}$ or $\{-2, -1, 0, +1\}$, is proposed, leading to a multiplier design with a less complex partial product implementation. Extensive experimental analysis verifies that the proposed pre-encoded NR4SD multipliers, including the coefficients memory, are more area and power efficient than the conventional Modified Booth scheme.

**Keywords:** Multiplying circuits, modified Booth encoding, pre-encoded multipliers, VLSI implementation.

#### 1. INTRODUCTION

Multimedia and digital signal processing (DSP) applications (e.g., fast Fourier transform (FFT), audio/video CoDecs) carry out a large number of multiplications with coefficients that do not change during the execution of the application. Since the multiplier is a basic component for implementing computationally intensive applications, its architecture seriously affects their performance. Constant coefficients can be encoded to contain the fewest non-zero digits using the canonic signed digit (CSD) representation. CSD multipliers comprise the fewest non-zero partial products, which in turn decreases their switching activity. However, the CSD encoding involves serious limitations.
The folding technique, which reduces silicon area by time-multiplexing many operations into single functional units (e.g., adders, multipliers), is not feasible, as CSD-based multipliers are hard-wired to specific coefficients. In [3], a CSD-based programmable multiplier design was proposed for groups of pre-determined coefficients that share certain features. The size of the ROM used to store the groups of coefficients is significantly reduced, as are the area and power consumption of the circuit. However, this multiplier design lacks flexibility, since the partial products generation unit is designed specifically for one group of coefficients and cannot be reused for another group. Also, this method cannot be easily extended to large groups of pre-determined coefficients while maintaining high efficiency.

Modified Booth (MB) encoding tackles the aforementioned limitations and halves the number of partial products, resulting in reduced area, critical delay and power consumption. However, a dedicated encoding circuit is required and the partial products generation is more complex. In [8], Kim et al. proposed a technique, similar to [3], for designing efficient MB multipliers for groups of pre-determined coefficients, with the same limitations described in the previous paragraph.

Fig. 1. Block diagram of the NR4SD<sup>-</sup> encoding scheme at the (a) digit and (b) word level.

At one time, there was an effort to name and calibrate various levels of large-scale integration above VLSI. Terms like ultra-large-scale integration (ULSI) were used. However, the huge number of gates and transistors available on common devices has rendered such fine distinctions moot. Terms suggesting greater-than-VLSI levels of integration are no longer in widespread use. Even VLSI is now somewhat quaint, given the common assumption that all chips are VLSI or better. As of mid-2008, billion-transistor processors are commercially available, an example of which is Intel's Montecito Itanium chip. This is expected to become more common as semiconductor fabrication moves from the current generation of 65 nm processes to the next 45 nm generations (while encountering new challenges, such as increased variation across process corners). Another notable example is NVIDIA's 280 series GPU.

TABLE 1. Modified Booth Encoding

| $b_{2j+1}$ | $b_{2j}$ | $b_{2j-1}$ | $\mathbf{b}_{j}^{MB}$ | $s_{j}$ | $one_j$ | $two_j$ |
|------------|----------|------------|-----------------------|---------|---------|---------|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | +1 | 0 | 1 | 0 |
| 0 | 1 | 0 | +1 | 0 | 1 | 0 |
| 0 | 1 | 1 | +2 | 0 | 0 | 1 |
| 1 | 0 | 0 | -2 | 1 | 0 | 1 |
| 1 | 0 | 1 | -1 | 1 | 1 | 0 |
| 1 | 1 | 0 | -1 | 1 | 1 | 0 |
| 1 | 1 | 1 | 0 | 1 | 0 | 0 |

#### 2. SIMULATION IMPLEMENTATION

#### **Pre-Encoded MB Multiplier Design**

In the pre-encoded MB multiplier scheme, the coefficient B is encoded off-line according to the conventional MB form (Table 1). The resulting encoding signals of B are stored in a ROM. The circled part of Fig. 3, which contains the ROM with coefficients in 2's complement form and the MB encoding circuit, is now totally replaced by the ROM of Fig. 5. The MB encoding blocks of Fig. 3 are omitted. The new ROM of Fig. 5 is used to store the encoding signals of B and feed them into the partial product generators (PP$_j$ Generators, PPG) on each clock cycle.

To decrease switching activity, the value '1' of $s_j$ in the last entry of Table 1 is replaced by '0'. With this change, $s_j$ equals $b_{2j+1}$ in every row of Table 1 except the last, so the sign $s_j$ is now given by the relation:

$$s_j = b_{2j+1} \wedge \overline{\left(b_{2j} \wedge b_{2j-1}\right)} \qquad (12)$$

As a result, the PPG of Fig. 4a is replaced by the one of Fig. 4b. Compared to (4), (12) leads to a more complex design. However, due to the pre-encoding technique, there is no area/delay overhead at the circuit. The partial products, properly weighted, and the COR of (11) are fed into a carry-save adder (CSA) tree.
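The off-line encoding step described above can be illustrated with a small behavioural model. The following is only a sketch in Python, not the authors' Verilog design: the names `MB_PRE_ENCODED`, `mb_pre_encode` and `decode` are hypothetical, the lookup table is copied from Table 1 with the sign of the last entry changed from '1' to '0', and an even word length `n` with $b_{-1} = 0$ is assumed.

```python
# Behavioural sketch (assumption: not the authors' RTL) of off-line MB pre-encoding.
# Keys are (b_{2j+1}, b_{2j}, b_{2j-1}); values are (s_j, one_j, two_j) from Table 1,
# except that the last entry stores s_j = 0 instead of 1, as in the pre-encoded scheme.
MB_PRE_ENCODED = {
    (0, 0, 0): (0, 0, 0),
    (0, 0, 1): (0, 1, 0),
    (0, 1, 0): (0, 1, 0),
    (0, 1, 1): (0, 0, 1),
    (1, 0, 0): (1, 0, 1),
    (1, 0, 1): (1, 1, 0),
    (1, 1, 0): (1, 1, 0),
    (1, 1, 1): (0, 0, 0),
}


def mb_pre_encode(b, n):
    """Encode the n-bit two's-complement coefficient b (n even).

    Returns a list of (s, one, two) triples, least significant digit first,
    i.e. the 3n/2 bits that would be stored in the ROM for this coefficient.
    """
    if n <= 0 or n % 2 != 0:
        raise ValueError("n must be a positive even number")
    if not -(1 << (n - 1)) <= b < (1 << (n - 1)):
        raise ValueError("b does not fit in n bits")
    pattern = b & ((1 << n) - 1)  # two's-complement bit pattern of b
    # bits[0] holds b_{-1} = 0 and bits[i + 1] holds b_i
    bits = [0] + [(pattern >> i) & 1 for i in range(n)]
    rom_word = []
    for j in range(n // 2):
        b_low, b_mid, b_high = bits[2 * j], bits[2 * j + 1], bits[2 * j + 2]
        rom_word.append(MB_PRE_ENCODED[(b_high, b_mid, b_low)])
    return rom_word


def decode(rom_word):
    """Rebuild the integer value from the stored (s, one, two) triples."""
    value = 0
    for j, (s, one, two) in enumerate(rom_word):
        magnitude = one + 2 * two  # digit magnitude: 0, 1 or 2
        value += (-magnitude if s else magnitude) * 4 ** j
    return value


if __name__ == "__main__":
    word = mb_pre_encode(-1, 8)
    print(word, decode(word))
```

For an 8-bit coefficient the model yields four triples, i.e. 12 stored bits, which corresponds to the 3n/2 bits per coefficient required by the pre-encoded MB scheme.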
The input carry $c_{in,j}$ of (11) is computed as $c_{in,j} = s_j$ based on (12) and Table 1. The carry-save (CS) output of the tree is finally merged by a fast carry-lookahead (CLA) adder. However, the ROM width is increased. Each digit requires three encoding bits (i.e., $s$, $two$ and $one$ in Table 1) to be stored in the ROM. Since the n-bit coefficient B needs three bits per digit when encoded in MB form, the ROM width requirement is 3n/2 bits per coefficient. Thus, the width and the overall size of the ROM are increased by 50 percent compared to the ROM of the conventional scheme.

#### **Pre-Encoded NR4SD Multipliers Design**

The system architecture for the pre-encoded NR4SD multipliers is presented in Fig. 6. Two bits per digit are now stored in the ROM: $n^{-}_{2j+1}$, $n^{+}_{2j}$ (Table 2) for the NR4SD<sup>-</sup> form or $n^{+}_{2j+1}$, $n^{-}_{2j}$ (Table 3) for the NR4SD<sup>+</sup> form. In this way, we reduce the memory requirement to n + 1 bits per coefficient, while the corresponding memory required for the pre-encoded MB scheme is 3n/2 bits per coefficient. Thus, the amount of stored bits is equal to that of the conventional MB design, except for the most significant digit, which needs an extra bit as it is MB encoded. Compared to the pre-encoded MB multiplier, where the MB encoding blocks are omitted, the pre-encoded NR4SD multipliers need extra hardware to generate the signals of (6) and (8) for the NR4SD<sup>-</sup> and NR4SD<sup>+</sup> form, respectively.

#### 3. SIMULATION RESULTS

Fig. 3. Block diagram.

Fig. 4. RTL schematic.

Fig. 5. Simulation output.

#### 4. CONCLUSION

In this paper, new designs of pre-encoded multipliers are explored by off-line encoding the standard coefficients and storing them in system memory. We propose encoding these coefficients in the Non-Redundant radix-4 Signed-Digit (NR4SD) form. The proposed pre-encoded NR4SD multiplier designs are more area and power efficient compared to the conventional and pre-encoded MB designs.
Extensive experimental analysis verifies the gains of the proposed pre-encoded NR4SD multipliers in terms of area complexity and power consumption compared to the conventional MB multiplier.

#### 5. REFERENCES

[1] G. W. Reitwiesner, "Binary arithmetic," Adv. Comput., vol. 1, pp. 231–308, 1960.

[2] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Hoboken, NJ, USA: Wiley, 2007.

[3] Y.-E. Kim, K.-J. Cho, J.-G. Chung, and X. Huang, "CSD-based programmable multiplier design for predetermined coefficient groups," IEICE Trans. Fundam. Electron. Commun. Comput. Sci., vol. 93, no. 1, pp. 324–326, 2010.

[4] O. Macsorley, "High-speed arithmetic in binary computers," Proc. IRE, vol. 49, no. 1, pp. 67–91, Jan. 1961.

[5] W.-C. Yeh and C.-W. Jen, "High-speed Booth encoded parallel multiplier design," IEEE Trans. Comput., vol. 49, no. 7, pp. 692–701, Jul. 2000.

[6] Z. Huang, "High-level optimization techniques for low-power multiplier design," Ph.D. dissertation, Dept. Comput. Sci., Univ. California, Los Angeles, CA, USA, 2003.

[7] Z. Huang and M. Ercegovac, "High-performance low-power left-to-right array multiplier design," IEEE Trans. Comput., vol. 54, no. 3, pp. 272–283, Mar. 2005.

[8] Y.-E. Kim, K.-J. Cho, and J.-G. Chung, "Low power small area modified Booth multiplier design for predetermined coefficients," IEICE Trans. Fundam. Electron. Commun. Comput. Sci., vol. E90-A, no. 3, pp. 694–697, Mar. 2007.

[9] C. Wang, W.-S. Gan, C. C. Jong, and J. Luo, "A low-cost 256-point FFT processor for portable speech and audio applications," in Proc. Int. Symp. Integr. Circuits, Sep. 2007, pp. 81–84.

[10] A. Jacobson, D. Truong, and B. Baas, "The design of a reconfigurable continuous-flow mixed-radix FFT processor," in Proc. IEEE Int. Symp. Circuits Syst., May 2009, pp. 1133–1136.

#### **Authors Profile**

## V.Indumathi

I, Indumathi, was born in Warangal, Telangana, on February 17, 1994. I graduated from V.S. Lakshmi Engineering College for Women (JNTU), Kakinada.
Presently I am studying for an M.Tech at Kakinada Institute of Engineering and Technology, Korangi.

Mr. P. Nagaraju was born in Drakshramam, AP, on May 1, 1982. He graduated from Jawaharlal Nehru Technological University, Hyderabad, completed his postgraduate studies at Jawaharlal Nehru Technological University, Kakinada, and is pursuing a Ph.D. at JNTUK. Presently he is working as an Assoc. Prof at Kakinada Institute of Engineering & Technology, Korangi. He has 12 years of teaching experience in various reputed engineering colleges. His special fields of interest include VLSI signal processing, embedded systems, digital signal processing and communication systems.