# **VLSI Architecture of Shared Multiplier Scheduling Scheme** for Reconfigurable FFT/IFFT Processor <sup>1</sup>B. Anil Babu, <sup>2</sup>V. Sree Lakshmi <sup>1</sup>Embedded Systems, Gudlavalleru Engineering College, Andhra Pradesh, India <sup>2</sup>Dept.of ECE, Gudlavalleru Engineering College, Andhra Pradesh, India #### **Abstract** This paper proposes VLSI Architecture of SMSS (Shared Multiplier Scheduling Scheme) for Reconfigurable FFT/IFFT processor. This architecture provides flexibility of selecting various FFT sizes (2, 4, 8, 16, 32, 64, 128 and 256) length, so that the hardware complexity of processor is reduced. The multipliers in SMSS based FFT processor are replaced with Vedic multiplier to improve speed of calculation. The proposed FFT/IFFT processors based on SMSS are designed using XILINX ISE Tool and modeled in Verilog HDL. The synthesis results shows that Area is reduced by 17% and speed is increased by 11%. In addition the proposed processor can be extended to any FFT sizes using additional stages. #### **Keywords** FFT (Fast Fourier Transform), reconfigurable FFT, Vedic multiplier, MRMDC (mixed radix multipath delay Commutator), OFDM (Orthogonal Frequency Division Multiplexing). #### I. Introduction FFT/IFFT processors play an important role in signal processing and image processing applications. Such a applications require high speed FFT/IFFT processors to meet continuing demands for higher data rates. For example, in OFDM systems the IFFT computation is required in transmitter, while FFT is performed in receiver. There is no need to implement IFFT processors separately because the IFFT processor is derived from FFT processor by taking complex conjugate of input and output data and then dividing the conjugate of the FFT output by FFT size. So the role of FFT processors in dealing with OFDM systems requires low power and high speed processors. For implementing such processors, several architectures are proposed for reducing hardware complexity. The FFT processors are divided mainly into Memory based, Reconfigurable, and Pipelined FFT processors .Memory based architectures [1-2] are proposed to achieve small area. The Application specific instruction set processors [3], [4] have been proposed to meet flexibility for FFT computation. Reconfigurable FFT processors [5] are proposed to select various FFT sizes on processor so that hardware complexity is reduced. The pipelined processors [6] are used for achieving high throughput rate. Pipelined processors are classified into SDC (single-path delay commutator), SPF (single-path feedback), MDF (multipath delay feedback), and MDC (multipath delay commutator). From the above, MDC architectures provides high throughput rate and simple synchronizing control using multidata paths. In the MDC architectures [7], if the radix size is high then high throughput rate is achieved. For example to achieve higher data rate radix-8, radix-16 and above are used. But in OFDM systems the typical FFT sizes are 128,256 but radix-8 algorithm can't suitable for 128/256 FFT sizes because it is not power of 8. So mixed radix algorithm is used. So mixed radix FFT/IFFT processors are required to meet computational demands of OFDM systems. The MRMDC architecture [8] in pipelined processors employing eight parallel data paths operates at higher clock frequency to satisfy computation demands. SMSS based FFT/IFFT processors [9] in MRMDC architecture provides area efficiency and higher data rate. The SMSS based FFT processors reduces the number of complex multipliers by moving the moving the multipliers from second stage to first stage for 128/256 point FFT processor. This paper proposes flexibility in selecting FFT sizes i.e. reconfigurable FFT/IFFT processor [5] based on SMSS algorithm [9], so that hardware complexity is further reduced. The multipliers in the existing processor are replaced with Vedic multiplier to improve the speed of computation. The remainder of this paper is organised as follows. Section II describes the MRMDC architecture [8] .Section III provides the details of the existing SMSS based FFT/IFFT processor. Section IV gives the information about the proposed Architecture of SMSS for reconfigurable FFT/IFFT processor (FFT size: 2, 4, 8, 16, 32, 64, 128 and 256). Section V presents the design and implementation of proposed FFT/IFFT processor with Vedic multiplier. Finally conclusion is presented in Section VI. ## **II. MRMDC Architecture** MRMDC (mixed radix multipath delay commutator) architecture [8] uses mixed radix algorithm and multipath data bits. The mixed radix algorithm with assumptions is described as follows. The N-point DFT (Discrete Fourier Transform) is defined as $$X(k) = \sum_{n=0}^{N-1} x(n). W_N^{nk}, k=0,1,...,N-1$$ (1) Where x(n) = input sequence, X(k) = output sequence, N=transform $W_N^{nk} = N^{th}$ primitive root of unity (twiddle factor) $$W_N^{nk} = e^{-jnk(\frac{2\pi}{N})}$$ $$= \cos(2\pi nk/N) - j\sin(2\pi nk/N)$$ (2) When the FFT size is not a power of radix r, then mixed radix algorithm should be used. For example if $$N = 256$$ $$n = 64n1+n2, \quad 0 \le n1 \le 4, \ 0 \le n2 \le 64$$ $$k = k1+4k2, \quad 0 \le k1 \le 4, \ 0 \le k2 \le 64$$ (3) Substituting (3) in (1) X(k) = X(k1+4k2) $$= \sum_{n=0}^{63} \sum_{n=0}^{3} x(64n1+2). W_{256}^{(64N1+N2)(K1+4K2)}$$ $$=\sum_{n=0}^{63} \{BF4(n2,k1)\}W_{64}^{n2k2}$$ (4) The 256 point mixed radix FFT algorithm can be derived from (4) by decomposing the remaining 64-point DFT into 8-point DFT twice. The MRMDC architecture is explained using 128/256-point MRMDC FFT/IFFT processor [8] .The architecture consists of BU's, delay commutators, and twiddle factor multipliers. In the first stage, the radix-2/4 BU can perform one radix-4 or two radix-2 operations to compute the 128 and 256-point FFTs. There are three stages based on radix-2, radix-4, and radix-8 algorithms. The input sequence of single path is divided into eight data paths. ### III. SMSS based FFT/IFFT Processor SMSS based FFT/IFFT processor with eight parallel MRMDC architecture gives high throughput and low hardware complexity. In the existing processor the first stage uses shared multipliers and second stage uses the modified radix-8 butterfly units. The architecture of existing SMSS based 128/256-point FFT/IFFT processor [9] is shown in below fig. 1. Fig. 1: SMSS based 128/256 Point FFT Processor In the first stage in fig. 1, the radix-2/4 BU can perform two radix-2 butterfly computations. The structure performs complex multiplications for second stage before the delay commutator using shared multipliers. In the first stage, the input sequence of each data-path is divided into four data streams. Based on the SMSS based algorithm [9], the processor achieved area efficiency when compared with previous architectures. For a single FFT size processor it is suitable, but there is a case where variable size FFT processor is required. So we need a processor that supports reconfigurability feature in order to select multiple FFT sizes on single processor. Section IV presents architecture of reconfigurable FFT processor for the existing processor. ## IV. VLSI Architecture of SMSS for Reconfigurable FFT/IFFT Processor The existing SMSS based FFT/IFFT processor [9] can support 128/256 point FFT size at a time only. Variable size FFT processor is required to support multiple FFT sizes. Reconfigurable FFT/IFFT processor provides such a flexibility of selecting any FFT sizes on single processor design without going to each FFT size processor. The existing work can be extended to support various FFT sizes (2, 4, 8, 16, 32, 64, 128 and 256) lengths using reconfigurable FFT processor. The reconfigurable feature is added to the existing specific size FFT processor using multiplexer as shown in fig. 2. Fig. 2: Block Diagram The re-configurability for the SMSS based FFT/IFFT processor reduces the hardware complexity. The multipliers used in the SMSS based FFT/IFFT processors are replaced with Vedic multipliers, so as to improve performance .Vedic multipliers follows ancient set of rules, so that the multiplication operations are efficient in time. Vedic multipliers [10] are based on sixteen sutras. Of these URDHVA TRIYAKBHYAM (vertically and crosswise) is sutra that supports all types of multiplications. So that the speed of calculation is further improved using Vedic multipliers. The implementation and synthesis report gives the performance details of proposed processor in section -V. ### **V. Results and Comparison** The proposed SMSS based reconfigurable FFT/IFFT processor using Vedic multiplier is designed using Xilinx ISE13.2 Tool and modeled in Verilog HDL. The proposed FFT processors use Vedic multiplier [10] (URDHVA TRIYAKBHYAM sutra) to improve computation speed. The synthesis result shows the area efficiency of propose processor and computation speed improvement. The proposed processor is simulated using Modelsim10.1 simulator. The functional verification of design is done using MATLAB tool. The RTL schematic and simulation results are shown below as Fig. 3, Fig. 4. Fig. 3: RTL Schematic of Proposed SMSS Based Reconfigurable FFT Processor The RTL schematic of proposed processor contains four selection lines $(S_5 S_6 S_7 S_8)$ and clock rate 126MHz with 256 complex inputs and complex outputs. The simulation results of proposed processor with selection lines $S_5 S_6 S_7 S_8 = 0000$ and inputs (in\_re1, in\_re2,in\_re3,in\_re4=1 others=0) as shown in fig. 4. | | | I | | | | | |-------------------|-------|------------------|-------------------|--------|--------|--------| | Name | Value | 10 ns | 200 ns | 400 ns | 600 ns | 800 ns | | out10_re[7:0] | 52 | | XXXXXXX | | 52 | | | out10_im[7:0] | 0 | X -2 | X | | D | | | out11_re[7:0] | -36 | X 104 | XXXXXXXX | | -36 | | | out11_im[7:0] | 2 | <u>x</u> ( | | 2 | | | | out12_re[7:0] | 16 | X -100 | XXXXXXXX | | 16 | | | out12_im[7:0] | 48 | X -104 | XXXXXXXX | | 48 | | | out13_re[7:0] | 2 | X 2 | X 4 X | | 2 | | | out13_im[7:0] | 2 | X ( | 2 X | | 2 | | | out14_re[7:0] | 110 | X -86 | XXXXX-14XX | | 110 | | | out14_im[7:0] | 0 | X -2 | X • X | | 0 | | | out15_re[7:0] | -18 | X -86 | XXXX <del>=</del> | | -18 | | | out15_im[7:0] | 2 | X | | 2 | | | | out16_re[7:0] | -100 | X -110 | XXXXXXXX | | -100 | | | out16_im[7:0] | 72 | X 86 | X9XXX86XX | | 72 | | | ▶ 😽 out17_re[7:0] | 4 | X 2 | X | | 4 | | | out17_im[7:0] | 2 | X | | 2 | | | | | | X1: 1,000.000 ns | | | | | | Name | Value | 10 ns | | 200 ns | 400 ns | 600 ns | 800 ns | |--------------------|-------|------------------|------|-----------|--------|--------|-----------| | ■ out35_re[7:0] | -98 | $\stackrel{}{=}$ | 40 | \$000000 | | -98 | | | 12.7 | -98 | | 40 | ****** | | -90 | <b>——</b> | | out35_im[7:0] | 2 | X | | XXXXXXX | 2 | | | | out36_re[7:0] | -126 | X X | | XXXXXXX | | -126 | | | out36_im[7:0] | 102 | XX | -40 | <u> </u> | | 102 | | | out37_re[7:0] | 2 | X X | 2 | X 4 X | | 2 | | | ▶ 🚟 out37_im[7:0] | 2 | X X | | 2 X | | 2 | | | ▶ =8 out38_re[7:0] | 0 | X | -2 | X0X2X - X | | 0 | | | out38_im[7:0] | 0 | X | -2 | X • X | | 0 | | | ▶ 🚟 out39_re[7:0] | 0 | X | -2 | X9X8X | | 0 | | | ■% out39_im[7:0] | 2 | X X | | | 2 | | | | ▶ 🛂 out40_re[7:0] | 36 | X X | -122 | XXXXXXXX | | 36 | | | out40_im[7:0] | -2 | x | 2 | XXX | | -2 | | | ■ out41_re[7:0] | 4 | X X | 2 | X | | 4 | | | ▶ 5 out41_im(7:0) | 2 | x X | | | 2 | | | | out42_re[7:0] | -64 | x | -52 | XXXXXX | | -64 | | | | | | • | V | | | | | | | V1. 1.000.000 e | | | | | | | | | X1: 1,000.000 r | 15 | | | | | | | | 1 | | | | | |-------------------|-------|------------------|----------------------------------------|--------|--------|--------| | Name | Value | 0 ns | 200 ns | 400 ns | 600 ns | 800 ns | | out77_im[7:0] | 2 | X | | 2 | | | | ▶ 🚟 out78_re[7:0] | 16 | X -86 | XXXX=14XX | | 16 | | | ▶ 🚟 out78_im[7:0] | 0 | X -2 | X • X | | 0 | | | ▶ 🚟 out79_re[7:0] | 48 | X -86 | XXXX <del>-10</del> XX | | 48 | | | ▶ 🥳 out79_im[7:0] | 2 | X | | 2 | | | | ▶ 🚟 out80_re[7:0] | -82 | X -110 | ************************************** | | -82 | | | ▶ 🚟 out80_im[7:0] | 68 | X 86 | X9XXX86XX | | 68 | | | ▶ 😽 out81_re[7:0] | 2 | X 2 | X 4 X | | 2 | | | out81_im[7:0] | 2 | X | 2 X | | 2 | | | ▶ 🚟 out82_re[7:0] | 66 | X -84 | ************************************** | | 66 | | | out82_im[7:0] | 0 | X -2 | X • X | | 0 | | | out83_re[7:0] | 98 | X -84 | ************************************** | | 98 | | | ▶ 🚟 out83_im[7:0] | 2 | XX | | 2 | | | | ▶ 📆 out84_re[7:0] | 84 | X O | \$XXXXXXXXX | | 84 | | | ▶ 🥳 out84_im[7:0] | -12 | X 84 | \$0000000 | | -12 | | | ▶ 🔣 out85_re[7:0] | 2 | X 2 | X 4 X | | 2 | | | | | | | | | | | | | X1: 1,000.000 ns | | | | | | Name | Value | 0 ns | | 200 ns | 400 ns | 600 ns | 800 ns | |-----------------------------------|-------|---------------|----------|----------------------------------------------------------------|----------|--------|----------| | ■ out100_mi[7.0] ■ out107_re[7:0] | 1 | X | | $\hat{\infty}\hat{\infty}\hat{\infty}\hat{\infty}\hat{\infty}$ | | -48 | | | out107_im[7:0] | ll . | X | $\equiv$ | | 2 | | | | out108_re[7:0] | | X | -100 | XXXXXXXX | | 62 | | | out108_im[7:0] | -14 | X | -104 | XXXXXXX | | -14 | | | <b>out109_re</b> [7:0] | | X | 2 | X | | 4 | | | out109_im[7:0] | | X | | 2 X | | 2 | | | out110_re[7:0] | H | X | | XXXXXXX | | 44 | | | out110_im[7:0] | | X | -2 | X | | Þ | | | out111_re[7:0] | H | X | -86 | XXXXXXXX | | 104 | | | out111_im[7:0] | Н | x | | | 2 | | | | out112_re[7:0] | -68 | X | -110 | XXXXXXXX | | -68 | | | out112_im[7:0] | -10 | X | 86 | XXXXXXXX | | -10 | | | out113_re[7:0] | 2 | X | 2 | X 4 X | | 2 | | | out113_im[7:0] | 2 | X | | 2 | | 2 | | | out114_re[7:0] | -126 | X | -30 | XXXX | | -126 | | | | | $\overline{}$ | | V . V | | | | | | | X1: 1,000.000 | ins | | | | | | | | ,0001000 | | | <u> </u> | | <u> </u> | | | | | | | | | 1,000.000 ns | |----------------|-------|---------------|------|------------|--------|--------|--------------| | Name | Value | 0 ns | | 200 ns | 400 ns | 600 ns | 800 ns | | out140_im[7:0] | 48 | X | -104 | XXXXXXXX | | 48 | | | out141_re[7:0] | 2 | X( | 2 | X 4 X | | 2 | | | out141_im[7:0] | 2 | X | | 2 X | | 2 | | | out142_re[7:0] | 110 | X | -86 | XXXXX-14XX | | 110 | | | out142_im[7:0] | 0 | X | -2 | X • X | | 0 | | | out143_re[7:0] | -18 | X | -86 | XXXX-10XX | | -18 | | | out143_im[7:0] | 2 | X( | | | 2 | | | | out144_re[7:0] | -100 | X | -110 | XXXXXXXX | | -100 | | | out144_im[7:0] | 72 | X | 86 | X4XXX86XX | | 72 | | | out145_re[7:0] | 4 | X | 2 | X | | 4 | | | out145_im(7:0) | 2 | X | | | 2 | | | | out146_re[7:0] | -16 | X( | -30 | XXXX56XX | | -16 | | | out146_im[7:0] | 0 | X | -2 | X | | 0 | | | out147_re[7:0] | -124 | X | -30 | XXXX55XX | | -124 | | | out147_im[7:0] | 2 | X | | | 2 | | | | out148_re[7:0] | -52 | X | 122 | XXXX:::XX | | -52 | | | | | X1: 1,000.000 | ns | | | | | | | | | | | | | 1,000.000 ns | |----------------|-------|---------------|---------------|-----------------------|--------|--------|--------------| | Name | Value | 0 ns | | 200 ns | 400 ns | 600 ns | 800 ns | | out167_im[7:0] | 2 | X | $\overline{}$ | | 2 | | | | out168_re[7:0] | 68 | X | 12 | XXXX42XX | | 68 | | | out168_im[7:0] | -2 | X | 2 | XX2X • X | | -2 | | | out169_re[7:0] | 4 | X | 2 | X | | 4 | | | out169_im[7:0] | 2 | X | | | 2 | | | | out170_re[7:0] | 4 | X | 102 | XXXXX-24XX | | 4 | | | out170_im(7:0) | 0 | X | -2 | X | | D | | | out171_re[7:0] | -92 | X | 102 | XXXXIXX | | -92 | | | out171_im[7:0] | 2 | X | | | 2 | | | | out172_re[7:0] | -92 | X | 2 | XXXX | | -92 | | | out172_im(7:0) | 108 | X | -102 | XXXX <del>2</del> 0XX | | 108 | | | out173_re[7:0] | 2 | X | 2 | X 4 X | | 2 | | | out173_im[7:0] | 2 | X | | 2 X | | 2 | | | out174_re[7:0] | -10 | X | 12 | XXXXX=50XX | | -10 | | | out174_im[7:0] | 0 | X | -2 | X O X | | 0 | | | out175_re[7:0] | 14 | X | 12 | XXXX86XX | | 14 | | | | | X1: 1,000.000 | ns | | | | | Fig. 4: 256-point SMSS based Reconfigurable FFT Processor Using Vedic Multiplier. Assuming that the processor have same throughput rate i.e. (64\*clock rate) =8.036GS/s at 126 MHz clock rate. The hardware complexity and computation speed are compared in the performance table. The hardware complexity is compared with the existing SMSS based 256-point FFT processor in terms of ASIC design metric gate count. The equivalent gate count for the proposed processor is derived from the synthesis report of XILINX XST tool. Table 1: Performance Comparison | | SMSS based | SMSS based | Radix-4<br>Reconfigurable | | |-----------------------|-----------------------|---------------|---------------------------|--| | Technology | Reconfigurable | FFT processor | | | | | FFT(Vedic multiplier) | | FFT | | | No. of Data path bits | 8 | 8 | 8 | | | Architecture | Pipelined | Pipelined | Pipelined | | | FFT Size | 256 | 256 | 256 | | | Clock Rate(MHz) | 126 | 126 | 126 | | | Delay(ns) | 13.423 | 15.531 | 29.981 | | | Gate Count | 6,23,708 | 7,60,000 | 9,35,816 | | The gate count is calculated by selecting target device as Virtex-6 FPGA. The performance comparison is shown in above Table ### **VI. Conclusion** This paper proposed high speed and low hardware complexity SMSS based reconfigurable FFT/IFFT processor. The reconfigurable FFT processor can reduce the hardware complexity when compared with the existing 256-point SMSS based FFT processor. The proposed Vedic multiplier improves the computation speed. The performance results shows that the proposed reconfigurable FFT processor gives less hardware complexity i.e. area reduced by 17% and speed is increased by 11% with a throughput rate of 8.036GS/s. In addition the proposed architecture can apply any FFT size greater than 256 point using additional stages. #### References - [1] S.-J. Huang, S.-G. Chen, "A green FFT processor with 2.5-GS/s for IEEE 802.15.3c (WPANs)," In Proc. Int. Conf. Green Circuits Syst., Jun. 2010, pp. 9-13. - [2] P.-Y. Tsai, T.-H. Lee, T.-D. Chiueh, "Power-efficient continuous flow memory-based FFT processor for WiMax OFDM mode," In Proc. Int. Symp. Intell. Signal Process. commun. Syst., Dec. 2006, pp. 622-625. - [3] X. Guan, Y. Fei, H. Lin, "Hierarchical design of an application specific instruction set processor for high-throughput and scalable FFT processing," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., Vol. 20, No. 3, pp. 551-563, Mar. 2012. - [4] S. He, M. Torkelson, "Designing pipeline FFT processor for OFDM (de)modulation," In Proc. Int. Symp. Signals, Syst., Electron., Sep. 1998, pp. 257-262. - [5] C.-H. Yang, T.-H. Yu, D. Markovic, "Power and area optimization of reconfigurable FFT processors: A 3Gpp-LTE example," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., Vol. 47, No. 3, pp. 757-768, Mar. 2012. - [6] T. Cho, H. Lee, "A high-speed low-complexity modified radix- FFT processor for high rate WPAN applications," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., Vol. 21, No. 1, pp. 187-191, Jan. 2013. - [7] C. Chung, S. Wang, C. Li, "Area-efficient architectural design of radix-4 pipeline fast Fourier transform processor," In Proc. Workshop Consum. Electron. Signal Process., Nov. 2004, pp. 1-4. - [8] E. J. Kim, M. H. Sunwoo, "High speed eight-parallel mixedradix FFT processor for OFDM systems," In Proc. IEEE Int. Symp. Circuits Syst., May 2011, pp. 1684–1687. - [9] Eun Ji Kum, Jea Hack Lee, Myuang Hoon Sunwoo, "Novel Shared Multiplier Scheduling Scheme for Area-Efficient FFT/IFFT Processors", IEEE Trans. Very Large Scale Integr. (VLSI) Syst., Vol. 23, No.9, pp. 1689–1699, sep. 2015. [10] Harpreet Singh Dhillon, AbhijitMitra, "A Digital Multiplier Architecture using Urdhva Tiryakbhyam Sutra of Vedic Mathematics", IIT-G, pp. 1-4, 2010. B. ANIL BABU is received his B.Tech (ECE) degree from Laki reddy balireddy college of engineering, Mylavaram, India in 2014. Currently he pursuing M.Tech (Embedded Systems) degree from Gudlavalleru Engineering College, Gudlavalleru, India .His research interests includes VLSI Architectures and DSP processor designs. V.SREE LAKSHMI is received her B.Tech (ECE) degree from Gudlavalleru Engineering College, Gudlavalleru, India and M.Tech degree from CVR engineering college, Hyderabad in the VLSI system design specialization. Currently she is working as Assistant Professor in Gudlavalleru Engineering College. She has a teaching experience from 2005 onwards. Her research interests are VLSI design of Systems and its applications.