# **IBM Research Report**

## A 119mW 11.1Gb/s 5-Tap DFE Receiver with Digitally Calibrated Current-Integrating Summers in 65nm CMOS

John F. Bulzacchelli, Timothy O. Dickson, Zeynep Toprak Deniz, Herschel A. Ainspan, Benjamin D. Parker, Michael P. Beakes, Sergey V. Rylov, Daniel J. Friedman

> IBM Research Division Thomas J. Watson Research Center P.O. Box 218 Yorktown Heights, NY 10598



Research Division Almaden - Austin - Beijing - Cambridge - Haifa - India - T. J. Watson - Tokyo - Zurich

LIMITED DISTRIBUTION NOTICE: This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). Copies may be requested from IBM T. J. Watson Research Center, P. O. Box 218, Yorktown Heights, NY 10598 USA (email: reports@us.ibm.com). Some reports are available on the internet at http://domino.watson.ibm.com/library/CyberDig.nsf/home.

## A 119mW 11.1Gb/s 5-Tap DFE Receiver with Digitally Calibrated Current-Integrating Summers in 65nm CMOS

John F. Bulzacchelli, Timothy O. Dickson, Zeynep Toprak Deniz, Herschel A. Ainspan, Benjamin D. Parker, Michael P. Beakes, Sergey V. Rylov, and Daniel J. Friedman

IBM T. J. Watson Research Center, Yorktown Heights, NY

### ABSTRACT

A 65nm CMOS 5-tap DFE receiver employs half-rate S/Hs and current-integrating summers to achieve 11.1Gb/s operation while dissipating 119mW. RX logic calibrates the summer bias currents to stabilize their performance over process variations and different data rates. Equalization of a 30" PCB trace and a 16" Tyco backplane is demonstrated at 11.1Gb/s and 10Gb/s, respectively.

Extending data rates to meet the I/O needs of future computing and network systems is complicated by limited channel bandwidth. While a DFE [1] can be used to compensate channel distortion, its power dissipation reduces link energy efficiency, which is vitally important in complex systems. One way of reducing DFE power consumption is to use current-integrating summers [2-5]. Previously published current-integrating DFEs operating above 5Gb/s [3, 5] were demonstrated on simple test chips lacking support circuitry for CDR and DFE adaptation functions. The architecture presented here includes additional data paths based on current-integrating summers to realize a fully integrated RX with CDR and continuous DFE adaptation. The design also features a digital calibration loop for setting the summer bias currents so that high performance is achieved over process variations and different data rates.

The top-level RX architecture is shown in Fig. 1. The input data path is similar to that described in [1], except that a peaking amplifier is added to provide linear equalization which complements the operation of the DFE. The peaking amplifier, which uses a zero-peaked topology with switched capacitive degeneration, has 8 peaking settings, with a nominal range of 0-6 dB at half-baud frequency. The DFE employs a half-rate architecture; the tap weights for even and odd halves are adapted independently to improve tolerance against duty-cycle distortion [6]. Half-rate clocks (C2) are generated by CML phase interpolators. Converting these clocks to CMOS rail-to-rail signals saves power in their distribution to the DFE, phase detector, and 2:8 DEMUX. In addition to the clocks ($Clk_D$ and $Clk_E$) used to sample the center and edges of the bits, a third clock ($Clk_A$) is generated which can be independently swept to monitor horizontal eye opening. An integrator calibration circuit provides operating point information to the integrator calibration logic which sets the integrator bias currents. The RX is powered from external analog (VDDA, nominally 1.2V) and digital (VDD, nominally 1.0V) supplies. A third supply VREG (nominally 1.0V) is generated from VDDA by a linear regulator with less than 40mVpp ripple; this supply is used to power noise-critical circuits, such as the CMOS C2 buffers within the DFE.

The block diagram of a DFE half is shown in Fig. 2. The first DFE tap (H1) is realized by two-path speculation. This costs more area and power than a direct architecture [5] but ensures that all DFE feedback signals are fully established at the beginning of integration, improving summation accuracy. The two most critical DFE timing paths are the H2 feedback loop and the MUX select path. To meet timing constraints, these paths are realized in CML. A DCVS latch is used to convert CML to CMOS levels, so that the later tap (H3-H5) circuitry can be implemented in static CMOS to save power and area. The CML circuits are powered from VDDA; the DCVS and static CMOS circuits are powered from VREG. The path used for eye monitoring is clocked by $Clk_A$. Each current-integrating summer includes a passgate S/H so that the sampled input is held constant during the integration period, thereby avoiding the 3.9dB loss penalty incurred if the changing input signal were directly integrated [5]. For highest sampling bandwidth, the passgate S/H uses low-Vt thin-oxide devices. Because VDDA (1.3V maximum) exceeds the rated gate-to-source voltage of such devices, a clock level-shifter is used to translate the high and low clock levels to VDDA and VDDA - $\alpha$VREG, where $\alpha$ is determined by the capacitive divider of the ac-coupling. The ac-coupling capacitors are selected so that $\alpha$ exceeds 0.9. The level-shifted clock also drives the integrator reset switches.
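The two-path speculation used for H1 can be illustrated with a small behavioral model. The following is a minimal sketch in Python, not the half-rate circuit of Fig. 2: it runs at full rate, uses ideal arithmetic, and all names (`slicer`, `apply_channel`, `dfe_direct`, `dfe_speculative`), the ±1 symbol convention, and the toy channel are assumptions made for illustration. For each sample it forms both candidate decisions, one for each possible value of the previous bit, and then selects between them once that bit is known, which should give the same output as a direct-feedback DFE.

```python
from typing import List, Sequence


def slicer(v: float) -> int:
    """Binary decision on a summed sample: +1 if v >= 0, else -1."""
    return 1 if v >= 0 else -1


def apply_channel(bits: Sequence[int], taps: Sequence[float]) -> List[float]:
    """Hypothetical toy channel: unit main cursor plus post-cursor ISI `taps`.

    Bits before the start of the sequence are assumed to be -1.
    """
    past = [-1] * len(taps)
    rx = []
    for b in bits:
        rx.append(b + sum(h * p for h, p in zip(taps, past)))
        past = [b] + past[:-1]
    return rx


def dfe_direct(samples: Sequence[float], taps: Sequence[float]) -> List[int]:
    """Direct feedback: subtract all taps (H1..Hn) before slicing."""
    history = [-1] * len(taps)  # assumed initial decisions
    out = []
    for x in samples:
        isi = sum(h * d for h, d in zip(taps, history))
        d = slicer(x - isi)
        out.append(d)
        history = [d] + history[:-1]
    return out


def dfe_speculative(samples: Sequence[float], taps: Sequence[float]) -> List[int]:
    """Speculative H1: slice for both values of the previous bit, then select."""
    history = [-1] * len(taps)  # assumed initial decisions
    h1, later_taps = taps[0], taps[1:]
    out = []
    for x in samples:
        # H2..Hn are still applied by direct feedback.
        base = x - sum(h * d for h, d in zip(later_taps, history[1:]))
        d_if_prev_pos = slicer(base - h1)  # path assuming previous bit = +1
        d_if_prev_neg = slicer(base + h1)  # path assuming previous bit = -1
        # The multiplexer picks the path matching the actual previous decision.
        d = d_if_prev_pos if history[0] == 1 else d_if_prev_neg
        out.append(d)
        history = [d] + history[:-1]
    return out
```

As a usage example, `dfe_speculative(apply_channel(bits, taps), taps)` with five illustrative tap values should return the original `bits`, matching `dfe_direct`. The model shows only the logical equivalence; the timing benefit described in the text (feedback established before integration begins) is a circuit-level property that this sketch does not capture.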

Fig. 3(a) illustrates how input data is sampled and integrated within the DFE. The CDR adjusts the phase of $Clk_D$ so that the S/H captures the voltage at eye center. This held voltage is integrated for 1UI, and then the polarity of the integrator output is sensed by the decision-making latch. The same S/H and current-integrating summer circuits are used in the Alexander-type phase detector, which employs a half-rate structure to sample data transitions on both rising and falling edges of $Clk_E$ (Fig. 3(b)). When the edges of $Clk_E$ are aligned with the data transitions, the S/H samples zero voltage, and no signal is integrated. When the clock is misaligned, the S/H samples a nonzero voltage, and the polarity of the integrated output indicates an early or late signal for the CDR.
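The early/late decision of an Alexander-type (bang-bang) phase detector can be sketched as follows. This is a minimal behavioral illustration in Python, not the circuit of Fig. 3(b): the function names and the sign convention (+1 for an early clock, -1 for a late clock) are assumptions, and the integrated edge sample is reduced to the sign of a single voltage value.

```python
from typing import Sequence


def sign(v: float) -> int:
    """Return +1, -1, or 0 according to the polarity of v."""
    return (v > 0) - (v < 0)


def alexander_pd(d_prev: int, edge_v: float, d_next: int) -> int:
    """Bang-bang phase decision from two data bits and the edge sample between them.

    d_prev, d_next: data decisions (+1/-1) on either side of the edge sample.
    edge_v: voltage sampled at the edge clock (its polarity is what matters).
    Returns +1 (clock early), -1 (clock late), or 0 (no information).
    The +1/-1 assignment is an assumed convention.
    """
    if d_prev == d_next:
        return 0  # no data transition, so no timing information
    e = sign(edge_v)
    if e == 0:
        return 0  # edge clock aligned with the transition: zero sampled voltage
    # An edge sample that still equals the previous bit was taken before the
    # transition, so the clock is early; otherwise it is late.
    return 1 if e == d_prev else -1


def phase_error_sum(data: Sequence[int], edge_v: Sequence[float]) -> int:
    """Sum the early/late decisions over a run of bits (hypothetical helper).

    edge_v[i] is the edge sample taken between data[i] and data[i + 1].
    """
    if len(edge_v) != len(data) - 1:
        raise ValueError("need exactly one edge sample between each pair of bits")
    return sum(
        alexander_pd(data[i], edge_v[i], data[i + 1]) for i in range(len(edge_v))
    )
```

A CDR loop would typically filter a running sum such as `phase_error_sum` before adjusting the phase interpolator; that loop filter is not described in the text and is left out here.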

To maintain the gain and linearity of the current-integrating summers, their output common-mode at the end of integration must be maintained within an acceptable operating range. Because the I/C ratio is process-dependent, and the integration period depends on bit rate, active calibration is needed to achieve a robust system. Fig. 4 depicts the digital calibration loop activated during power-up in this design. The calibration circuit [3] consists of a replica integrating summer matched to those in the data path, with differential inputs tied to a common-mode voltage. A latch (L1) is used as a clocked comparator to check whether the output common-mode at the end of integration is above or below $V_{REF}$ (300 mV below VDDA). The latch output is averaged 16 times by the calibration logic before deciding to increase or decrease integrator bias currents with a 4b IDAC. A binary search algorithm is used to find the optimum CALDAC value. The calibrated integrator bias also scales the DFE tap weight currents, as $I_{OUT}$ is used as the reference current for the DFE IDACs.
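The calibration loop described above can be sketched as a 4-bit binary search driven by a 16-sample majority vote. The following Python model is illustrative only: the replica is reduced to an ideal common-mode drop of I·T/C, and the LSB current, load capacitance, tie-breaking rule, and all function names are hypothetical values chosen for the example rather than figures from the design. Only the 300 mV reference offset, the 16-sample averaging, and the 4-bit code width come from the text.

```python
from typing import Callable

VREF_DROP_MV = 300.0  # from the text: V_REF sits 300 mV below VDDA


def make_replica(
    rate_gbps: float, i_lsb_ua: float = 20.0, c_ff: float = 50.0
) -> Callable[[int], bool]:
    """Build an idealized replica-integrator comparator (hypothetical values).

    The returned function reports whether the output common-mode at the end
    of a 1UI integration is still above V_REF for a given CALDAC code.
    """
    ui_ps = 1000.0 / rate_gbps  # one unit interval in picoseconds

    def cm_above_vref(code: int) -> bool:
        drop_mv = code * i_lsb_ua * ui_ps / c_ff  # uA * ps / fF gives mV
        return drop_mv < VREF_DROP_MV

    return cm_above_vref


def majority_vote(compare: Callable[[int], bool], code: int, n: int = 16) -> bool:
    """Average n comparator outputs; ties count as 'below' (an assumption)."""
    votes = sum(1 for _ in range(n) if compare(code))
    return 2 * votes > n


def calibrate(compare: Callable[[int], bool], bits: int = 4, n_avg: int = 16) -> int:
    """Binary search for the largest code that keeps the common-mode above V_REF."""
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)
        if majority_vote(compare, trial, n_avg):
            code = trial  # common-mode still above V_REF: keep the extra current
    return code
```

With these assumed values, `calibrate(make_replica(11.1))` settles on a larger code than `calibrate(make_replica(8.5))`, reflecting the shorter integration period at the higher data rate. In the real loop the comparator is noisy, which is the reason for averaging; the deterministic model here makes the vote trivially unanimous.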

To evaluate its performance, the RX was integrated with TX and PLL macros similar to those described in [1] to form a two-port I/O core (Fig. 7), which was fabricated in 65nm bulk CMOS technology. The area of one RX is 0.22mm<sup>2</sup>. The test chip was packaged in a plastic BGA module mounted on a socketed evaluation board. With clean input data, RX input sensitivity is 38mVppd at 8.5Gb/s and 46mVppd at 12Gb/s. Equalization of lossy channels was demonstrated by using a 3-tap FFE in the TX as well as the 5-tap DFE in the RX. Over a 30" PCB trace with 16dB of loss at 5.55GHz, the RX recovers error-free (BER<10<sup>-15</sup>) 11.1Gb/s PRBS31 data with a horizontal eye opening of 29.6%. At 11.1Gb/s, the RX consumes 119.3mW. Over a 16" Tyco backplane, which has 28dB of loss at 5GHz and completely distorts a 10Gb/s data eye (Fig. 5(a) and (b)), the RX recovers error-free (BER<10<sup>-15</sup>) 10Gb/s PRBS7 data with a horizontal eye opening of 18.8% (Fig. 5(c)). For comparison, the 5-tap DFE RX of [1] achieved a 22% horizontal eye opening when equalizing 10Gb/s data over this same channel, but its analog circuitry dissipated 55% more power. Other performance data are given in Fig. 6.
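The reported power and data-rate figures can be turned into an energy-per-bit number with simple arithmetic, since mW divided by Gb/s gives pJ/bit. The short Python sketch below does this for the two measured operating points and also computes the half-baud (Nyquist) frequency at which the channel losses are quoted. The function names are illustrative; the derived pJ/bit values are not stated in the report itself.

```python
def energy_per_bit_pj(power_mw: float, rate_gbps: float) -> float:
    """Energy per bit in pJ: mW / (Gb/s) = pJ/bit."""
    return power_mw / rate_gbps


def nyquist_ghz(rate_gbps: float) -> float:
    """Half-baud frequency in GHz for binary (NRZ) signaling."""
    return rate_gbps / 2.0


if __name__ == "__main__":
    # Measured RX power at the two data rates listed in the report.
    for power_mw, rate_gbps in [(105.0, 8.5), (119.3, 11.1)]:
        print(
            f"{rate_gbps} Gb/s: {energy_per_bit_pj(power_mw, rate_gbps):.2f} pJ/bit, "
            f"Nyquist {nyquist_ghz(rate_gbps):.2f} GHz"
        )
```

The Nyquist values are consistent with the text: the 30" PCB trace loss is quoted at 5.55GHz for 11.1Gb/s data, and the backplane loss at 5GHz for 10Gb/s data.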

#### Acknowledgments:

We gratefully acknowledge partial support of this work through MPO contract #H98230-07-C-0409. The authors also thank M. Meghelli, T. Beukema, W. Kelly, V. R. Norman, P. Metty, R. Tompkins, J. Garlett, G. Ritter, K. Guay, J. Eastman, D. Hafer, Y. M. Huang, H. Park, K. Heilmann, and G. Froese for help and advice, and S. Gowda and M. Soyuer for managerial support.

#### References:

[1] J. F. Bulzacchelli et al., "A 10-Gb/s 5-Tap DFE/4-Tap FFE Transceiver in 90-nm CMOS Technology," *IEEE J. Solid-State Circuits*, vol. 41, pp. 2885-2900, Dec. 2006.

[2] S. Bae et al., "A 2Gb/s 2-Tap DFE Receiver for Mult-Drop Single-Ended Signaling Systems with Reduced Noise," *ISSCC Dig. Tech. Papers*, pp. 244-245, Feb. 2004.

[3] M. Park et al., "A 7Gb/s 9.3mW 2-Tap Current-Integrating DFE Receiver," *ISSCC Dig. Tech. Papers*, pp. 230-231, Feb. 2007.

[4] H.-J. Chi et al., "A 3.2Gb/s 8b Single-Ended Integrating DFE RX for 2-Drop DRAM Interface with Internal Reference Voltage and Digital Calibration," *ISSCC Dig. Tech. Papers*, pp. 112-113, Feb. 2008.

[5] T. Dickson et al., "A 12-Gb/s 11-mW Half-Rate Sampled 5-Tap Decision Feedback Equalizer with Current-Integrating Summers in 45-nm SOI CMOS Technology," *Dig. Symp. VLSI Circuits*, pp. 58-59, June 2008.

[6] J. Zerbe, "High-Performance Wireline Equalization: Issues, Designs, and Tradeoffs," *ISSCC* 2008 ATAC Design Forum F5: Future of High-Speed Transceivers, Feb. 2008.



Figure 1: Receiver architecture.



Figure 2: Block diagram of the even DFE half with insets showing circuit details of clock level-shifter (CLK\_LS) and current-integrating summer.



Figure 3: Timing diagrams illustrating use of S/Hs and current-integrating summers inside (a) DFE halves and (b) phase detector block. The S/H and integrator outputs are differential signals.



Figure 4: Integrator calibration with digital control loop.



Figure 5: (a) Frequency response of 16 inch Tyco backplane and package, (b) 10Gb/s PRBS7 eye diagram after channel, and (c) equalized bathtub curve.

| Parameter                                                                       | Condition          | Value                                   |
|---------------------------------------------------------------------------------|--------------------|-----------------------------------------|
| Technology                                                                      |                    | IBM 65nm Bulk CMOS                      |
| DFE Architecture                                                                |                    | 5-Tap Half-Rate with 1 Speculative Tap  |
| RX Area (with Shared Logic Amortized)                                           |                    | 0.22mm<sup>2</sup>                      |
| Supply Voltages                                                                 |                    | 1.2V (VDDA), 1.0V (VDD)                 |
| Sensitivity @ BER=10<sup>-8</sup>                                               | 8.5Gb/s            | 38mVppd                                 |
|                                                                                 | 12Gb/s             | 46mVppd                                 |
| Horizontal Eye Opening @ BER=10<sup>-9</sup>,<br>3-Tap FFE + 5-Tap DFE          | Tyco 16" Backplane | 44.0% (8.5Gb/s PRBS7)                   |
|                                                                                 | Tyco 16" Backplane | 18.8% (10Gb/s PRBS7)                    |
|                                                                                 | 30" PCB Trace      | 29.6% (11.1Gb/s PRBS31)                 |
| Power Dissipation                                                               | 8.5Gb/s            | 105.0mW (66.1mW on VDDA, 38.9mW on VDD) |
|                                                                                 | 11.1Gb/s           | 119.3mW (72.5mW on VDDA, 46.8mW on VDD) |

Figure 6: Receiver performance summary.



Figure 7: Micrograph of the two-port I/O core used to evaluate receiver performance.