# **IBM Research Report**

## Power-Efficient Decision-Feedback Equalizers for Multi-Gb/s CMOS Serial Links

John F. Bulzacchelli, Alexander V. Rylyakov, Daniel J. Friedman

IBM Research Division Thomas J. Watson Research Center P.O. Box 218 Yorktown Heights, NY 10598



Research Division Almaden - Austin - Beijing - Haifa - India - T. J. Watson - Tokyo - Zurich

LIMITED DISTRIBUTION NOTICE: This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). Copies may be requested from IBM T. J. Watson Research Center, P. O. Box 218, Yorktown Heights, NY 10598 USA (email: reports@us.ibm.com). Some reports are available on the internet at http://domino.watson.ibm.com/library/CyberDig.nsf/home.


Abstract — A decision-feedback equalizer (DFE) can compensate for severe signal distortion due to limited channel bandwidth, but its typical power consumption is too high for some applications. This paper describes three CMOS DFEs which embody different design techniques for improved power efficiency. The first one, with two taps, uses a soft decision technique to reduce the critical path delay of the first feedback tap, so that the analog summers can be operated at low currents. This DFE consumes 4.8 mW at 6 Gb/s. The second one, with one tap, employs speculation to relax the critical timing. Speculation increases the number of parallel data paths, but the power dissipation of each path is kept low by using a single switched-capacitor circuit for both sampling and DFE summation. This DFE consumes 5.0 mW at 6 Gb/s. The third one, with two taps, also employs speculation. High power efficiency (9.3 mW at 7 Gb/s) is achieved by implementing the analog summers as resettable integrators.

*Index Terms* — CMOS, decision-feedback equalizers, high-speed I/O, low power, receivers, serial links.

#### I. INTRODUCTION

As serial link data rates approach and even surpass 10 Gb/s, sophisticated equalizers such as decision-feedback equalizers (DFEs) [1]-[4] are required to compensate for severe signal distortion due to limited channel bandwidth. In some applications, however, the power consumption of a high-speed DFE (e.g., 50 mW) can be excessive. For instance, in a high-end server, each processor chip may have thousands of inputs/outputs (I/O). To avoid having the I/O circuitry consume most of the system power budget, low-power (< 10 mW) DFEs need to be developed. This paper presents three different DFE circuits with power dissipations between 4.8 and 9.3 mW. These designs will be discussed after a review of the equalization problem and a study of DFE architectures.

#### II. BACKGROUND

#### A. Channel Equalization

The bandwidth of an electrical channel may be reduced by several physical effects, including skin effect, dielectric loss, and reflections due to impedance discontinuities. In the time domain, limited bandwidth causes broadening of the transmitted pulses. Fig. 1 shows the response of a lossy channel to a single 1 bit at a data rate of 10 Gb/s. Because the pulse is broadened over several unit intervals (UI), it contributes to the received signal not only at the desired sample position (the main cursor) but also at sample positions corresponding to other bits in the data stream. The pre-cursors indicate how this pulse interferes with detection of the preceding bits, and the post-cursors indicate how this pulse interferes with the succeeding bits. If this intersymbol interference (ISI) is large enough, the eye diagram of the received data signal will be closed, and equalization will be required to recover the data bits.
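The eye-closure condition can be illustrated with a small numerical sketch. The cursor values below are hypothetical (they are not read from Fig. 1), and the function name `eye_opening` is introduced here only for illustration. The metric used, main cursor minus the sum of the magnitudes of all other cursors, is the usual worst-case bound for binary data; a non-positive result indicates a closed eye.

```python
def eye_opening(cursors, main_index):
    """Worst-case eye opening metric: main cursor minus total ISI magnitude.

    cursors: pulse-response samples spaced 1 UI apart (volts).
    main_index: position of the main cursor within `cursors`.
    """
    main = cursors[main_index]
    # Every pre-cursor and post-cursor can add or subtract in the worst case.
    isi = sum(abs(c) for i, c in enumerate(cursors) if i != main_index)
    return main - isi

# Hypothetical samples: one pre-cursor, the main cursor, three post-cursors.
pulse = [0.05, 0.30, 0.20, 0.10, 0.05]
opening = eye_opening(pulse, main_index=1)
print(f"eye opening metric: {opening:+.3f} V")
print("eye closed" if opening <= 0 else "eye open")
```

With these assumed values the ISI (0.40 V) exceeds the main cursor (0.30 V), so the worst-case eye is closed and equalization would be needed.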



Fig. 1. Response of lossy channel to single 1 bit at 10 Gb/s. The solid dots show sample points at a spacing of 1 UI = 100 ps.

#### B. Basic DFE Architectures

An effective method of equalizing a high-loss channel is to implement a DFE in the receiver. Unlike a linear equalizer such as a peaking amplifier, a DFE is able to compensate for ISI without amplifying noise or crosstalk [1]. A DFE, whose block diagram is shown in Fig. 2, operates by canceling the post-cursor ISI from previous bits. To accomplish this, the previous bits are fed back with weighted tap coefficients (H1, ..., HN) and added to the data input with an analog summing amplifier (summer). If the tap weights are adjusted to match the channel characteristics, post-cursor ISI is canceled.
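The feedback operation can be sketched behaviorally as follows. This is a minimal bit-level model, not a description of the circuits in this paper: the channel has post-cursor ISI only, symbols are mapped to +1/-1, the feedback is written as a subtraction of the estimated ISI, and the tap values are hypothetical. The function names are introduced here for illustration.

```python
def channel_output(bits, main, post):
    """Received samples for +/-1 symbols through a channel with post-cursor ISI only."""
    symbols = [1 if b else -1 for b in bits]
    out = []
    for n, s in enumerate(symbols):
        y = main * s
        for k, h in enumerate(post, start=1):
            if n - k >= 0:
                y += h * symbols[n - k]  # ISI from the bit sent k UI earlier
        out.append(y)
    return out

def direct_dfe(samples, taps):
    """Direct DFE: cancel tap-weighted previous decisions, then slice."""
    history = [0] * len(taps)  # previous decisions (+/-1); 0 until a bit is known
    decisions = []
    for y in samples:
        corrected = y - sum(h * d for h, d in zip(taps, history))
        d = 1 if corrected >= 0 else -1
        decisions.append(d)
        history = [d] + history[:-1]  # shift the newest decision into the H1 slot
    return [1 if d > 0 else 0 for d in decisions]

# Hypothetical channel: post-cursor ISI (0.20 + 0.15) exceeds the main cursor (0.30).
bits = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1]
rx = channel_output(bits, main=0.30, post=[0.20, 0.15])
sliced_only = [1 if y >= 0 else 0 for y in rx]
equalized = direct_dfe(rx, taps=[0.20, 0.15])
print("errors without DFE:", sum(a != b for a, b in zip(sliced_only, bits)))
print("errors with DFE:   ", sum(a != b for a, b in zip(equalized, bits)))
```

Because the taps in this sketch match the channel exactly and no noise is modeled, the DFE recovers every bit, while slicing the raw samples does not.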

A key challenge in the design of a DFE is ensuring that the feedback signals are accurately established at the slicer input before the next bit decision is made. The dashed gray line in Fig. 2 indicates the critical timing path for the DFE, whose loop delay must be less than 1 UI. Meeting this timing constraint becomes difficult at data rates close to 10 Gb/s. The timing constraint on the first (H1) feedback tap can be relaxed by adopting a technique known as speculation or loop unrolling [5]. Fig. 3 illustrates how speculation is used to implement the H1 tap in the half-rate DFE architecture of [1] and [4]. In each DFE half, both +H1 and –H1 are added as dc offsets to the data input, and both sums are sliced to binary values. Once the previous data bit is known, a MUX selects the slicer output corresponding to the correct polarity of H1 feedback. With the H1 tap realized by speculation, the H2 feedback loop becomes the new critical timing path, whose delay must be less than 2 UI.
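The equivalence between a direct H1 tap and a speculative one can be checked with a short behavioral sketch. It models a single full-rate path rather than the half-rate structure of Fig. 3, uses +1/-1 decisions, and assumes a sign convention in which the estimated ISI is subtracted; the sample values, the tap weight, and the function names are all hypothetical.

```python
def direct_dfe_1tap(samples, h1, first_prev=-1):
    """Direct 1-tap DFE: each decision waits for the previous one."""
    prev = first_prev
    out = []
    for y in samples:
        prev = 1 if (y - h1 * prev) >= 0 else -1
        out.append(prev)
    return out

def speculative_dfe_1tap(samples, h1, first_prev=-1):
    """Speculative 1-tap DFE: slice for both possible previous bits, then select."""
    # These two slicer outputs do not depend on any earlier decision.
    if_prev_pos = [1 if (y - h1) >= 0 else -1 for y in samples]
    if_prev_neg = [1 if (y + h1) >= 0 else -1 for y in samples]
    prev = first_prev
    out = []
    for pos, neg in zip(if_prev_pos, if_prev_neg):
        prev = pos if prev > 0 else neg  # MUX controlled by the previous bit
        out.append(prev)
    return out

# Hypothetical received samples and tap weight.
samples = [0.25, -0.05, 0.10, -0.30, 0.02, 0.15, -0.12, 0.40]
h1 = 0.2
print(direct_dfe_1tap(samples, h1) == speculative_dfe_1tap(samples, h1))
```

In the speculative version only the MUX selection remains in the feedback loop; the analog summation and slicing have been moved out of it, which is the timing relaxation described above.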



Fig. 2. Block diagram of DFE. Dashed gray line shows critical timing path.



Fig. 3. Half-rate DFE architecture with speculative first (H1) tap. Dashed gray line shows new critical timing path.

A 5-tap DFE with the architecture of Fig. 3 has been shown to operate reliably at 10 Gb/s when implemented in high-power current-mode logic [4]. This type of "heavy duty" equalizer establishes a benchmark against which the low-power DFEs can be compared. Fig. 4 presents the input and output eye diagrams of such a DFE equalizing a 16" Tyco backplane channel at 10 Gb/s. The loss of this channel at half-baud frequency (5 GHz) is 20 dB. The input eye diagram (Fig. 4(a)) is completely closed, yet the DFE is able to recover the data (Fig. 4(b)) with a bit-error rate (BER) < $10^{-13}$. This particular 5-tap DFE is essentially the design of [4], mapped from 90-nm to 65-nm technology. The performance of this equalizer, as well as the low-power DFEs, is summarized in Table I. Note the high power dissipation (50.7 mW).



Fig. 4. Equalization of 16" Tyco backplane channel at 10 Gb/s with "heavy duty" 5-tap DFE. (a) Receiver input eye diagram. (b) Recovered half-rate (5 Gb/s) output data.

#### III. LOW-POWER DFE DESIGNS

#### A. DFE Using Soft Decisions

A direct (non-speculative) DFE has fewer high-speed components than a speculative DFE. This makes a direct DFE more area-efficient. It could also make it more power-efficient, provided the savings accrued from a lower component count are not offset by extra power needed to close the H1 timing loop. The design described here introduces a soft decision technique [6] which reduces H1 loop delay without raising power dissipation.

Fig. 5 shows a block diagram of a quarter-rate 2-tap DFE using soft decisions. Ck1-Ck4 are quarter-rate clocks offset by 1 UI from each other. In each path, input data is sampled by a sample-and-hold (S/H) and added in a weighted fashion to the two previous bits. Because the slicer in each path is a simple latch, not an edge-triggered master-slave flip-flop, the H1 feedback signal is provided to the summer not only after the previous bit has been latched (as occurs in a hard decision approach), but also, to a partial degree, while the previous bit decision is still being computed. The benefit of this approach can be seen in the timing diagrams of Fig. 6. In a hard decision design, the slicer output S1 would not change prior to the falling edge of Ck2 and could not affect the summer output A2 until that time, resulting in a long summer settling time, $\Delta t_{hard}$. In contrast, the soft decision design makes the evolving output S1 available to the summer prior to the falling edge of Ck2, resulting in a shorter settling time, $\Delta t_{soft}$.

Fig. 7 shows the schematics of the S/H and summer used in this DFE design. Because the settling time of the summer is reduced by soft decisions, the load resistors which convert the summed currents into voltages can have relatively high values (3 k$\Omega$), so the current levels (and power dissipation) are low.
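The trade-off between load resistance, bias current, and settling time can be made concrete with a rough first-order sketch. It assumes a single-pole RC response at the summer output; the output swing, the load capacitance, the settling tolerance, and the 1 k$\Omega$ comparison point are hypothetical values chosen for illustration, and only the 3 k$\Omega$ resistance comes from the text.

```python
import math

def summer_tradeoff(r_load, swing, c_load, settle_error=0.05):
    """Return (current, settling time) for a single-pole resistively loaded summer.

    current: I = V / R needed to develop `swing` across `r_load`.
    settling time: time for an RC response to come within `settle_error` of final value.
    """
    current = swing / r_load
    t_settle = r_load * c_load * math.log(1.0 / settle_error)
    return current, t_settle

SWING = 0.3      # volts, hypothetical
C_LOAD = 20e-15  # farads, hypothetical

i_lo_r, t_lo_r = summer_tradeoff(1e3, SWING, C_LOAD)  # hypothetical low-resistance load
i_hi_r, t_hi_r = summer_tradeoff(3e3, SWING, C_LOAD)  # 3 kOhm load, as in the text
print(f"1 kOhm: {i_lo_r * 1e6:.0f} uA, settles in {t_lo_r * 1e12:.0f} ps")
print(f"3 kOhm: {i_hi_r * 1e6:.0f} uA, settles in {t_hi_r * 1e12:.0f} ps")
```

Under these assumptions, tripling the load resistance cuts the current for a given swing to one third but triples the settling time, which is why a technique that shortens the required settling window permits the larger resistors.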

| Parameter               | Channel     | Heavy-duty 5-tap DFE | 2-tap DFE using soft decisions [6] | 1-tap DFE, switched-capacitor summer [7] | Current-integrating 2-tap DFE [8] |
|-------------------------|-------------|----------------------|------------------------------------|------------------------------------------|-----------------------------------|
| CMOS technology         |             | 65-nm                | 90-nm                              | 90-nm                                    | 90-nm                             |
| Supply voltage (V)      |             | 1.2, 1.0             | 1.0                                | 1.0                                      | 1.0                               |
| Data rate (PRBS-7)      | Short cable | 13.5 Gb/s            | 10 Gb/s                            | 10 Gb/s                                  | 8 Gb/s                            |
| Data rate (PRBS-7)      | 16" Tyco    | 10 Gb/s              |                                    | 6 Gb/s                                   | 7 Gb/s                            |
| Data rate (PRBS-7)      | 10' cable   |                      | 6 Gb/s                             |                                          |                                   |
| Input sensitivity       |             | 80 mVppd @ 10 Gb/s   | 20 mVppd @ 6 Gb/s                  | 80 mVppd @ 10 Gb/s                       | 61 mVppd @ 7 Gb/s                 |
| Power consumption       |             | 50.7 mW @ 10 Gb/s    | 4.8 mW @ 6 Gb/s                    | 5.0 mW @ 6 Gb/s                          | 9.3 mW @ 7 Gb/s                   |
| DFE core area (µm²)     |             | 90 × 130             | 45 × 98                            | 70 × 150                                 | 65 × 85                           |

TABLE I. DFE Performance Comparison
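The power and data-rate entries of Table I can be normalized to energy per bit, which makes the comparison between the heavy-duty and low-power designs easier to read. The sketch below uses only the power-consumption row of the table; the dictionary and function names are introduced here for illustration. The designs differ in tap count, channel, and technology, so the resulting figures are indicative rather than a like-for-like ranking.

```python
# (power in mW, data rate in Gb/s) from the power-consumption row of Table I
table_i_power = {
    "Heavy-duty 5-tap DFE": (50.7, 10.0),
    "2-tap DFE using soft decisions": (4.8, 6.0),
    "1-tap DFE, switched-capacitor summer": (5.0, 6.0),
    "Current-integrating 2-tap DFE": (9.3, 7.0),
}

def energy_per_bit_pj(power_mw, rate_gbps):
    """mW / (Gb/s) = 1e-3 W / 1e9 b/s = 1e-12 J/b, i.e. pJ/bit."""
    return power_mw / rate_gbps

energy = {name: energy_per_bit_pj(p, r) for name, (p, r) in table_i_power.items()}
for name, e in energy.items():
    print(f"{name}: {e:.2f} pJ/bit")
```

On this measure the three low-power designs fall between roughly 0.8 and 1.3 pJ/bit, compared with about 5.1 pJ/bit for the heavy-duty 5-tap DFE.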



Fig. 5. Quarter-rate sampling receiver with 2-tap DFE using soft decisions.



Fig. 6. Timing diagrams illustrating benefit of soft decisions.

The soft decision DFE was fabricated in 90-nm CMOS technology. Over a short cable (low ISI channel), the receiver operated error-free at 10 Gb/s. The DFE showed modest equalization capabilities. It recovered data at 6 Gb/s with BER < $10^{-12}$ over 10 feet of cable (6.5 dB loss at 3 GHz) while dissipating 4.8 mW of power. However, the lack of proper input termination and insufficient range on the tap weights did not allow equalization of the Tyco channel. These implementation problems are not inherent to the DFE design and can be easily corrected.



Fig. 7. Sample-and-hold and summer schematics.

#### B. DFE With Switched-Capacitor Summation

While the soft decision technique speeds up the response of the summer to the H1 feedback signal, even higher speed is achieved with speculation, which eliminates the need for settling of the H1 feedback tap altogether. The use of parallel data paths in speculation can significantly increase power dissipation, unless each path is very power-efficient. A power-hungry circuit in each path is the analog summer. This subsection and the next discuss design techniques for realizing low-power summers.

An alternative to a current-mode summer like that shown in Fig. 7 is the charge/voltage-mode summer [7] shown in Fig. 8 (for simplicity, only the half-circuit of the differential structure is drawn). This switched-capacitor circuit operates as a sampler-and-adder. The switches S1, S1d, and S1B are turned on with clock phases Ck1, Ck1d, and Ck1B, respectively. During the sampling phase, both S1 and S1d are on, so the voltage stored across $C_s$ is $V_{in} - V_{CM}$, where $V_{CM}$ is a common-mode voltage. During the hold/equalize phase, S1B is turned on, and $V_{out}$ becomes $2V_{CM} + \alpha V_{ref} - V_{in}$. The $\alpha V_{ref}$ term represents the DFE feedback added to the data input and can be adjusted to match channel characteristics. Unlike a current-mode summer, the switched-capacitor summer requires no dc bias current. Dynamic power is consumed in clocking the switches, which are implemented as small PMOS passgates.
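The hold-phase expression can be evaluated for both halves of the differential structure to see how the feedback term appears in the differential output. The sketch below applies the stated half-circuit formula to each side; treating the two halves as having independent coefficients `alpha_p` and `alpha_n` is an assumption made here for illustration, as are all numerical values and function names.

```python
def sc_summer_out(v_in, v_cm, alpha, v_ref):
    """Half-circuit hold-phase output from the text: 2*V_CM + alpha*V_ref - V_in."""
    return 2.0 * v_cm + alpha * v_ref - v_in

def sc_summer_diff(v_diff, v_cm, alpha_p, alpha_n, v_ref):
    """Differential output when each half sees V_CM +/- v_diff/2 at its input."""
    out_p = sc_summer_out(v_cm + v_diff / 2.0, v_cm, alpha_p, v_ref)
    out_n = sc_summer_out(v_cm - v_diff / 2.0, v_cm, alpha_n, v_ref)
    return out_p - out_n

# Hypothetical operating point: 100 mV differential input, +/-2% feedback weights.
diff_out = sc_summer_diff(v_diff=0.10, v_cm=0.5, alpha_p=0.02, alpha_n=-0.02, v_ref=1.0)
print(f"differential output: {diff_out * 1e3:+.1f} mV")
```

Under these assumptions the common-mode terms cancel, leaving the inverted differential input plus a feedback offset of $(\alpha_p - \alpha_n) V_{ref}$.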

To evaluate the switched-capacitor summer, a 1-tap speculative DFE was designed and fabricated in 90-nm CMOS technology. The quarter-rate architecture of the DFE is fairly conventional and not shown here. (For such details, refer to [7].) This DFE was able to equalize the 16" Tyco channel (11 dB loss at 3 GHz) at 6 Gb/s with PRBS-7 data and BER < $10^{-12}$. The power dissipation at 6 Gb/s was 5.0 mW.



Fig. 8. Schematic and clocking of switched-capacitor sampler-and-adder.

#### C. Current-Integrating DFE

A low-power summer [8] can be realized by replacing the load resistors of Fig. 7 with resettable capacitors to form a current integrator (Fig. 9). Highest power efficiency is obtained if the output capacitors are just the parasitics [9]. At the beginning of the integration period, the capacitor voltages have been reset to the power supply. During the integration period (one UI long), charge is integrated on the capacitors and represents the sum of the currents from the input stage and the current switches for the DFE feedback. At the end of the integration period, the capacitor voltages are sampled by a slicer and then reset by PMOS switches. Since charge is integrated onto parasitic capacitances only, large signals can be generated with small currents due to the large I/C ratio.
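The relation between current, capacitance, and signal swing in an integrating summer can be sketched numerically. The model below assumes constant currents that discharge each output capacitor from the supply during the integration period, so that $V = V_{DD} - (\sum I)\,T/C$; the capacitance, the current values, the supply voltage, and the function name are hypothetical, and only the one-UI integration period and the 7 Gb/s data rate are taken from the text.

```python
def integrator_output(v_dd, currents, capacitance, t_int):
    """Capacitor voltage after integrating constant currents for t_int, starting at v_dd."""
    return v_dd - sum(currents) * t_int / capacitance

UI = 1.0 / 7e9   # one unit interval at 7 Gb/s, in seconds
C_PAR = 20e-15   # farads, hypothetical parasitic capacitance per output node
V_DD = 1.0       # volts, hypothetical supply

# Hypothetical currents per output node: [input stage, DFE feedback switch].
v_p = integrator_output(V_DD, [60e-6, 0.0], C_PAR, UI)
v_n = integrator_output(V_DD, [40e-6, 5e-6], C_PAR, UI)
print(f"out_p = {v_p:.3f} V, out_n = {v_n:.3f} V")
print(f"differential output = {(v_p - v_n) * 1e3:+.1f} mV")
```

With these assumed values a 15 µA net current difference develops a differential signal of roughly 107 mV in one UI, which illustrates how small currents can produce large signals when the integration capacitance is only parasitic.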



Fig. 9. Current-integrating summer.

A 2-tap DFE employing current-integrating summers was designed and fabricated in 90-nm CMOS technology. Its half-rate architecture is similar to that shown in Fig. 3, except that both H1 and H2 are added in the same summing stage. This DFE was the most effective low-power design at equalizing the Tyco channel (15 dB loss at 3.5 GHz), achieving a BER < $10^{-13}$ with 7 Gb/s PRBS-7 data. The power dissipation at 7 Gb/s was 9.3 mW, of which less than 0.7 mW was consumed in the summers.

#### IV. CONCLUSION

This paper has described a number of architectural and circuit techniques for reducing the power consumption of DFEs to below 10 mW in the 6-10 Gb/s range studied here. Such approaches will be necessary for the realization of future systems requiring high-density, high-speed I/O.

#### ACKNOWLEDGEMENT

The authors wish to thank K.-L. Wong, A. Varzaghani, A. Emami-Neyestanak, and M. Park for past technical contributions. This work was supported by MPO contract H98230-04-C-0920.

#### REFERENCES

- [1] T. Beukema *et al.*, "A 6.4-Gb/s CMOS SerDes core with feed-forward and decision-feedback equalization," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2633-2645, Dec. 2005.
- [2] R. Payne *et al.*, "A 6.25-Gb/s binary transceiver in 0.13-μm CMOS for serial data transmission across high loss legacy backplane channels," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2646-2657, Dec. 2005.
- [3] K. Krishna *et al.*, "A multigigabit backplane transceiver core in 0.13-μm CMOS with a power-efficient equalization architecture," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2658-2666, Dec. 2005.
- [4] J. F. Bulzacchelli *et al.*, "A 10-Gb/s 5-tap DFE/4-tap FFE transceiver in 90-nm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 41, no. 12, pp. 2885-2900, Dec. 2006.
- [5] S. Kasturia and J. H. Winters, "Techniques for high-speed implementation of nonlinear cancellation," *IEEE J. Sel. Areas Commun.*, vol. 9, no. 5, pp. 711-717, June 1991.
- [6] K.-L. Wong, A. Rylyakov, and C.-K. Yang, "A 5-mW 6-Gb/s quarter-rate sampling receiver with a 2-tap DFE using soft decisions," *IEEE Symp. VLSI Circuits Dig. Tech. Papers*, pp. 190-191, June 2006.
- [7] A. Emami-Neyestanak *et al.*, "A low-power receiver with switched-capacitor summation DFE," *IEEE Symp. VLSI Circuits Dig. Tech. Papers*, pp. 192-193, June 2006.
- [8] M. Park, J. Bulzacchelli, M. Beakes, and D. Friedman, "A 7Gb/s 9.3mW 2-tap current-integrating DFE receiver," *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, pp. 230-231, Feb. 2007.
- [9] S. Sidiropoulos and M. Horowitz, "A 700-Mb/s/pin CMOS signaling interface using current integrating receivers," *IEEE J. Solid-State Circuits*, vol. 32, no. 5, pp. 681-690, May 1997.