# **IBM Research Report**

# 10+ Gb/s 90nm CMOS Serial Link Demo in CBGA package

Sergey Rylov<sup>1</sup>, Scott Reynolds<sup>1</sup>, Daniel Storaska<sup>2</sup>, Brian Floyd<sup>1</sup>, Mohit Kapur<sup>1</sup>, Thomas Zwick<sup>1,3</sup>, Sudhir Gowda<sup>1</sup>, Michael Sorna<sup>2</sup>

<sup>1</sup>IBM Research Division, Thomas J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598

<sup>2</sup>IBM Microelectronics Division Hopewell Junction, NY

<sup>3</sup>currently with Siemens VDO, Weissensberg, Germany



Research Division Almaden - Austin - Beijing - Haifa - India - T. J. Watson - Tokyo - Zurich

LIMITED DISTRIBUTION NOTICE: This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g. payment of royalties). Copies may be requested from IBM T. J. Watson Research Center, P. O. Box 218, Yorktown Heights, NY 10598 USA (email: reports@us.ibm.com). Some reports are available on the internet at <a href="http://domino.watson.ibm.com/library/CyberDig.nsf/home">http://domino.watson.ibm.com/library/CyberDig.nsf/home</a>


# Abstract

We report a 10+ Gb/s serial link demo chip with NRZ signaling in 90-nm CMOS. It consists of a full-rate 4:1 MUX with 8-tap feed-forward equalizer, a half-rate 1:4 DEMUX with programmable peaking pre-amplifier, and a parallel port interface. All coefficients of the 8-tap FIR filter have programmable polarity and magnitude. The chip is housed in CBGA package and has ESD protection devices on all pins. All clock signals are supplied externally. The measured maximum speeds of stand-alone transmitter and receiver are 11.7 Gb/s and 13.3 Gb/s respectively, and maximum back-to-back operation speed (transmitter+receiver) is 11.4 Gb/s. The chip operates at 10 Gb/s over 20 ft of lossy cable with 20 dB attenuation at 5 GHz. All circuits in the chip use a single 1.0V power supply, except TX output driver and RX input termination network, which use 1.4V supply. Total power consumption of TX and RX from the two supplies is 280 mW.

# Introduction

The primary motivation for this work was to demonstrate the feasibility of an integrated and packaged 10+ Gb/s serial link in 90-nm CMOS technology using NRZ signaling. The intended application for such links is high-density serial I/O for advanced ASICs and microprocessors operating over short and medium distances (on-board and board-to-board). The high channel losses and reflections common in electrical serial links operating at 10+ Gb/s require channel equalization, which is most commonly done using a transmitter feed-forward equalizer (FFE), a receiver decision-feedback equalizer (DFE), and/or a receiver peaking pre-amplifier (1-3). The chip presented in this work equalizes the channel using an FFE in the transmitter and a programmable peaking amplifier in the receiver.

# Architecture

The serial link chip consists of three individual macros: a transmitter (TX), a receiver (RX), and a PC Enhanced Parallel Port (EPP) interface. Figure 1 shows the block diagram of the transmitter, which consists of a 4:1 multiplexer (MUX) and an 8-tap finite impulse response (FIR) filter. The 4:1 MUX uses a conventional tree architecture and consists of three 2:1 multiplexers, two operating at a quarter-rate clock C4 and one operating at a half-rate clock C2. The clock signals are produced on-chip by dividing the full-rate clock C1 (supplied externally) using 1:2 static dividers. An additional 1:8 divider generates a low-speed clock C32 (1/32 of full rate) that is used in the EPP interface macro.
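The clock division chain and the tree MUX described above can be illustrated with a small behavioral sketch. It is an idealized model, not the CML implementation: the function names and the input-to-output bit ordering are assumptions made for illustration only.

```python
def clock_tree(full_rate_hz):
    """Clock frequencies obtained from the external full-rate clock C1."""
    c1 = full_rate_hz
    c2 = c1 / 2    # first 1:2 static divider
    c4 = c2 / 2    # second 1:2 static divider
    c32 = c4 / 8   # additional 1:8 divider, 1/32 of full rate
    return {"C1": c1, "C2": c2, "C4": c4, "C32": c32}


def mux2(a, b):
    """Idealized 2:1 MUX: interleave two equal-length bit streams."""
    out = []
    for x, y in zip(a, b):
        out.extend([x, y])
    return out


def mux4_tree(d0, d1, d2, d3):
    """Tree 4:1 MUX: two 2:1 stages (clocked by C4) feed one 2:1 stage (C2).

    The pairing (d0, d2) and (d1, d3) is an assumed ordering chosen so that
    the serial output reads d0, d1, d2, d3, d0, ...
    """
    even = mux2(d0, d2)   # half-rate stream
    odd = mux2(d1, d3)    # half-rate stream
    return mux2(even, odd)  # full-rate stream


if __name__ == "__main__":
    print(clock_tree(10e9))
    print(mux4_tree([0, 4], [1, 5], [2, 6], [3, 7]))
```

For a 10 GHz full-rate clock this gives C2 = 5 GHz, C4 = 2.5 GHz and C32 = 312.5 MHz.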

The output of the MUX is applied to an 8-stage shift register forming the tapped delay line of the FIR filter. The shift register operates on full-rate clock C1 and consists of transparent latches with even and odd stages using opposite clock polarity, so each stage creates a delay of a half clock period. The eight delayed copies of the MUX output are applied to XOR gates along with the respective polarity bits of the tap coefficients P0-P7 and then multiplied by eight FIR filter coefficients using CML buffer stages with programmable tail currents. The outputs of these stages are summed together on a common resistive load yielding the final output of the FIR filter.
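The following sketch is a simplified numerical model of the filter described above. It assumes ideal ±1 NRZ levels, two samples per bit (so that one sample equals the half-bit delay of one latch stage), and tap magnitudes in arbitrary units standing in for the programmable tail currents. The function name and argument layout are hypothetical.

```python
def ffe_output(bits, magnitudes, polarities, samples_per_bit=2):
    """Behavioral model of a half-baud-spaced FIR transmit equalizer.

    bits        -- sequence of 0/1 data bits from the MUX
    magnitudes  -- tap weights (stand-ins for programmable tail currents)
    polarities  -- polarity bits P0..P7; 1 inverts the tap (the XOR gate)
    """
    if len(magnitudes) != len(polarities):
        raise ValueError("need one polarity bit per tap")

    # NRZ waveform sampled twice per bit: one sample = half a bit period.
    wave = [1.0 if b else -1.0 for b in bits for _ in range(samples_per_bit)]
    signs = [-1.0 if p else 1.0 for p in polarities]

    out = []
    for n in range(len(wave)):
        acc = 0.0
        for k, (mag, sign) in enumerate(zip(magnitudes, signs)):
            if n - k >= 0:  # samples before the start are treated as zero
                acc += sign * mag * wave[n - k]
        out.append(acc)  # summation on the common resistive load
    return out


if __name__ == "__main__":
    mags = [1.0, 0.0, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
    pols = [0, 0, 1, 0, 0, 0, 0, 0]  # tap 2 inverted: de-emphasis
    print(ffe_output([1, 1, 1, 1], mags, pols))
```

With an inverted tap one bit period after the main tap, the first bit of a run is sent at full amplitude and later bits at reduced amplitude, which is the usual de-emphasis behavior of a transmit FFE.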

Figure 2 shows the block diagram of the receiver, consisting of a programmable peaking amplifier followed by a 1:4 demultiplexer (DEMUX) and four single-ended output drivers. The peaking amplifier consists of two CML buffers, peaking and non-peaking, connected in parallel and using programmable tail currents to control the relative amount of peaking. The peaking CML buffer uses a split tail with an RC shunt and operates as a high-pass filter. The shunt capacitor is realized as a switchable bank; this allows one to program the high-pass frequency boundary in a range of 0.5 to 6 GHz. The common resistive load of the peaking amplifier has a spiral inductor for bandwidth extension, and yields a high-frequency bandwidth of 10 GHz. The nominal gain of the amplifier is 8. The DEMUX section uses a conventional tree topology and consists of three 1:2 demultiplexers, one operating at a half-rate clock C2 (supplied externally) and two operating at a quarter-rate clock C4 (generated locally with a 1:2 static divider).
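To make the effect of mixing the two buffers concrete, here is a rough first-order model of the peaking amplifier. It is only a sketch under stated assumptions: the peaking path is treated as an ideal single-pole high-pass, the load as a single-pole low-pass at 10 GHz, and the gain is normalized to 1 instead of the nominal 8. The parameter `peak_fraction` (the share of tail current steered to the peaking buffer) is a hypothetical name, not a register of the chip.

```python
def peaking_gain(f_hz, peak_fraction, f_hp_hz, f_bw_hz=10e9):
    """Normalized gain magnitude of a flat path in parallel with a high-pass path.

    f_hz          -- frequency of interest
    peak_fraction -- 0.0 (no peaking) .. 1.0 (peaking buffer only), assumed knob
    f_hp_hz       -- high-pass corner set by the capacitor bank (0.5 to 6 GHz)
    f_bw_hz       -- overall bandwidth of the shared load (assumed single pole)
    """
    jw = 1j * f_hz / f_hp_hz
    high_pass = jw / (1 + jw)                   # peaking buffer
    mix = (1 - peak_fraction) + peak_fraction * high_pass
    roll_off = 1 / (1 + 1j * f_hz / f_bw_hz)    # shared resistive load
    return abs(mix * roll_off)


if __name__ == "__main__":
    for f in (0.0, 1e9, 5e9):
        print(f, peaking_gain(f, peak_fraction=0.5, f_hp_hz=2e9))
```

In this model the low-frequency gain falls as more current is steered to the peaking buffer while the high-frequency gain is preserved, which is the relative boost that compensates channel loss.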

# Implementation

Both TX and RX macros of the serial link chip were implemented with CML circuits in a triple-well 90nm CMOS process (4). All CML gates use conventional topology with polysilicon resistor loads (5). The drivers for the single-ended quarter-rate RX outputs are implemented as pairs of complementary source followers, with one of the outputs terminated to 50 Ohms locally on-chip, while the other one is connected to the chip pad. This approach allows one to reduce variations in current drawn from the power supply. The parallel port macro is designed using standard static CMOS logic. All CML circuits of the transmitter and receiver as well as static CMOS circuits of the parallel port macro are powered from a single power supply VDD (1.0V nominal). There are two exceptions: the TX output stage and the corresponding RX input termination network use a higher supply VTT (1.4V nominal), and the EPP interface I/O circuits use a 2.5V supply.

The layout of the serial link chip is shown in Figure 3, with the TX and RX macros located in the bottom left corner, the EPP macro located in the center, and the EPP I/O circuitry located in the upper right corner. All signal pins have ESD protection. The chip size is 5x5 mm. It is packaged in a custom high-speed 728-ball CBGA package with coaxial vias, which was developed for 6 Gb/s applications.

# Test setup

Figure 4 shows a block diagram of the test setup. The chip with its three macros (TX, RX and EPP) is shown in the center. When the chip is tested as a complete link, the transmitter side is driven by a pattern generator providing four quarter-rate inputs, the output of the transmitter is connected to the receiver through a physical link, and one of the quarter-rate receiver outputs is connected to the bit error rate tester (BERT). Alternatively one can test the transmitter and receiver independently, by connecting the transmitter output to the BERT, or by feeding the receiver directly from the full-rate output of the pattern generator, as shown on the diagram.

All external clocks are derived from a single full-rate clock source, as shown in Fig. 4. Use of an external clock source allows one to test the chip over a wide range of clock frequencies. All settings of the transmitter and receiver are controlled through the parallel port interface by a PC running LabView software. The same software is used to control the test instruments via a GPIB interface. The current consumed by the CML circuitry on chip is determined by three reference currents, which control the transmitter logic, transmitter driver and receiver logic respectively. Although all on-chip logic uses a common 1 V power supply, one can measure the individual power consumption of different parts of the chip by switching these currents on and off.

The cable connection between the transmitter and receiver in a back-to-back test is differential and consists of 3 ft (0.9 m) of 20-GHz coaxial cable connected to the TX output, the same length of 20-GHz cable connected to the RX input, and a bandwidth-limited physical channel in the middle. Besides the cables, the link path also contains chip pads with ESD protection, the CBGA package (which was optimized for a lower data rate of 6 Gb/s), the chip socket and the test board with connectors.

# Test results

Figure 5 shows the output eye diagram of the stand-alone transmitter operating at 11.7 Gb/s with a bit error rate below $10^{-13}$. This eye diagram is taken at the end of 3 ft of 20-GHz cable. The transmitter uses equalization to compensate for channel losses related to the ESD devices and the package, which would otherwise make this eye nearly closed. In tests of the stand-alone receiver we observed correct operation up to 13.3 Gb/s. In a back-to-back test (TX+RX) over a short link (6 ft of high-bandwidth cable) the chip runs up to 11.4 Gb/s, which is close to the 11.7 Gb/s speed of the stand-alone transmitter.

We have successfully tested the link at 10 Gb/s over bandwidth-limited channels using 10 and 20 feet of thin coaxial cable with 1 dB/ft attenuation at 5 GHz. In both cases, in the absence of transmitter equalization, the eye diagram at the end of the channel is completely closed. Transmitter equalization opens the eye, enabling the link to operate with a bit error rate better than $10^{-13}$. We also successfully operated the link at 8+ Gb/s over 16" and 30" of Tyco HM-Zd XAUI test backplane, which nominally supports only 3.125 Gb/s operation. Figure 6 shows eye diagrams of the optimally equalized transmitter output at the far end of the 16" channel at 8 Gb/s and 10 Gb/s. The measured error rate in these two cases is better than $10^{-13}$ and $10^{-12}$, respectively. At 8 Gb/s the link also operated over 30" of backplane length with an error rate below $10^{-13}$.
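The cable figures above translate directly into channel loss at 5 GHz, which is the Nyquist frequency of a 10 Gb/s NRZ signal. The short sketch below does the arithmetic; the function names are illustrative and the model ignores connectors, package and board losses.

```python
def cable_loss_db(length_ft, loss_db_per_ft=1.0):
    """Cable attenuation at 5 GHz for the thin coaxial cable (1 dB/ft)."""
    return length_ft * loss_db_per_ft


def amplitude_ratio(loss_db):
    """Voltage amplitude ratio corresponding to a loss in dB."""
    return 10 ** (-loss_db / 20)


if __name__ == "__main__":
    for length in (10, 20):
        loss = cable_loss_db(length)
        print(length, "ft:", loss, "dB, amplitude ratio", amplitude_ratio(loss))
```

The 20 ft cable therefore attenuates a 5 GHz tone to about one tenth of its amplitude, which is consistent with a closed eye when equalization is disabled.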

We tested a total of 11 chips that had passed prescreening for basic parallel port functionality, and all of them were found functional at high speed. Specifically, all of them ran a back-to-back test with 6 feet of 20-GHz cable, and 5 out of 11 ran this test at 11+ Gb/s, with a maximum achieved speed of 11.4 Gb/s. The fabrication yielded nominal FETs, but the resistors were 33% high, meaning proportionally lower bandwidth of the CML circuits. The measured nominal power consumption of TX and RX was 182 mW and 98 mW respectively, for a total of 280 mW, including the quarter-rate I/O circuits (43 mW). Table I provides the summary of chip performance.

| Parameter                              | Value                                     |
|----------------------------------------|-------------------------------------------|
| Technology                             | 90 nm CMOS, triple-well                   |
| Power supplies                         | 1.0V and 1.4V - core                      |
|                                        | 2.5V - EPP I/O                            |
| Chip area                              | 5x5 mm<sup>2</sup>                        |
| Core area                              | 700x450 μm<sup>2</sup> TX                 |
|                                        | 550x450 μm<sup>2</sup> RX                 |
|                                        | 320x90 μm<sup>2</sup> EPP interface       |
| Chip package                           | 728-ball CBGA                             |
| Maximum speed at 10<sup>-13</sup> BER  | 11.7 Gb/s TX                              |
|                                        | 13.3 Gb/s RX                              |
|                                        | 11.4 Gb/s TX+RX                           |
| Total power TX+RX                      | 280 mW                                    |

TABLE I CHIP SUMMARY
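The power and process figures reported in the test results can be cross-checked with a few lines of arithmetic. The sketch below is illustrative: the energy-per-bit figure is derived here and is not quoted in the report, and the bandwidth estimate assumes a simple RC-limited CML stage whose bandwidth scales as 1/R.

```python
TX_POWER_MW = 182.0
RX_POWER_MW = 98.0


def energy_per_bit_pj(power_mw, rate_gbps):
    """Energy per bit in pJ: mW / (Gb/s) = 1e-3 W / 1e9 b/s = 1e-12 J/bit."""
    return power_mw / rate_gbps


def relative_bandwidth(resistor_excess):
    """Bandwidth relative to nominal when R is high by the given fraction.

    Assumes an RC-limited stage (bandwidth proportional to 1/R); this is a
    simplification of the actual CML circuits.
    """
    return 1.0 / (1.0 + resistor_excess)


if __name__ == "__main__":
    total_mw = TX_POWER_MW + RX_POWER_MW
    print(total_mw, "mW total")
    print(energy_per_bit_pj(total_mw, 10.0), "pJ/bit at 10 Gb/s")
    print(relative_bandwidth(0.33), "of nominal bandwidth")
```

Under these assumptions the link consumes 28 pJ/bit at 10 Gb/s, and resistors that are 33% high reduce the CML bandwidth to roughly 75% of nominal.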

# Conclusion

We have demonstrated a 10+ Gb/s serial link demo chip in 90 nm CMOS consisting of a transmitter with 8-tap FFE (TX), a receiver with programmable peaking amplifier (RX) and a parallel port interface. The chip was packaged, had ESD protection devices on all pins, and was tested on a PCB card. We have demonstrated TX operation up to 11.7 Gb/s and RX operation up to 13.3 Gb/s. We have demonstrated back-to-back link operation with external clocking at 11.4 Gb/s over 6 ft of high-bandwidth cable (20 GHz) with BER<10<sup>-13</sup>. At 10 Gb/s the same link operated over 20 ft of lossy cable (20 dB loss at 5 GHz) with BER<10<sup>-13</sup>, and over 16 inches of Tyco XAUI backplane with BER<10<sup>-12</sup>.

# Acknowledgement

The authors thank Alexander Rylyakov, Herschel Ainspan, and Jose Tierno for help with circuit design, Michael Beakes, Donald Beisser, and Russell Rose for physical design support, Phillip Metty and Kevin Kramer for PCB design and packaging support, and Daniel Friedman, Mehmet Soyuer and Keith Heilmann for management support. This work was funded in part by NSA contract H98230-04-C-0920.

# References

- (1) J. Jaussi et al, "An 8Gb/s binary source-synchronous I/O link with adaptive receiver-equalization, offset cancellation and clock deskew is implemented in 0.13μm CMOS", ISSCC Digest of technical papers, pp. 246-247, 2004.
- (2) J. Zerbe et al, "Equalization and Clock Recovery for a 2.5-10Gb/s 2-PAM/4-PAM Backplane Transceiver Cell", ISSCC Digest of technical papers, pp. 80-81, 2003.
- (3) M.L. Schmatz, "High-speed and high-density chip-to-chip interconnections: trends and techniques", IEEE Conference on Electrical Performance of Electronic Packaging, pp. 23-24, 2000.
- (4) "Foundry Technologies 90-nm CMOS", IBM Microelectronics Product Brief, http://www.ibm.com/chips/services/foundry/
- (5) A. Rylyakov, S. Rylov, H. Ainspan and S. Gowda, "A 30-Gb/s 1:4 Demultiplexer in 0.12 μm CMOS", ISSCC Digest of technical papers, pp. 176-177, 2003.



Figure 1. Block diagram of the transmitter section using full-rate half-baud spaced 8-tap FIR filter. All high-speed signal interconnects are differential (CML), except for four data inputs, which are single-ended.



Figure 2. Block diagram of the receiver section. All high-speed signal interconnects are differential (CML). The driver outputs are single-ended.



Figure 3. Layout of the serial link demo chip, consisting of TX and RX sections and Enhanced Parallel Port interface.




Figure 4. Block diagram of the test setup for serial link evaluation. Multiple-position switches on the diagram indicate different ways to connect parts of the setup; the connection shown corresponds to a stand-alone TX test.



Figure 5. Differential output of TX operating at 11.7 Gb/s with BER < $10^{-13}$, with equalization. Signal from TX driver travels to BERT through CBGA package, socket, test board and 3 ft of 20-GHz cable.



Figure 6. TX output eye diagram after 16" Tyco XAUI backplane with two connectors at (a) 8 Gb/s (BER < $10^{-13}$) and (b) 10 Gb/s (BER < $10^{-12}$).