# **IBM Research Report**

# Effect of Noise on Timing or Data-Pattern Dependent Delay Variation When Transmission-Line Effects Are Taken into Account for On-Chip Wiring

A. Deutsch, H. H. Smith<sup>1</sup>, C. Vakirtzis<sup>1</sup>, J. Kozhaya<sup>2</sup>, L. M. Greenberg<sup>1</sup>

IBM Research Division Thomas J. Watson Research Center P.O. Box 218 Yorktown Heights, NY 10598

<sup>1</sup>IBM Systems and Technology Group Poughkeepsie, NY

<sup>2</sup>IBM Systems and Technology Group Burlington, VT



Research Division Almaden - Austin - Beijing - Haifa - India - T. J. Watson - Tokyo - Zurich

LIMITED DISTRIBUTION NOTICE: This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). Copies may be requested from IBM T. J. Watson Research Center, P. O. Box 218, Yorktown Heights, NY 10598 USA (email: reports@us.ibm.com). Some reports are available on the internet at <a href="http://domino.watson.ibm.com/library/CyberDig.nsf/home">http://domino.watson.ibm.com/library/CyberDig.nsf/home</a>.

IBM T. J. Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, N.Y. 10598,

Phone: (914) 945-2858, Fax: (914) 945-2141, email:deutsch@us.ibm.com


#### Abstract

The impact of data-pattern variation on on-chip interconnect timing is investigated for typical local, global, and clock wiring. The validity of a methodology that combines noise and timing engines is benchmarked against accurate non-linear simulations with R(f)L(f)C circuit representation, and recommendations for CAD tool development are given.

#### Introduction

High-performance microprocessor and ASIC chips are operating at increasingly higher data-rates. Clock frequencies are projected to advance from 4-5 GHz to 10 GHz in the 2005-2010 timeframe. Interconnect delay is generally kept to about 25% of the total cycle time, so for a 100-200 ps bit-time, the wire delay and risetimes are about 25-50 ps. Delay prediction accuracy then has to be on the order of 5 ps or less.
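The budget arithmetic above can be written out as a quick check. This is a minimal sketch: the function name `wire_delay_budget_ps` is hypothetical, and the 25% share and 5 ps accuracy target are the figures quoted in the text.

```python
def wire_delay_budget_ps(bit_time_ps, wire_fraction=0.25):
    """Return the share of the bit time allotted to interconnect delay."""
    return bit_time_ps * wire_fraction


ACCURACY_TARGET_PS = 5.0  # delay prediction accuracy quoted in the text

for bit_time in (100.0, 200.0):
    budget = wire_delay_budget_ps(bit_time)
    # Express the accuracy target as a fraction of the wire budget
    share = ACCURACY_TARGET_PS / budget
    print(f"bit time {bit_time:.0f} ps: wire budget {budget:.0f} ps, "
          f"5 ps error is {share:.0%} of the budget")
```

A 5 ps error therefore amounts to 10-20% of the wire delay budget, which is why the modeling accuracy discussed in the following sections matters.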

Typical CAD timing tools on the market [1] or developed by system companies [2-3] mostly take a static approach. The on-chip interconnects are represented by distributed *RC* circuits and the effective line capacitance is scaled to account for the charging of the mutual capacitance to the adjacent neighboring lines. This methodology is very efficient but can introduce large errors in timing prediction, because the drive and load conditions of the neighboring lines, and their timing relative to the line of interest, vary. Such scaling would also differ from layer to layer in an 8-10 level wiring stack. Various improvements have been made that use iterative techniques to reduce the inaccuracy. Dynamic approaches are also pursued, such as the one explained in [3], where a large linear multi-port coupled system is used to capture the interactions with all aggressors. Obviously, such a system can become exceedingly complex, and non-linear simulation is too costly.

An alternative approach has been proposed in which a hybrid method simplifies the dynamic analysis while improving on the accuracy of the static tools. The hybrid technique relies on noise calculations to adjust delay and slews. In this paper, we analyze the validity of the hybrid approach by using accurate, non-linear timing and noise simulations with transmission-line properties based on distributed frequency-dependent R(f), L(f), and C parameters.

#### **Interconnect Characteristics**

On-chip interconnect performance has not been scaling as rapidly as device speeds. Because of this, designers have been forced to drive the wires with larger buffers and thus consume more power. It is not uncommon nowadays for even inter-macro wiring, with perhaps 2x dimensions (twice the minimum dimensions) of 0.2 µm width, for example, to be driven by devices having 25-50  $\Omega$  effective impedance  $Z_{drv}$ . Such wires might be used for lengths of up to 300-500 µm. Global wires between macros are being shortened and rebuffered aggressively to achieve the needed performance. Typical maximum lengths might be 1-1.5 mm with 8x widths of 0.8 µm and  $Z_{drv}$  under 25  $\Omega$ . The line resistance in these two cases can be around 315  $\Omega$ /mm and 23  $\Omega$ /mm, respectively. Even in the case of ASIC chips, clock frequencies have been rising from 1 to 3 GHz. Typical risetimes are around 100 ps. Clock distribution on topmost layers for such chips might have resistance *R* of 8.2-21.5  $\Omega$ /mm, lengths of 0.5-1.5 mm, and be driven by buffers with  $Z_{drv}$  of 5 to 10  $\Omega$ .

It has been shown in [4] that transmission-line effects need to be taken into account when the propagation delay  $\tau l$  equals or exceeds half the propagated risetime,  $t_r$, or

$$\tau l \ge 0.5 t_r \tag{1}$$

If these lines are maintained in an *LC*-mode of propagation, then very fast transitions can be guaranteed and delay will be determined by the wave propagation delay,  $\tau l = CZ_0$, and will be proportional to length *l*. To remain in this regime, the line losses have to be contained [4], or

$$Rl/2Z_0 < 1 \tag{2}$$

and

$$Z_{drv} < Z_0 \tag{3}$$
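For reference, the link between the propagation delay and the characteristic impedance can be written out under the usual lossless-line assumption, with $L$ and $C$ taken per unit length (an assumption made here; the text does not state the normalization):

$$Z_0 = \sqrt{L/C}, \qquad \tau = \sqrt{LC} = C Z_0$$

The delay of a line of length $l$ is then $\tau l = C Z_0\, l$, which agrees with the expression given earlier if $C$ there is read as the total line capacitance.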

Due to the very stringent timing requirements explained above, transmission-line effects will need to be taken into account for a large portion of the on-chip wiring. For the 2x wires discussed earlier,  $Rl/2Z_0 = 0.28 - 1.4$, for the 8x wires  $Rl/2Z_0 = 0.14 - 0.7$, and for the clock wires, 0.2 - 0.7. The line impedance  $Z_0$  in these three cases is 55.5  $\Omega$, 41  $\Omega$, and 13-23  $\Omega$, respectively, and so transmission-line effects come into play. Most of the timing tools that are in use or in development use only a distributed *RC* circuit representation. Crosstalk analysis is at a more advanced stage, with frequency-dependent representation being used in tools such as the one described in [5]. This progress has been made because crosstalk depends on transmission-line effects even at much lower frequencies of operation. As technologies and designs advance, risetime and then delay start to require the same rigor in analysis as crosstalk. Examples will be given in the next sections indicating this trend.
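The criteria of Eqs. (1)-(3) can be evaluated directly for the wiring classes quoted above. The following is a minimal sketch: the function names are hypothetical, and the pairing of a specific length and driver impedance with each wire class is an illustrative assumption drawn from the ranges given in the text and figure captions.

```python
def transmission_line_check(r_ohm_per_mm, length_mm, z0_ohm, z_drv_ohm):
    """Evaluate the loss and driver conditions of Eqs. (2) and (3)."""
    loss_ratio = r_ohm_per_mm * length_mm / (2.0 * z0_ohm)  # Rl / 2Z0
    return {
        "loss_ratio": loss_ratio,
        "low_loss": loss_ratio < 1.0,     # Eq. (2)
        "driver_ok": z_drv_ohm < z0_ohm,  # Eq. (3)
    }


def needs_tline_model(delay_ps, risetime_ps):
    """Eq. (1): transmission-line effects matter when tau*l >= 0.5*t_r."""
    return delay_ps >= 0.5 * risetime_ps


# (R in ohm/mm, length in mm, Z0 in ohm, Zdrv in ohm); pairings are illustrative
cases = {
    "2x local, 0.5 mm": (315.0, 0.5, 55.5, 50.0),
    "8x global, 1.5 mm": (23.0, 1.5, 41.0, 25.0),
    "clock, 1.5 mm": (8.2, 1.5, 13.0, 6.25),
}
for name, args in cases.items():
    result = transmission_line_check(*args)
    print(f"{name}: Rl/2Z0 = {result['loss_ratio']:.2f}, "
          f"low loss = {result['low_loss']}, driver ok = {result['driver_ok']}")
```

With these inputs the longest 2x wire lands at the upper end of the quoted range ($Rl/2Z_0 \approx 1.4$) and so falls outside the low-loss condition, while the 8x and clock examples satisfy both Eq. (2) and Eq. (3).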

#### **Analysis Methodology**

Local and global wiring were analyzed for both in-phase and interleaved data-bus configurations as shown in Figs. 1a and 1b. Some cluster net distribution examples will be shown as well. The line characteristics, namely capacitance C and inductance L, will be data-pattern dependent. This behavior is generally mislabeled as the effect of noise on timing. On-chip, the capacitive coupling  $K_C$, which is the ratio of the mutual capacitance  $C_{lat}$  to the self capacitance  $C_{tot}$, is very high. In the case of the 2x and 8x wiring,  $K_C = 0.38$  and 0.44, respectively. The static approach scales C by a factor K depending on whether the neighboring lines switch in the opposite direction (-+-, or late mode) or in phase (+++, or early mode). So C will be  $C_{tot} + K C_{lat}$,  $C_{tot}$, or  $C_{tot} - K C_{lat}$  for the -+-, 0+0, and +++ data-patterns, respectively. Similarly, the line inductance will change from  $L_{tot} - L_{lat}$  to  $L_{tot}$  to  $L_{tot} + L_{lat}$, depending on how the return currents from the n lines flow through the ground conductors for the -+-, 0+0, and +++ cases. Due to the high resistance of the return, the effective resistive path will also be altered and affect the effective impedance. This behavior is therefore correctly labeled data-pattern dependent line characteristic variation. A complete transmission-line analysis includes frequency-dependent R(f), L(f), and C for coupled lines in order to capture such effects. Most timing tools assume a single-line *RC*-based model in the 0+0 configuration.
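The data-pattern dependence of the line parameters described above can be sketched as a small lookup. This is an illustration only: the function name is hypothetical, the values are normalized to the self terms, and the inductive ratio of 0.2 is an assumed placeholder rather than a value from the paper. Only $K_C = 0.38$ for the 2x wiring comes from the text.

```python
def effective_line_params(c_tot, c_lat, l_tot, l_lat, pattern, k=1.0):
    """Return (C_eff, L_eff) of the center line for a neighbor data pattern.

    pattern: '-+-' (late mode), '0+0' (quiet neighbors), '+++' (early mode)
    k:       static-tool scaling factor K applied to the lateral capacitance
    """
    sign = {"-+-": 1, "0+0": 0, "+++": -1}[pattern]
    c_eff = c_tot + sign * k * c_lat  # C rises when the neighbors switch opposite
    l_eff = l_tot - sign * l_lat      # L moves in the opposite sense to C
    return c_eff, l_eff


# Normalized 2x example: C_lat / C_tot = K_C = 0.38 (from the text);
# L_lat / L_tot = 0.2 is an assumed placeholder.
for pattern in ("-+-", "0+0", "+++"):
    c_eff, l_eff = effective_line_params(1.0, 0.38, 1.0, 0.2, pattern)
    print(f"{pattern}: C_eff = {c_eff:.2f}, L_eff = {l_eff:.2f}")
```

Because C and L move in opposite directions, both the delay and the effective impedance of the line change with the data pattern, which is the point the paragraph above makes.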



Fig. 1 a) In-phase and b) interleaved wiring configurations. Each buffer stage has progression of device sizes.

The hybrid technique proposes to use this result and alter it based on the crosstalk obtained from the +V+ configuration, where V is the victim line of interest. The crosstalk can be obtained with *RC* ( $V_{noiseRC}$ ) or R(f)L(f)C ( $V_{noiseRLCf}$ ) analysis. An *RC*-based noise analysis greatly simplifies the tool implementation. Fig. 2 sketches the methodology. The switching threshold at 50% of the swing, or  $V_{DD}/2$, is up- or down-shifted for late or early mode. The shift is either  $V_{noise} \times CF_e$  or  $V_{noise} \times CF_l$, and ideally the correction factors are equal, or  $CF_e = CF_l$.
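Written out for a rising victim transition, with $V_{noise}$ taken as a positive magnitude, the threshold shift takes the following form. The symbols $V_{th,late}$ and $V_{th,early}$ are introduced here for illustration and do not appear in the original text:

$$V_{th,late} = \frac{V_{DD}}{2} + CF_l \, V_{noise}, \qquad V_{th,early} = \frac{V_{DD}}{2} - CF_e \, V_{noise}$$

Here $V_{noise}$ is the +V+ crosstalk amplitude, taken from either the *RC* or the R(f)L(f)C analysis.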

In this study, simulations were performed of both the stage delay and crosstalk with *RC*, *RLC*, and R(f)L(f)C representations. The switching threshold, where delay is monitored, was then shifted up or down for the 0+0 single-line *RC* simulation until it matched the correct R(f)L(f)C three-line simulation. By knowing this new threshold value and the crosstalk, one can determine the *CF* factor. Since most timing tools cannot detect the difference between in-phase and interleaved wiring (shown in Fig. 1), the same *CF* needs to be developed for both circuits. The late-mode timing is the most critical for the cycle-time target, while the early mode is easier to correct for. Interleaved wiring is generally avoided as much as possible due to the high crosstalk, as will be shown later. Moreover,  $V_{noiseRC}$  is preferred over  $V_{noiseRLCf}$  because of CAD tool simplicity.
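The extraction of *CF* described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' tool flow: the function names are hypothetical, the waveform is a synthetic linear ramp standing in for the 0+0 single-line *RC* result, and the reference delay and noise amplitude are made-up numbers standing in for the R(f)L(f)C three-line simulation and the +V+ crosstalk.

```python
def crossing_time(times, volts, threshold):
    """Time at which a rising waveform first crosses threshold (linear interpolation)."""
    for i in range(1, len(times)):
        v0, v1 = volts[i - 1], volts[i]
        if v0 < threshold <= v1:
            frac = (threshold - v0) / (v1 - v0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    raise ValueError("waveform never crosses threshold")


def voltage_at(times, volts, t):
    """Waveform voltage at time t (linear interpolation)."""
    for i in range(1, len(times)):
        if times[i - 1] <= t <= times[i]:
            frac = (t - times[i - 1]) / (times[i] - times[i - 1])
            return volts[i - 1] + frac * (volts[i] - volts[i - 1])
    raise ValueError("t lies outside the waveform")


def correction_factor(times, volts_single_rc, t_reference, v_noise, vdd):
    """CF such that a threshold shift of CF * v_noise reproduces t_reference."""
    v_shifted = voltage_at(times, volts_single_rc, t_reference)
    return abs(v_shifted - vdd / 2.0) / v_noise


# Synthetic single-line waveform: 0 V to 1 V ramp over 100 ps (hypothetical)
times_ps = [0.0, 25.0, 50.0, 75.0, 100.0]
volts = [0.0, 0.25, 0.5, 0.75, 1.0]

nominal = crossing_time(times_ps, volts, 0.5)  # delay at VDD/2
cf_late = correction_factor(times_ps, volts, t_reference=60.0, v_noise=0.2, vdd=1.0)
print(f"nominal delay {nominal:.1f} ps, late-mode CF {cf_late:.2f}")
```

Repeating the same calculation for the in-phase and interleaved configurations, and for both noise models, gives the sets of *CF* values whose agreement is examined in the results that follow.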

### **Analysis Results**

#### a) Logic Macros

Fig. 2 Threshold shifting for late and early-mode switching.

Fig. 3a shows a comparison of the total delay, as defined in Fig. 1, namely wire and buffer, for the 2x lines with  $Z_{drv}$  of 50  $\Omega$ . Delay is over-predicted by the *RC* and *RLC* circuits for



Fig. 3 Total delay for 2x wiring with width=space=0.2  $\mu$ m,  $Z_{drv} = 50 \Omega$ , with configuration of a) Fig. 1a and b) Fig. 1b.

For shorter lengths, an *RC*-based analysis is adequate; however, CAD tools need to be able to simulate three coupled line configurations, unlike the single-line, 0+0, type available currently. Fig. 3b shows the results for interleaved lines. Here the early-mode case needs *RLC* simulation starting at even shorter lengths, l > 0.2 mm. Once again, the *RC* circuit under-predicts delay for longer lines. Fig. 4a shows late-mode, -+-, interleaved waveforms for l = 0.25 mm when all three lines switch synchronously, and Fig. 4b shows the in-phase bus when the center line has a 20 ps skew.



Fig. 4 a) Simulated waveforms for l=0.25 mm interleaved 2x lines with  $Z_{drv} = 50 \Omega$ ; b) in-phase bus with 20 ps skew.

Notice the over-prediction of the propagated risetime by the *RC* circuit and also the risetime distortion in the presence of skew. The no-skew case predicts the worst-case delay in Fig. 4b. The distortions seen in Fig. 4b are quite different for *RC* and R(f)L(f)C simulations. Figs. 5a and 5b show the comparison of crosstalk (+V+) for in-phase and interleaved buses, or far-end, FEN, and near-end, NEN, crosstalk. FEN is under-predicted by *RC* and *RLC* simulations for l > 0.5 mm, while NEN is almost always under-predicted. Notice also how much larger NEN is compared to FEN. These results highlight the need for R(f)L(f)C crosstalk analysis. Fig. 6a shows the propagated risetime and the over-prediction for *RC* circuits, especially for l > 0.3 mm. In Fig. 6b, the correction factors *CF* for late-mode switching are plotted for in-phase and interleaved simulations based on *RC* and R(f)L(f)C noise analyses. The *RC*-based noise generates *CF* values that are closer to each other, as desired, than the  $V_{noiseRLCf}$ -based values. For the early-mode case, only the  $V_{noiseRLCf}$ -based method can be used in order to have equal *CF* for in-phase and interleaved buses. This is why a CAD tool that can generate  $V_{noiseRLCf}$  is very beneficial.



Fig. 5 a) FEN and b) NEN crosstalk for 2x wiring for  $Z_{drv} = 25 \ \Omega$  and 50  $\Omega$ 



Fig. 6 a) Propagated risetime and b) correction factor CF for late mode, -+-, in-phase and interleaved 2x wiring with  $R = 314.7 \ \Omega/mm$  and  $Z_{drv} = 50 \ \Omega$ .

#### b) Global Wiring

In the case of 8x wiring that is less resistive, the delay inaccuracies are larger. Fig. 7 indicates that both *RLC* and R(f)L(f)C simulations would predict much less jitter than an *RC* circuit. This can be seen by observing the much larger variation between -+- and +++ timing compared to 0+0 for the *RC* case than for the other two cases. An *RC* circuit could predict late-mode timing fairly accurately, while early-mode could be predicted fairly well by an *RLC* circuit if such a feature were available in the current timing tools. Fig. 8a shows an example of early-mode waveforms for l = 2 mm, where it is evident that the *RC* circuit has a large error in delay and risetime. Fig. 8b shows late-mode switching with 0 ps, 5 ps, and 20 ps skew and R(f)L(f)C simulation. The risetime distortion is quite severe, and line dimensional, device, or power supply and temperature tolerances could move these glitches into the switching threshold level. This would affect timing and possibly generate logic failure.



Fig. 7 Total delay for -+-, 0+0, and +++ for 8x wiring with  $Z_{drv} = 25 \Omega$ .

Figs. 9a and 9b show the crosstalk results for the 8x wires. FEN is over-predicted by the *RLC* circuit and under-predicted by the *RC* circuit. NEN crosstalk is under-predicted by both the *RC* and *RLC* circuits, and it is much larger than FEN. This is why interleaved buses need to be avoided. This large noise also contributes to signal distortion, as seen in Fig. 10a for late-mode. The *RLC* and R(f)L(f)C circuits can capture this distortion, but the *RC* representation cannot. Fig. 10b highlights the large over-prediction of propagated risetime by an *RC* circuit. This results in higher power and noise due to over-design of buffers. Once again, an *RLC*-based coupled-line tool would capture the correct risetimes if such a tool were available. Fig. 11a shows the delay results for the interleaved buses. Notice that the *RC* circuit under-predicts the nominal delay, which is used in the hybrid methodology as the base value for shifting the threshold. Finally, Fig. 11b shows the *CF* values that are found in this case. For early-mode switching, +++, the *CF* values are quite close for both  $V_{noiseRC}$  and  $V_{noiseRLCf}$ . However, when either of these *CF* values is used, the error in delay prediction is too large if applied to both in-phase and interleaved buses, and thus this methodology is not usable.

Fig. 8 a) Signal propagation for 8x wiring for early-mode, +++, l = 2 mm, with  $R = 23 \ \Omega/mm$  and  $Z_{drv} = 25 \ \Omega$ ; b) late-mode, -+-, with skew of 0 ps, 5 ps, 20 ps, and *R*(*f*)*L*(*f*)*C* representation.

Fig. 9 a) FEN and b) NEN for 8x wiring, with  $R = 23 \ \Omega/mm$  and  $Z_{drv} = 25 \ \Omega$  and 50  $\Omega$ .



Fig. 10 a) Signal propagation for 8x interleaved wiring for late mode, -+-, l = 2 mm, with  $R = 23 \Omega/mm$,  $Z_{drv} = 25 \Omega$ ; b) propagated risetime for 8x interleaved wiring with  $Z_{drv} = 25 \Omega$, with width = space = 0.8 µm.

For late-mode, -+-, the *CF* based on  $V_{noiseRC}$, which is non-linear in Fig. 11b, can be used for both in-phase and interleaved cases. Although the  $V_{noiseRLCf}$ -based *CF* in Fig. 11b is close to 1, and more desirable, it cannot be used.



Fig. 11 a) Total delay for 8x interleaved wiring with  $Z_{drv} = 25 \Omega$ ; b) correction factors CF for 8x in-phase wiring for late (upper curves) and early modes (lower set of curves) for  $V_{noiseRC}$  and  $V_{noiseRLCf}$ .



Fig. 12 Cross section for 6.46  $\mu m$  wide clock line with R = 8.2  $\Omega/mm$  and  $Z_{\rm o}$  = 13  $\Omega.$ 



Fig. 13 Typical clock-tree layout.



Fig. 14 a) Wire delay for width of 6.46  $\mu$ m,  $Z_{drv} = 6.25 \Omega$ , and b) propagated risetime for *RC*, *RLC*, and *R(f)L(f)C*.



Fig. 15 a) Signal propagation for clock wiring with width = 2.472  $\mu$ m,  $Z_o = 23 \Omega$ ,  $R = 21.5 \Omega$ /mm and b) width = 6.46  $\mu$ m,  $Z_o = 13 \Omega$ , and  $R = 8.2 \Omega$ /mm.

#### c) Clock Wiring

Clock distribution networks need to transmit the clock signals across the entire chip with minimum skew. The typical tree configuration [4] uses very low-resistance lines on the topmost layers and the timing control is even more stringent than for the data buses, even for slower chips such as ASICs that operate at around 1 - 3 GHz. As indicated before, these lines could be 0.5 - 1.5 mm in length but they are driven by very large size buffers. Fig. 12 shows a typical cross section of a topmost wide line with width of 6.46 µm and the ground conductors on surrounding layers. Fig. 13 shows a portion of a typical clock-tree. Figs. 14 and 15 show typical errors given by the *RC* representation for both delay and signal transition. The same effects are seen even in cluster nets.

#### Conclusions

It has been shown that the *effect of noise on timing* is actually a *data-pattern dependent variation of the interconnect characteristics*. This change in the effective impedance of the lines needs to be analyzed using frequency-dependent R(f), L(f), C transmission-line characteristics. Transmission-line effects were shown to come into play for delay and risetime prediction as the timing requirements for many-GHz operation become increasingly stringent. Such effects used to be critical mostly for crosstalk and clock timing prediction in earlier generations of technology.

The hybrid methodology of using single-line delay simulations with switching threshold shifting based on crosstalk results can be used for limited cases. For delay prediction, the shift of the threshold using *RC*-based noise calculation can be used only for late mode, -+-, switching. For early-mode, +++, noise based on R(f)L(f)C representation is needed for very resistive lines with  $R \gg 100 \ \Omega/\text{mm}$, and the method cannot be used for lower-resistance global lines. For the high-resistance lines, an *RC*-based coupled-line tool could be adequate and might be a better solution. This is especially true if it could also be coupled with non-linear device macromodels that would allow for capturing of non-linear effects.

In all cases, the *RC* circuit over-predicts risetime, and this results in higher power usage and higher noise generation than needed because designers will over-design the size of the buffers when using such tools. An *RLC*-based tool would predict propagated risetimes much more accurately. This is especially significant when skews and tolerances are introduced. The distortions can be so large that timing and logic operation can be severely impacted, and an *RC*-based tool could not flag this. Large crosstalk, found especially on interleaved buses, distorts the risetimes, and an *RLC* circuit can again capture this effect. Accurate crosstalk prediction requires R(f)L(f)C representation. Interleaved bus configurations need to be avoided due to excessively large NEN crosstalk.

While the hybrid approach has somewhat higher accuracy than the earlier static tools, dynamic approaches with multiple coupled lines, together with fast non-linear simulators, are definitely needed. The simple single-line timing simulation needs to evolve from *RC* to *RLC* coupled-line capability for the many-GHz era.

#### References

- 1) <u>www.cadence.com</u> and <u>www.synopsis.com</u>
- 2) R. Y. Chen, et al., "Timing Window Applications in UltraSPARC-III Microprocessor Design", Proceedings of 2002 IEEE ICCD, pp. 1-4, 2002.
- 3) R. Arunachalam, K. Rajagopal, L. T. Pileggi, "TACO: Timing Analysis with Coupling", Proceedings of DAC, pp. 266-269, June 2000.
- 4) A. Deutsch, et al., "On-Chip Wiring Design Challenges for Gigahertz Operation", IEEE Proceedings, pp. 529-555, vol. 89, no. 4, April 2001.
- 5) H. Smith, et al., "Frequency dependent RLC crosstalk evaluation of a high performance S/390 microprocessor chip", Proc. Dig. IEEE 9<sup>th</sup> Topical Meeting Elec. Perf. of Electronic Packaging, Scottsdale, AZ, Oct. 23-25, 2000, pp. 321-324.