# **IBM Research Report**

# New Methodology for Combined Simulation of Delta-I Noise Interaction with Interconnect Noise for Wide, On-chip Data-buses Using Lossy Transmission-line Power-blocks

A. Deutsch, H. H. Smith<sup>1</sup>, B. J. Rubin, B. L. Krauter<sup>2</sup>, G. V. Kopcsay

IBM Research Division Thomas J. Watson Research Center P.O. Box 218 Yorktown Heights, NY 10598

<sup>1</sup>IBM Systems and Technology Group 2455 South Road Poughkeepsie, NY 12601

<sup>2</sup>IBM Systems and Technology Group 11400 Burnet Road Austin, TX 78758



Research Division Almaden - Austin - Beijing - Haifa - India - T. J. Watson - Tokyo - Zurich

LIMITED DISTRIBUTION NOTICE: This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g. payment of royalties). Copies may be requested from IBM T. J. Watson Research Center, P. O. Box 218, Yorktown Heights, NY 10598 USA (email: reports@us.ibm.com). Some reports are available on the internet at <a href="http://domino.watson.ibm.com/library/CyberDig.nsf/home">http://domino.watson.ibm.com/library/CyberDig.nsf/home</a>

## New Methodology for Combined Simulation of Delta-I Noise Interaction with Interconnect Noise for Wide, On-chip Data-buses Using Lossy Transmission-line Power-blocks

A. Deutsch, H. H. Smith<sup>1</sup>, B. J. Rubin, B. L. Krauter<sup>2</sup>, G. V. Kopcsay

IBM T. J. Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, NY 10598

Phone: (914) 945-2858, Fax: (914) 945-2141, email: deutsch@us.ibm.com

<sup>1</sup>IBM Systems and Technology Group, 2455 South Road, Poughkeepsie, NY 12601

<sup>2</sup>IBM Systems and Technology Group, 11400 Burnet Road, Austin, TX 78758

*Abstract* – A new technique is described for reducing computational complexity and improving accuracy of combined power distribution and interconnect noise prediction for wide, on-chip data-buses. The methodology uses lossy transmission-line power-blocks with frequency-dependent properties needed for the multi-GHz clock frequencies. The interaction between delta-I noise, common-mode noise, and crosstalk and their effect on timing is illustrated with simulations using representative driver and receiver circuits and on-chip interconnections.

Keywords - On-chip power distribution noise, on-chip simultaneously switching noise, delta-I noise, on-chip interconnect noise.

#### Introduction

The high level of integration on high-performance microprocessor chips allows placement of very large data-caches on the same die with one or more processor units. Communication between processor units and these data-caches involves hundreds of simultaneous data-bit transfers. This means that hundreds of lines can switch simultaneously and charge the interconnecting transmission lines. This large surge of current has to be supplied by the power distribution system. Power is fed into the chip from the supporting packaging structure through solder bonds that are on a coarse grid of $200 - 400\,\mu\text{m}$ pitch. On the other hand, the logic cells have power rails on approximately $5 - 10\,\mu\text{m}$ pitch. Because of the very thin metallization used, the power and ground reference conductors forming the power distribution mesh on many layers will introduce a finite impedance between the solder bonds and the actual devices. This finite impedance, $Z_{eff}$, could be predominantly resistive in the bottom thinner layers, but it could be inductive and resistive in nature on the topmost layers. The difference in spatial distance between the solder balls and the actual power device contacts (400 to 10 µm) and the signal-to-power line ratio on various layers will generate a frequency-dependent behavior in the power mesh. The voltage drop along this finite impedance generates common-mode noise (CMN) on the interconnections, as explained in [1]. At the same time, because of this finite impedance, the driver devices cannot instantaneously supply the desired current, and thus a power supply "droop" develops, generally known as delta-I noise or simultaneously-switching noise (SSN). On packages, since low resistance power planes are generally used for reference, and the connecting vias are on extremely coarse pitch, the nature of the power mesh impedance is mostly inductive.

The typical analyses and CAD tools, as seen in [2] and [3], model the entire power mesh as a single large distributed lumped circuit that includes the frequency-independent R, L, and C of all the planes and vias. Newer techniques such as those shown in [4] and [5] include frequency-dependence for the entire model, while in [6] signal line models are added to the large plane models.

These types of models are extremely large since they have to cover an entire multi-chip module, or an entire chip. In addition, ideal linear current sources are used to represent the driver circuits. Power supply noise is estimated and added to the other noise sources in the system such as crosstalk and reflections. Simplified formulas are developed to assess the effect of this noise on timing, for example as rise-time degradation. The slew impact is calculated as $V_{noise} / t_r$, where $V_{noise}$ is the generated SSN and $t_r$ is the propagated risetime on the interconnect transmission lines. This methodology is very approximate and time consuming. As the clock frequencies and the number of drivers switching are both increasing, simplified frequency-independent power mesh analyses fail to predict the SSN contribution accurately. Non-linear effects of the noise interaction between SSN, CMN, and crosstalk, as explained in [1], should be taken into account. Non-linear driver circuits should be used in the simulation and all noise sources simulated simultaneously. This places tremendous demands on present CAD tools, which generally rely on linear solvers only.

This paper proposes a methodology to reduce computational cost and improve the accuracy of the noise prediction for the high frequencies that are desired in high-performance microprocessor chips. The technique relies on defining only two power-blocks for each system design in terms of per-unit-length properties and being able to perform non-linear simulations of both power and interconnect noise with actual driver and receiver circuits. Examples are shown for analysis of the effect of on-chip decoupling, device intrinsic capacitance, multiple voltage rails, impact on timing, isolation from package, and the balancing between CMN and SSN noise contributions.

#### **Description of Methodology**

As clock frequencies are predicted to rise toward 10-15 GHz and the number of transistor circuits to go from 450 million in 2004 to 2,200 million in 2010 on chips that are 20 x 20 mm in size, the expected delta-I noise could rise to excessive levels. It should be noted that, due to the high wiring density needed on chip, the reference return consists of a sparse supply of reference conductors spread over many layers and interconnected with vias. Typical signal-to-reference conductor ratios are 4:1 to 2:1 in the topmost layers but could be as high as 10:1 in the lower layers of the on-chip wiring stack. In order to control the characteristics of the data-bus wiring and to contain the level of simultaneously switching noise as well as crosstalk and common-mode noise between signal lines, an increased number of power conductors is included on chip, especially in the area around the data-buses. Such a large supply of power conductors on layers 1, ..., n-3, n-2, n-1, n, with alternating X and Y direction, starts forming a fairly regular mesh. The power conductors can have several different power voltage levels for on-chip logic units; for input/output (I/O) circuits; for memory devices; and for isolation between processor units due to high leakage. Nevertheless, the regularity of the power-mesh with power rails $V_1, V_2, ..., V_n$ can be captured by defining it as made of similar blocks with per-unit-length properties, or lossy transmission-lines. Such building blocks require much smaller models than an entire chip power mesh. Two types of blocks need to be generated as shown in Fig. 1, one in X and one in Y direction. The building blocks have the power rails $V_1, V_2, ..., V_n$ included as lossy transmission lines that are referenced to an ideal ground plane and are oriented orthogonal to the direction of the signal wires in the data-bus, in order to capture the worst-case current return path or worst-case $Z_{eff}$. Only two such blocks, X and Y directed, are needed for each chip design. The driver and receiver circuits are interconnected by hundreds of parallel data-bus lines. The data-buses are represented as lossy transmission lines and can be 1-5 mm in length. They are referenced to the same ideal ground plane as the power blocks. The power-block transmission lines are much smaller in size than a full-chip model and yet able to capture a repetitive pattern that includes all the layers in the wiring stack.
By defining them as orthogonal to the signal line direction, a worst case modeling assumption is made. The actual current distribution will be someplace in-between this case and the ideal case where an ideal reference return is considered. Examples will be shown of modeling with the ideal reference.

Fig. 2 shows in more detail the concept of the driver and receiver side power blocks for the transmission lines on layer n. The power-blocks in this case contain $V_1$, $V_2$, $V_3$ power lines that are analyzed as lossy transmission lines with frequency-dependent behavior as shown in [1]. For the signal wiring, only the parallel power conductors are shown with solid lines in order to simplify the schematic, but the complete multi-layer mesh is included in the actual model. Twelve signal lines are shown for the data-bus. The length of the two power-blocks (in the direction perpendicular to the signal lines) is determined by the distance from the driver or receiver circuits to the closest decoupling capacitors and can easily be varied for sensitivity analyses. In the definition of the model, the primary concern is to obtain the minimum size model that is repetitive in pattern. The length used in simulation is a conceptual length based on knowledge of the physical distance to decoupling devices. The repetitive unit captures the complete properties of transmission-lines $V_1$, $V_2$, $V_3$. The width of the block is determined by the spread of the various devices driving the data-bus. This cannot be made too large because other devices rely on the same return conductors and some current sharing will occur. Fig. 3 shows three-dimensional models of such power-blocks, with width x length of 64 x 600 µm and 100 x 200 µm, that include the topmost four layers for the X and Y directed cases. The models were chosen so as to capture representative repetitive units in the power mesh and show the actual pattern, unlike the ideal schematics of Figs. 1 and 2. In actual simulations, arbitrary lengths can be defined but the per-unit-length properties are captured in the model of Fig. 3. Such power-blocks can easily be inserted in pre-layout optimization or post-layout noise verification CAD tools.
If the voltage "droop" generated by $Z_{eff}$ is to be compensated, one may place decoupling capacitors close to the switching drivers. These capacitors can be easily connected in the blocks between $V_1$, $V_2$, ..., $V_n$ in a distributed manner. Moreover, the circuits that are not switching in the silicon underneath the long data-lines have additional intrinsic device capacitance that can be included in simulations along the signal wires between the power rails. Fig. 4 shows typical models of the blocks used for the 12 signal transmission lines on layers n and n-1. Three additional conductors, namely $V_1$, $V_2$, $V_3$, are also defined as "signal lines", for a block of 15 lines referenced to an ideal ground plane just as the power-blocks in Fig. 3. By adding these $V_1$, $V_2$, $V_3$ conductors in all the layers of the model, the interaction between signal and power conductors is accurately captured, unlike the present modeling practice that uses separate power and signal modeling.
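The distributed placement of decoupling along a power-block can be sketched as follows. This is a minimal illustration and not the authors' tool: the names `Segment` and `segment_power_block` are hypothetical, and the per-unit-length resistance and inductance values are placeholders, not extracted data. Only the 396 µm block length, the 122 pF total decoupling, and the 21-section count are borrowed from examples elsewhere in this paper.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    length_um: float   # physical length represented by this section
    r_ohm: float       # series resistance of the section
    l_ph: float        # series inductance of the section, in pH
    c_decap_pf: float  # decoupling capacitance attached at this section


def segment_power_block(length_um, n_sections, r_per_um, l_per_um,
                        c_decap_total_pf):
    """Split a power-block rail into equal lumped sections.

    Per-unit-length R and L are scaled by the section length, and the
    total decoupling capacitance is spread evenly over the sections.
    """
    dl = length_um / n_sections
    return [
        Segment(dl, r_per_um * dl, l_per_um * dl,
                c_decap_total_pf / n_sections)
        for _ in range(n_sections)
    ]


# Placeholder per-unit-length values (assumptions, not from the paper).
block = segment_power_block(length_um=396.0, n_sections=21,
                            r_per_um=0.05, l_per_um=0.5,
                            c_decap_total_pf=122.0)
total_len = sum(s.length_um for s in block)
total_c = sum(s.c_decap_pf for s in block)
print(f"{len(block)} sections, {total_len:.1f} um, {total_c:.1f} pF")
```

Because the block is described per unit length, changing `length_um` for a sensitivity study only rescales the sections; the extracted properties themselves stay the same.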

Frequency-dependent impedance and admittance characteristics are synthesized for both the signal-block and power-block transmission-lines and inserted as segmented distributed models in non-linear simulations, with non-linear device models for the driver and receiver circuits as shown in Fig. 5 for two sets of data-buses. The synthesis uses distributed, lumped-circuit segments with Foster-type, low-pass filters that match the per-unit-length Z(f) and Y(f) matrices obtained with a field solver [1]. A typical representation might include 21 sections with six poles for a broadband model up to 20 GHz. The distributed nature of the circuit allows for easy insertion of decoupling and device capacitance in simulations, and the distributed placement is very close to the actual physical implementation on chip. The signal transmission-lines plus the extra power conductors capture the signal-to-power noise interaction along the 1 - 5 mm length, while the small power-blocks at the two ends capture the driver and receiver circuit effects. The driver and receiver circuits inject current into this fully-connected mesh without having to model the entire extent of reference planes for a full chip or full package.
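To make the idea of a Foster-type series impedance concrete, the sketch below evaluates one common form: a DC resistance and high-frequency inductance in series with several parallel R-L branches. This is an assumed topology chosen for illustration; the paper does not spell out the exact network, and the function name `foster_rl`, the branch values, and the 4 nH/cm inductance are hypothetical. The 135 Ω/cm DC resistance and the six-pole, 20 GHz range are borrowed from the examples in this paper.

```python
import math


def foster_rl(freq_hz, r0, l0, branches):
    """Return (R, L) per unit length of a series Foster-type R-L network.

    Z(jw) = r0 + jw*l0 + sum_k (jw*lk*rk) / (rk + jw*lk)
    Each (rk, lk) pair is a parallel R-L branch placed in series.
    freq_hz must be greater than zero.
    """
    w = 2.0 * math.pi * freq_hz
    z = complex(r0, w * l0)
    for rk, lk in branches:
        z += (1j * w * lk * rk) / (rk + 1j * w * lk)
    return z.real, z.imag / w


# Assumed values: six branches with corner frequencies up to 20 GHz.
R0 = 135.0   # ohm/cm, DC resistance
L0 = 4.0e-9  # H/cm, high-frequency inductance (placeholder)
corners_hz = [0.5e9, 1e9, 2e9, 5e9, 10e9, 20e9]
BRANCH_R = 20.0  # ohm/cm per branch (placeholder)
branches = [(BRANCH_R, BRANCH_R / (2.0 * math.pi * fc)) for fc in corners_hz]

for f_ghz in (0.1, 1.0, 10.0, 20.0):
    r, l = foster_rl(f_ghz * 1e9, R0, L0, branches)
    print(f"{f_ghz:5.1f} GHz: R = {r:7.2f} ohm/cm, L = {l * 1e9:6.3f} nH/cm")
```

In this form the resistance rises and the inductance falls monotonically with frequency. In practice the branch values would be fitted to the field-solver Z(f) data rather than chosen by hand.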

It was found that, in typical cases for topmost signal wires, two-dimensional (2D) frequency-dependent parameter extraction gave crosstalk and common-mode noise estimation errors under 10% compared with the full three-dimensional (3D) analysis, which has been validated experimentally [1] but is 50-200 times slower to compute. Fig. 6 shows the 3D and 2D models that were considered for the 12 signal lines placed in three power bays on topmost layer n, with pitch of 12.0 $\mu$m and wide power conductors on 396 $\mu$m pitch. Fig. 7 shows the simulated crosstalk results and Table I compares the 3D and 2D common-mode noise for the 12 signal lines (734.9 mV versus 781.9 mV). Fig. 8 shows the calculated *R(f)* and *L(f)* matrix terms for the 3D and 2D models of Fig. 6 and the excellent agreement even for far-coupling terms $R_{17}(f)$ and $L_{17}(f)$. This justifies the use of 2D modeling for the signal lines. On the other hand, the power-block modeling requires three-dimensional analysis to capture the frequency-dependent behavior, shown by the self and mutual $R_{ii}(f)$ and $R_{ij}(f)$ terms in Fig. 9. The sparse signal-to-ground conductor assignment on the various layers (3:1 on topmost two) results in the frequency-dependent increase in the $R_{ij}(f)$ term that becomes significant above 2 GHz. This explains the need for frequency-dependent modeling of $Z_{eff}(f)$ for the upcoming clock frequencies, which was not as critical a requirement for previous microprocessor generations.
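The "under 10%" figure can be checked against the two 12-line ideal-ground entries of Table I. The short sketch below does that arithmetic; the helper name `percent_diff` is ours and not part of any tool described here.

```python
def percent_diff(value, reference):
    """Signed difference of value from reference, in percent."""
    return 100.0 * (value - reference) / reference


cmn_3d_mv = 734.9  # Table I, ideal GND, 3D, 12 signal lines
cmn_2d_mv = 781.9  # Table I, ideal GND, 2D, 12 signal lines

err = percent_diff(cmn_2d_mv, cmn_3d_mv)
print(f"2D overestimates the 3D noise by {err:.1f}%")
```

The 2D result is about 6.4% above the 3D value, consistent with the stated bound.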

#### **Results and Discussion**

Fig. 10 shows simulation results for the 5-mm long bus having four signal lines in a power bay as shown in Figs. 2 and 6. The power conductors are $V_1 = 1.5$ V, $V_2 = 2.5$ V, and $V_3 = V_{gnd}$. Fig. 10 compares the signal at the receiver circuit input for an ideal power distribution (with $V_1 = V_2 = V_3 = GND$) and with the actual power-block. The delta-I noise monitored on $V_1$ and $V_2$ with respect to the local $V_3$ at the driver end is shown in Fig. 10 and Table I for a power block length of 396 $\mu$m, which in this case was also the pitch of the solder-ball contacts. The propagated waveforms on the active lines have much slower risetime when the delta-I noise effect is taken into account. This slow-down of the signal transition reduces the noise on the signal lines. This reduction is not linear with the number of drivers and can only be captured with this type of non-linear simulation. Simple simulations with ideal current sources, as practiced today, would give inaccurate results. Table I and Fig. 11 show the results for 6, 12, and 24 line data-buses. As the number of active lines, and thus the number of in-phase signals, is increased, the effective signal line impedance becomes higher, which leads to faster propagated signals. This can be seen for the 6 and 12 lines with ideal ground reference in Table I and Fig. 10. The 12-line case has higher CMN noise and faster risetimes. When the power-blocks are included, the slow-down of the drivers reduces the noise on the quiet center line, V, for the same number of lines switching. The switching pattern in all cases is ---- +V+ ----. There is a compensating effect taking place between delta-I noise and common-mode noise, and the resultant noise seen at the receiver input will depend on the driver strength, length of line, length of power-block, decoupling used, and ground conductor ratio for the wiring cross section considered.
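The non-linear dependence on the number of drivers can be read directly from Table I. The sketch below compares the signal-line noise with and without the 396 µm power-block for 6 and 12 switching lines. Using the 2D ideal-ground rows as the reference for both cases is our assumption about which rows are directly comparable.

```python
# Signal-line noise at the receiver input (mV), from Table I.
ideal_mv = {6: 530.0, 12: 781.9}        # ideal GND, 2D rows
power_block_mv = {6: 504.6, 12: 657.0}  # 396 um power-block rows


def reduction_percent(n_lines):
    """Noise reduction caused by the power-block, relative to ideal GND."""
    ideal = ideal_mv[n_lines]
    return 100.0 * (ideal - power_block_mv[n_lines]) / ideal


for n in (6, 12):
    print(f"{n:2d} lines: noise reduced by {reduction_percent(n):.1f}%")
```

Doubling the number of drivers from 6 to 12 raises the reduction from roughly 5% to roughly 16%, more than a proportional increase.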

Fig. 12 compares the ideal power distribution, without power-blocks, with the case when the power-block length is varied from 198 to 396  $\mu$ m and no decoupling is used along the block itself. As the power-block length is increased, the delta-I noise increases, driver risetime slows down and noise on the interconnect, at the receiver end, is reduced. The reduction is not linear with the length of mesh.

Fig. 13 compares the ideal power distribution with the case when the chip power-block nodes in Fig. 5 are also connected to the chip-carrier or package power distribution that has a $Z_{eff}$ with $R = 30\ \text{m}\Omega$ and $L = 1$ nH. The package model can be enhanced beyond such simplified assumptions and attached to the nodes in Fig. 5. It should be noted that Fig. 5 includes two sets of signal lines to account for the loading of subsequent stages of wiring on the initial set of lines of interest. In addition, interaction between noise at driver and receiver ends can thus easily be captured. The third case in Fig. 13 includes a total of 122 pF of distributed decoupling capacitance along the driver power-block. The decoupling capacitors on chip isolate the circuits from the damaging effects of the package $Z_{eff}$. The decoupling capacitors help restore the risetime of the propagated signal, reduce the delay caused by delta-I noise, and also reduce the delta-I noise on the $V_1$ power rail. The position and size of decoupling needed can easily and accurately be determined. While timing is restored by the use of decoupling, the signal line noise is increased again, and thus other methods are needed to reduce CMN and crosstalk noise, such as the reduction of signal-to-power conductor ratio in the wiring cross section. For the two examples shown in Figs. 2 and 4, the 4:1 signal-to-power ratio had 52% and 26% total line noise and delta-I noise, respectively, while the 3:1 case had 46% and 20% levels (in percent of $V_1$ swing), respectively. Decoupling capacitors make the chip insensitive to package $Z_{eff}$. The intrinsic device capacitance along the signal transmission-lines similarly helps reduce delta-I noise and restore the risetime. Its availability, however, depends on the logic switching activity.
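To give a feel for the simplified package model, the sketch below evaluates the impedance of a series R-L with the quoted values of 30 mΩ and 1 nH. Treating $Z_{eff}$ as a plain series R-L and the function name `series_rl_impedance` are our assumptions for illustration.

```python
import math


def series_rl_impedance(freq_hz, r_ohm, l_h):
    """Complex impedance of a resistor in series with an inductor."""
    return complex(r_ohm, 2.0 * math.pi * freq_hz * l_h)


R_PKG = 30e-3  # ohm, package resistance quoted in the text
L_PKG = 1e-9   # H, package inductance quoted in the text

# Frequency above which the inductive part exceeds the resistive part.
crossover_hz = R_PKG / (2.0 * math.pi * L_PKG)
print(f"inductive above about {crossover_hz / 1e6:.2f} MHz")

for f_ghz in (0.1, 1.0, 10.0):
    z = series_rl_impedance(f_ghz * 1e9, R_PKG, L_PKG)
    print(f"{f_ghz:5.1f} GHz: |Z| = {abs(z):.3f} ohm")
```

With these values the package impedance is almost purely inductive at the frequencies of interest, reaching about 6.3 Ω at 1 GHz.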

Reducing the number of voltage rails from three to two, for example, also helps reduce delta-I noise and restore the timing as shown in Fig. 14 and Table I. Delta-I noise was reduced in Fig. 14 by 13%, while the propagated risetime became faster by 10 ps. It was also found that noise on  $V_1$  couples into the  $V_2$  voltage rail and in the case of short signal lines, less than 1 mm, delta-I noise coupled into the receiver power-block and was higher than for longer signal lines as shown in Fig. 15 for the lines of Fig. 4. Fig. 16 shows the reduction in delta-I noise when even only 19% of available device intrinsic capacitance (in this case 90 pF) is used along a 3-mm long signal line bus together with the decoupling capacitance for a 200 µm power-block length (in this case 64 pF). Both types of decoupling are very effective but the quiet device help is less predictable than the specially introduced decoupling capacitors. The package finite impedance, however, can have extremely adverse effects on chip and needs to be protected against. Delta-I noise on the power rail was about the same for the quiet receiver in far-end (FEN) or near-end (NEN) wiring configurations, however, the total noise at the receiver input was much higher in NEN than in FEN topology and could result in rerouting changes. Total noise on the signal lines tends to saturate even for as few as 12 parallel lines, while delta-I noise increase only slows down for n > 12 drivers. Interconnect total noise at the receiver can increase due to switching signal timing skews while delta-I noise was fairly insensitive. Even a 15 ps timing skew between the center active lines and the rest of the lines in the bus can result in 6% higher noise at the receiver input. In-phase switching when CMN and crosstalk are canceling each other, or ---- --V- ----, could result in lower signal line noise (in this case -7.4%), higher delta-I noise (by +11.7%) and delay but slower propagated risetime (by 47%).
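The rail-count and driver-count trends quoted above can be recomputed from Table I. The sketch below is only a check of the arithmetic on the tabulated values; the variable names are ours.

```python
# Delta-I noise on V1 (mV) versus number of switching lines, from Table I.
v1_noise_mv = {6: 235.3, 12: 410.0, 24: 567.0}
v1_noise_two_rails_mv = 355.0  # 12 lines with V1 = V2
risetime_three_rails_ps = 73.6
risetime_two_rails_ps = 63.0

rail_reduction = 100.0 * (v1_noise_mv[12] - v1_noise_two_rails_mv) / v1_noise_mv[12]
risetime_gain_ps = risetime_three_rails_ps - risetime_two_rails_ps
print(f"two rails: delta-I noise down {rail_reduction:.1f}%, "
      f"risetime faster by {risetime_gain_ps:.1f} ps")

# Growth of delta-I noise each time the number of drivers doubles.
growth_6_to_12 = v1_noise_mv[12] / v1_noise_mv[6]
growth_12_to_24 = v1_noise_mv[24] / v1_noise_mv[12]
print(f"6 -> 12 lines: x{growth_6_to_12:.2f}, "
      f"12 -> 24 lines: x{growth_12_to_24:.2f}")
```

The noise on $V_1$ keeps growing past 12 drivers, but each doubling adds less than the previous one.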

While simplified linear simulations could possibly capture delta-I noise levels, only the non-linear simulations shown in Fig. 5 could accurately predict the actual device transition slow-down and noise-caused timing changes. Fig. 17 shows some examples of signal distortion and delay increase caused by the package  $Z_{eff}$ , by long distance to closest decoupling, and the benefits of decoupling in timing restoration and isolation from the package effect. Overall, the driver delay and risetime are increased by delta-I noise, while the interconnect delay remains constant.

In conclusion, it is believed that multi-GHz operation will drive the use of increasingly regular patterns of power mesh design on large chips in order to contain noise and control signal line characteristics. Once such repetitive patterns can be identified, the use of power-blocks with per-unit-length properties can easily be implemented for capturing the power distribution noise. The size of this power-block model is much smaller than the full-chip or full-package reference plane models presently used. Only two such blocks are needed for each chip design. If there are different power patterns in the core, for example, from the I/O drivers, then perhaps four models need to be generated, but they can be reused for many simulations. It was also shown that such power-block models need to provide the full frequency-dependent $Z_{eff}(f)$ properties, just as the interconnect models have frequency-dependent series impedance representation for capturing both crosstalk and common-mode noise. The very fast transitions and short cycle times that are desired also require increased timing accuracy predictions, and thus non-linear simulations with actual driver and receiver models become imperative. For clock frequencies of 6 - 10 GHz, where the cycle time is 100 - 166 ps in width and risetimes are 40 - 60 ps, even 10 - 15 ps skew or risetime degradation due to noise can be significant. The frequency-dependent properties of the effective impedance $Z_{eff}(f)$ of the power distribution mesh for both voltage rails and signal lines have to be included to assess the compensating or additive non-linear noise effects of SSN, CMN, and crosstalk and their effect on delay and propagated risetime. The power and signal integrity can no longer be analyzed separately and non-linear simulation is needed. The amount of power decoupling added or present from quiet devices needs to be accurately determined and provided to isolate from the influence of the non-ideal package.

A methodology was shown that could reduce the computational burden and greatly increase accuracy of noise and timing prediction for CAD tools used in the design of high-performance microprocessor chips. The simultaneous simulation of both power distribution noise and interconnect noise and resultant signal propagation integrity will be increasingly needed for multi-GHz clock frequencies. Once such complex models, as shown in Fig. 5, become manageable in size, the shortcomings of present SPICE-type circuit simulators become the computational bottleneck. Progress has been shown with ultra-fast simulators and device macromodels that can circumvent these problems in the future, and it is believed that combined, non-linear power and signal integrity simulations will become the general practice.
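The timing budget quoted in the conclusion for 6 - 10 GHz clocks follows from the clock period. The sketch below works out what fraction of a cycle a 10 - 15 ps skew consumes; the helper name `cycle_time_ps` is ours.

```python
def cycle_time_ps(freq_ghz):
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / freq_ghz


for f_ghz in (6.0, 10.0):
    period = cycle_time_ps(f_ghz)
    for skew_ps in (10.0, 15.0):
        share = 100.0 * skew_ps / period
        print(f"{f_ghz:4.1f} GHz: period {period:6.1f} ps, "
              f"{skew_ps:.0f} ps skew = {share:.0f}% of cycle")
```

At 10 GHz a 15 ps skew already takes 15% of the cycle, which is why timing errors of this size cannot be neglected.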

#### References

- [1]. A. Deutsch, H. H. Smith, G. V. Kopcsay, B. L. Krauter, C. W. Surovic, A. Elfadel, D. J. Widiger, "Understanding Common-Mode Noise on Wide Data-Buses", Digest of 12<sup>th</sup> IEEE Topical Meeting on Electrical Performance of Electronic Packaging, Oct. 27-29, 2003, Princeton, NJ, pp. 309-312.
- [2]. B. McCredie and W. Becker, "Modeling, measurement, and simulation of simultaneous switching noise", IEEE Trans. Comp. Packg., Manuf. Tech., vol. 19, pp. 461-472, Aug. 1996.
- [3]. H. H. Chen, J. S. Neely, "Interconnect and Circuit Modeling Techniques for Full-Chip Power Supply Noise Analysis", IEEE Trans. Comp., Packg., Manuf. Tech. – Part B, vol. 21, no. 3, pp. 209-215, Aug. 1998.
- [4]. N. Na, J. Choi, S. Chun, M. Swaminathan, and J. Srinivasan, "Modeling and transient simulation of planes in electromagnetic packages", IEEE Trans. Comp. Manuf. Tech., B, vol. 21, pp. 157-163, May 1998.
- [5]. J. H. Kim and M. Swaminathan, "Modeling of Multilayer Power Distribution Planes Using Transmission Matrix Method", IEEE Trans. Adv. Pack., vol. 25, pp.189-199, May 2002.
- [6]. T. Watanabe, K. Srinivasan, H. Asai, M. Swaminathan, "Modeling of Power Distribution Networks with Retardation Using the Transmission Matrix method", Digest of 13<sup>th</sup> IEEE Topical Meeting on Electrical Performance of Electronic Packaging, Oct. 25-27, 2004, Portland, OR, pp. 233-236.

### TABLE I

| Configuration                                      | Signal Line Noise at Receiver | Risetime | Delay    | Noise on V<sub>1</sub> | Noise on V<sub>2</sub> |
|----------------------------------------------------|-------------------------------|----------|----------|------------------------|------------------------|
| Ideal GND, 3D, 12 signal lines                     | 734.9 mV                      | 33.6 ps  | 89.0 ps  |                        |                        |
| Ideal GND, 2D, 6 signal lines                      | 530.0 mV                      | 51.8 ps  | 96.0 ps  |                        |                        |
| Ideal GND, 2D, 12 signal lines                     | 781.9 mV                      | 26.2 ps  | 92.0 ps  |                        |                        |
| 396 µm power-block, 6 lines                        | 504.6 mV                      | 76.0 ps  | 102.0 ps | 235.3 mV               | 90.3 mV                |
| 396 µm power-block, 12 lines                       | 657.0 mV                      | 73.6 ps  | 104.0 ps | 410.0 mV               | 212 mV                 |
| 396 µm power-block, 12 lines, $V_1 = V_2$          | 703.7 mV                      | 63.0 ps  | 101.0 ps | 355.0 mV               |                        |
| 396 µm power-block, 24 lines                       | 683.5 mV                      | 105.8 ps | 111.5 ps | 567.0 mV               | 313 mV                 |

Simulated Delta-I and Interconnect Noise with l = 5 mm and  $Z_{drv}$  = 25  $\Omega$ 



Fig. 1 Block diagram of on-chip power distribution showing the X and Y directed power-blocks at the driver and receiver circuit ends.





Fig. 2 Schematic of power-blocks in a) X and b) Y direction and signal blocks in Y and X direction, respectively, with 4:1 signal-to-power case (pitch = $12\ \mu$m) and decoupling and device capacitors shown in distributed manner.




Fig. 3 Example of power-block models with three voltage levels $V_1$, $V_2$, $V_3$ referenced to an ideal ground plane for a) Y power-block and b) X power-block and using the top four wiring layers in the stack. The power conductor pitch is 8 and 14.4 $\mu$m on topmost layers and 200 $\mu$m for the wide power conductors that connect to the solder balls leading to the package. Some of the power conductors are coalesced into equivalent wider conductors to reduce the size of the model.



Fig. 4 Example of signal line block with 3:1 signal-to-power referencing (pitch = 8 $\mu$m) and a total of 15 signal lines (12+3) for a) X-directed transmission lines on layer n and b) Y-directed transmission lines on layer n-1 with width x thickness = 0.8 x 1.2 $\mu$m.



Fig. 5 Circuit diagram used in simulation with two signal blocks and three power-blocks connected to three sets of driver and receiver circuits. The three voltage levels are Vdd, Vcs, and Vgnd. The buffer circuits driving the signal bus with l = 3mm have  $Z_{drv} = 25 \Omega$ .



Fig. 6. a) Three-dimensional, 3D, and b) two-dimensional, 2D, transmission line model for the 12-line data-bus on the topmost layer with 4:1 signal-to-ground ratio and ideal ground return reference where all the voltage levels are considered to have the same potential. The ground conductor pitch is 12  $\mu$ m around the signal lines and 396  $\mu$ m for the wide conductors connecting to solder balls. The signal lines have width x thickness = 1.26 x 1.2  $\mu$ m.



Fig. 7. Simulation results for the 4:1 case with the signal lines shown in Fig. 6, showing the signal at the output of the active center driver, at the end of the 5-mm long lines at the receiver input, and at the receiver input on the victim, quiet line. Two configurations were simulated, namely with the victim receiver at the same end as the active receivers (FEN, solid and dashed), and with the victim receiver at the active driver end (NEN, dot-dashed and dotted). $Z_{drv} = 25\ \Omega$ in all cases, and the ---- + V + ---- switching pattern was assumed for worst case summation of crosstalk and common-mode noise.









Fig. 9 Calculated a)  $R_{ii}(f)$ , b)  $R_{ij}(f)$ , c)  $L_{ii}(f)$ , and d)  $L_{ij}(f)$  for power-block with 3:1 case and 8 µm power pitch shown in Fig. 3a. Voltage rails are  $V_{dd}$ ,  $V_{gnd}$ , and  $V_{cs}$ .



Fig. 10 Simulation results for the 4:1 case with 12  $\mu$ m power pitch and signal lines on topmost layer n. V<sub>1</sub> = V<sub>dd</sub> = 1.5 V. 6 (solid and dot-dashed) and 12 (dashed and solid) line switching is shown and interconnect noise is shown at the receiver input for the victim center line. The power block length is 396  $\mu$ m, decoupling is not included, and the ideal case is also shown, without the power-block (solid and dashed). Delta-I noise on V<sub>1</sub> and V<sub>2</sub> rails are shown at the receiver input with respect to the local ground, V<sub>3</sub> and ---- + V + ---- switching pattern was assumed. The lines were 5-mm long with R = 135  $\Omega$ /cm and Z<sub>drv</sub> = 25  $\Omega$ .



Fig. 11. Simulation results for the 4:1 case with 12 $\mu$m power pitch and signal lines on topmost layer n. V<sub>1</sub> = V<sub>dd</sub> = 1.5 V. 6 (solid), 12 (dashed), 20 (dot-dashed), and 24 (solid) line switching is shown and interconnect noise is shown at the receiver input for the victim center line. The power block length is 396 $\mu$m in all cases and decoupling is not included. Delta-I noise on V<sub>1</sub> rail is shown at the receiver input with respect to the local ground, V<sub>3</sub>, and ---- + V + ---- switching pattern was assumed. The lines were 5-mm long with R = 135 $\Omega$/cm and Z<sub>drv</sub> = 25 $\Omega$.





Fig. 13. Simulation results for the 4:1 case with 12  $\mu$ m power pitch and signal lines on topmost layer n. V<sub>1</sub> = V<sub>dd</sub> = 1.5 V. 12 line switching is shown and interconnect noise is shown at the receiver input for the victim center line. The power block length is 396  $\mu$ m long and the ideal case is also shown (solid), without the power-block, and compared to the cases when decoupling is not included (dashed) and with 122 pF distributed decoupling (dot-dashed) and an equivalent package impedance,  $Z_{eff}$  of R = 30 m $\Omega$  and L = 1 nH. Delta-I noise on V<sub>1</sub> rail is shown at the receiver input with respect to the local ground, V<sub>3</sub> and ----+V + ---- switching pattern was assumed. The lines were 5-mm long with R = 135  $\Omega$ /cm and  $Z_{drv} = 25 \Omega$ .



Fig. 14. Simulation results for the 4:1 case with 12 $\mu$m power pitch and signal lines on topmost layer n. Switching of 12 lines is shown, with interconnect noise at the receiver input for the victim center line. The power-block length is 396 $\mu$m and decoupling is not included. The ideal case (solid) is compared to the cases when $V_1 = V_{dd} = 1.5$ V and $V_2 = V_{cs} = 2.5$ V (dashed), and when $V_1 = V_2 = 1.5$ V (dot-dashed). Delta-I noise on the $V_1$ rail is shown at the receiver input with respect to the local ground, $V_3$, and a --- + V + --- switching pattern was assumed. The lines were 5 mm long with R = 135 $\Omega$/cm and $Z_{drv} = 25 \Omega$.



Fig. 15. Simulation results for the 3:1 case with 8 $\mu$m power pitch and 12 signal lines on topmost layer n. Delta-I noise is shown as a function of interconnect length; the power-block length is 200 $\mu$m and decoupling is not included. V<sub>1</sub> = V<sub>dd</sub> = 1.0 V and V<sub>2</sub> = V<sub>cs</sub> = 1.0 V. Delta-I noise, with respect to the local ground, V<sub>3</sub>, is shown on the V<sub>1</sub> rail at the driver (solid) and at the receiver (dashed), and on the V<sub>2</sub> rail at the driver (dot-dashed) and at the receiver (dotted). A --- + V + --- switching pattern was assumed. The lines had R = 229 $\Omega$/cm and Z<sub>drv</sub> = 25 $\Omega$.



Fig. 16. Simulation results for the 3:1 case with 8 $\mu$m power pitch and 12 signal lines on topmost layer n, with V<sub>dd</sub> = 1.0 V. Delta-I noise on the V<sub>1</sub> rail is shown at the driver with respect to the local ground, V<sub>3</sub>, for a power-block length of 200 $\mu$m with decoupling not included (solid), with a package Z<sub>eff</sub> having R = 30 mΩ and L = 1 nH (solid, highest amplitude), with Z<sub>eff</sub> and 64 pF decoupling (dot-dashed), with Z<sub>eff</sub> and 90.3 pF of intrinsic device decoupling along the interconnect (dashed), and with Z<sub>eff</sub> and both decoupling along the power-block and device capacitance along the interconnect (solid, smallest amplitude). A --- + V + --- switching pattern was assumed and the lines had l = 3 mm, R = 229 Ω/cm, and Z<sub>drv</sub> = 25 Ω.





Fig. 17. Simulation results for the 3:1 case with 8 $\mu$m power pitch and 12 signal lines on topmost layer n. The propagated signal on the center line is shown for a 166-ps-wide pulse. a) +++ ++++++++ switching pattern. The ideal case, without the power-block (solid) and with V<sub>1</sub> = V<sub>dd</sub> = 1.0 V, is compared with the cases of a 200 $\mu$m power-block and no decoupling (dashed), a 200 $\mu$m power-block and package Z<sub>eff</sub> (dot-dashed), a 200 $\mu$m power-block, package Z<sub>eff</sub>, 64 pF decoupling along the power-block, and 90.3 pF intrinsic device capacitance along the interconnect (dotted), and a 400 $\mu$m power-block without decoupling (dashed, slower signal). b) Simulation of the propagated 166-ps pulse for the ideal case (solid), for the ideal case with a 0 0 0 0 0 0 + 0 0 0 0 switching pattern (solid), with a 200 $\mu$m power-block and a +++ +++ ++ ++ ++ ++ pattern (dot-dashed), and with a 200 $\mu$m power-block and a --- -- + -- - pattern (dot-dashed, most distorted). The signal lines had l = 3 mm, R = 229 $\Omega$/cm, and Z<sub>drv</sub> = 25 $\Omega$.