# **IBM Research Report**

# Accounting for Circuitry Type in Assessments of Wire-Length Distribution Models for ULSI Chips

G. Fiorenza, R. Rand, M. Y. Lanzerotti

IBM Research Division Thomas J. Watson Research Center P.O. Box 218 Yorktown Heights, NY 10598



Research Division Almaden - Austin - Beijing - Haifa - India - T. J. Watson - Tokyo - Zurich

LIMITED DISTRIBUTION NOTICE: This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publication, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g. payment of royalties). Copies may be requested from IBM T. J. Watson Research Center, P. O. Box 218, Yorktown Heights, NY 10598 USA (email: reports@us.ibm.com). Some reports are available on the internet at <a href="http://domino.watson.ibm.com/library/CyberDig.nsf/home">http://domino.watson.ibm.com/library/CyberDig.nsf/home</a>

#### Abstract

Existing assessments of on-chip wirelength distribution models do not take into account the effect of ULSI circuitry type in the analysis. The models have been derived to consider functional circuitry, yet existing assessments are based on the entire group of circuitry in ULSI chip designs. In particular, the existing assessments calculate model inputs and wirelength measurements without taking into account the circuitry types - functional circuitry and synchronization circuitry - into which ULSI chip designs can be partitioned. Since the models are appropriate for functional circuitry, a proper accounting for circuitry type in assessments of on-chip wirelength distribution models is therefore needed. This paper explains how to take circuitry type into account. This procedure involves: (1) partitioning the chip design circuitry into two circuitry types and extracting the netlist for functional circuitry; (2) measuring wirelength contributions of the functional circuitry; (3) deriving model inputs. In this paper, the netlist for functional circuitry is referred to as the *functional netlist* and the model inputs that are derived from the functional netlist are referred to as *functional Rent parameters*.

The goal of this paper is to provide an assessment of on-chip wirelength distribution models in which the analysis properly accounts for circuitry type. To achieve this goal, the paper reviews previous work and then discusses: (1) characteristics of functional circuitry and synchronization circuitry; (2) functional netlists obtained by partitioning design circuitry into two circuitry types; (3) wirelength measurements for functional circuitry; and (4) functional Rent parameters extracted from the functional netlist. Wirelength estimates for functional circuitry are obtained by evaluating the existing models as functions of the functional Rent parameters extracted from the chip design data. As examples, 100 ASIC-like control logic designs in the 1.3GHz *POWER4* microprocessor are selected for wirelength assessments. In this analysis, the contribution of functional circuitry is correctly taken into account, and model estimates show slightly improved agreement with measurements for 64 (64%) of the 100 designs, compared with previous work [11]. Improved qualitative agreement is also seen between each measured wirelength distribution and model wirelength distribution. The paper describes reasons for the improved agreement and reviews additional factors that may contribute to the remaining lack of agreement.

#### **Keywords**

*ULSI*, wirelength distribution models, recursive partitioning, external Rent parameters, topological Rent parameters, synchronization circuitry, functional circuitry.

June 8, 2004

DRAFT

#### EXTERNAL PUBLICATION

#### I. INTRODUCTION

A fundamental assumption of existing on-chip wirelength distribution models is the existence of *ULSI* designs that are composed solely of circuitry that implements some logic function. However, the content of today's high-performance chips is not limited solely to circuitry that implements functional logic. In fact, real chip designs typically incorporate an additional complex network of circuitry that is designed to synchronize the functional logic circuitry and to ensure correct chip operation at a desired frequency. In this paper, this complex network is referred to as *synchronization circuitry*, and the circuitry that implements the functional logic is referred to as *functional circuitry*. Signals that perform some synchronization task are referred to as *synchronization signals*, and signals that connect functional logic gates are referred to as *functional signals*.

To obtain accurate wirelength estimates for functional circuitry in *ULSI* chips, a new assessment is needed to identify and to account correctly for the separate contributions of the functional circuitry and synchronization circuitry, particularly since synchronization circuitry can occupy a relatively large portion of the design real estate area. For example, in the *POWER4* microprocessor [1], [2], synchronization circuitry occupies 25% - 50% of the allocated real estate area.

In this paper, circuitry type is taken into account in a new analysis of on-chip wirelength distribution models. This procedure involves: (1) partitioning the chip design circuitry into two circuitry types and extracting the netlist for functional circuitry; (2) measuring wirelength contributions from the functional circuitry; (3) deriving new model inputs from the netlist for functional circuitry; and (4) evaluating existing models as functions of the new model inputs. In this paper, a netlist for functional circuitry is referred to as a *functional netlist*, and model inputs that are derived from a functional netlist are referred to as *functional Rent parameters*.

Although prior work has considered only function type signals in existing wirelength distribution models [3], [4], [5], [6], [7], [8], [9], [10], assessments of these models have considered the contributions of the entire group of circuitry, rather than accounting for the two circuitry types - functional circuitry and synchronization circuitry - into which ULSI designs can be partitioned. Previous work showed that wirelength distributions obtained from the models tended to underestimate the number of interconnections with large wirelength, although some of the distributions [11]. This work showed that wirelength estimates provided by the Davis (1998) model [3], [12], [13] underestimated the total actual wirelength requirements by 38% to 70% in each of the six units. In the *POWER4* core, the total wirelength estimate underestimates the measured total wirelength requirement by 58%. Average wirelength estimates provided by the Donath model [4], [5] agreed with average wirelength measurements to within 1% to -18%; average wirelength estimates provided by the Christie (2000) model [6] agreed with measurements to within 24% to 36%.

The goal of this paper is to provide an assessment of on-chip wirelength distribution models that properly accounts for circuitry type in the analysis. The main contributions of this paper are: (1) a description of placement strategies for, and real estate occupied by, functional circuitry and synchronization circuitry; (2) methods to partition circuitry into these two circuitry types and to obtain a functional netlist; (3) a method to obtain wirelength measurements for functional circuitry; (4) a method to extract functional Rent parameters from a functional netlist; and (5) new assessments of existing models using, as examples, 100 ASIC-like control logic designs in the 1.3GHz *POWER4* microprocessor [2]. For this study, the same 100 *POWER4* designs are selected as in previous work [11]. The assessments of the wirelength distribution models presented in this paper properly account for the contribution of synchronization circuitry, and model estimates for 64 (64%) of the 100 designs show improved agreement with measurements. This paper then describes reasons for the improved agreement and reviews additional factors that may contribute to the remaining lack of agreement.

#### II. BRIEF REVIEW OF EXISTING WIRELENGTH DISTRIBUTION MODELS

This paper considers the wirelength distribution models of Donath [4], [5], Davis [3], [12], [13], and Christie [6], [7], [14]. These models were published following Landman and Russo's 1971 interpretation [9] of E. F. Rent's two internal (unpublished) IBM memoranda. Briefly, the models assume that a chip design is a fully-packed tiled array of square gates with height described as the *gatepitch*. Expressions in the Davis (1998) model are evaluated as functions of the *external Rent parameters*<sup>1</sup>; this model is described in detail in [3], [12], [10]. Expressions in the Donath (1979) model, Donath (1981) model, and Christie (2000) model are evaluated as functions of the *topological Rent parameters*<sup>2</sup>.

<sup>1</sup>External Rent parameters are obtained from least-squares linear fits to log-log plots of signal input and output terminals (IO) as a function of the number of gates [3], [10], [12], [13].

#### III. Two circuitry types

This section describes relevant characteristics of synchronization circuitry and functional circuitry in today's *ULSI* chip designs. An understanding of these characteristics is important for developing methods to obtain *functional* Rent parameters, as described in the subsequent sections.

#### A. Circuitry characteristics

As discussed in the introduction, today's ULSI chip designs typically contain two types of circuitry - functional circuitry and synchronization circuitry - and two types of signals - functional signals and synchronization signals. Examples of functional signals are those that perform functions such as addition or multiplication. Examples of synchronization signals are: (1) the local ac global clock signal that drives inputs of local clock buffers (*lcbs*) (note that the global clock distribution drives the local global clock); (2) two ac signals associated with the two clock phases that drive the latch inputs from the *lcb* outputs; and (3) various dc control signals that perform clock-related tasks [1], [2], [15].

Tables I-V show characteristics of the circuitry in *POWER4* designs. These characteristics are the number of gates, number of input/output pins, average fanout, and real estate occupancy for these designs. Note that the number of functional gates $N_g(f)$ can be less than or equal to the total number of gates $N_g$, and the number of input/output pins for functional circuitry $T_{IO}(f)$ can be less than or equal to the total number of input/output pins $T_{IO}$. Also note that the average fanout of signals in functional circuitry is typically lower than the average fanout of signals in the entire group of circuitry, since high-fanout synchronization signals are excluded. The tables show that the synchronization circuitry occupies $18\% \pm 5\%$ with range [6%, 29%] in the *IFU* designs; 3% and 20% in two of the *FPU* designs; 7%, 18%, and 70% in three of the four *FXU* designs; $13\% \pm 3\%$ in the *IDU* designs with range [9%, 18%]; $16\% \pm 5\%$ in the 16 *ISU* designs with range [8%, 29%]; and $18\% \pm 3\%$ with range [11%, 24%] in the 32 *LSU* designs.

<sup>2</sup>Topological Rent parameters are obtained from least-squares linear fits to log-log plots of the number of terminals as a function of the number of gates for recursively partitioned design hypergraphs [6], [7], [8], [14].

#### B. Placement strategies

Synchronization circuitry is not taken into account in floorplans assumed by existing wirelength distribution models. These floorplans typically assume a topology such as that represented schematically in Fig. 2(a). This figure shows a schematic representation of the model assumption that a chip design consists of a fully-packed tiled array of square gates.

However, an accounting for synchronization circuitry is needed for assessments of wirelength requirements in the majority of today's ULSI chip designs. In particular, for high-performance microprocessors such as the POWER4, the fraction of the total real estate area occupied by the synchronization circuitry can be large, as quantified in the next section, and the choice of placement strategy for the synchronization circuitry can change the floorplan topology. For example, in one placement strategy, the real estate occupied by the synchronization circuitry can be modelled as circular or polygonal regions surrounded by rectangular gates, where the gates represent the functional logic and where the spaces between the gates represent potential locations for non-functional logic such as decoupling capacitors and spare logic gates. This strategy produces a *swiss-cheese*-type floorplan, as shown in Fig. 2(b), where each region represents the locations of one *lcb* and the clustered latches driven by the *lcb*. Gates associated with functional circuitry are interspersed around the regions and around the gates associated with synchronization circuitry.

A proper accounting for the existence of synchronization regions in floorplans assumed by wirelength distribution models will change the values of the wirelength estimates and wirelength distributions provided by the models. Intuitively, the impact of the floorplan topologies assumed by wirelength distribution models in *ULSI* designs is that the existence of regions occupied by synchronization circuitry will tend to reduce the number of very short interconnections and will tend to increase the number of longer interconnections as required to span over the regions. Implementation of these changes will tend to decrease the magnitude of the (negative) slope of the wirelength distributions predicted by the models, and as a result, it is expected that the models will tend to approximate more closely the wirelength distributions measured in *ULSI* chip designs.

#### C. Real estate area

As briefly mentioned in a previous section, the portion of the real estate occupied by synchronization circuitry can be substantial in today's *ULSI* chip designs. This portion tends to be greater than might be expected from an analysis of the gate count allocated to synchronization circuitry, since the real estate area occupied by each gate associated with synchronization circuitry is greater than the real estate area occupied by a typical functional logic gate.

Gate selection for functional logic in ASIC-like control logic designs is determined by an automated logic synthesis program that selects and assembles a group of gates to implement a specified logic function with constraints such as cycle time. The automated program specifies the quantity, drive strength, and type of gates. In contrast, gate selection for synchronization circuitry can occur according to a different process: first the number of latches is specified by human intervention, and then an automated program selects a number of *lcbs* that is appropriate to drive the prespecified quantity of latches. As a result, the number of *lcbs* can take on different values for different chip designs.

The fraction of the real estate occupied by the gates associated with synchronization circuitry is referred to as the occupancy  $O_s$  obtained by taking the total area occupied by these gates and dividing by the total real estate area occupied by the design. The value of the occupancy  $O_s$  is dominated by the area occupied by latches and *lcbs*. The occupancy  $O_f$  of functional circuitry is obtained by taking the total area occupied by the logic gates and dividing by the total design area. The overall design occupancy O is given by the expression  $O = O_s + O_f$ . Since designs typically contain empty space allocated for non-function gates and decoupling capacitors, O < 1. The portion of the occupied area that is filled with functional circuitry is represented as  $f_f = O_f/O$ , and the portion that is filled with synchronization circuitry is represented as  $f_s = O_s/O$ ; thus  $f_f + f_s = 1$ .

Tables I-V show the values of $O_s$ in designs in the six functional units of the *POWER4*. A comparison of $O_f$ with $O_s$ for the 89 designs that contain synchronization circuitry shows that $O_s$ is typically nearly twice as large as $O_f$. For the 18 *IFU* designs, $O_s \sim 45\% \pm 13\%$, with range [17%, 64%], where only $18\% \pm 5\%$ of the logic gates are associated with synchronization circuitry. Since the average occupancy O for all circuitry in *IFU* designs is $O \sim 72\% \pm 13\%$, it follows that $f_s = 63\%$. Some extreme cases exist; for example, for 6 of the *IFU* designs, the synchronization circuitry occupies approximately three times as much area as that occupied by logic circuitry. Similar results are obtained for designs in the other 5 functional units; in particular, the tables show that the synchronization circuitry typically comprises fewer gates yet occupies a greater amount of real estate area compared with the gate count and real estate of the functional circuitry.
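The occupancy arithmetic in this subsection can be checked with a short sketch. The function name `fill_fractions` is hypothetical, and the value $O_f \approx 0.27$ is not quoted in the text; it is derived here as $O - O_s$ from the *IFU* averages ($O_s \sim 45\%$, $O \sim 72\%$).

```python
def fill_fractions(o_s: float, o_f: float) -> tuple[float, float, float]:
    """Return (O, f_f, f_s) from synchronization and functional occupancies.

    O   = O_s + O_f   overall design occupancy (expected to be < 1)
    f_f = O_f / O     share of the occupied area filled by functional circuitry
    f_s = O_s / O     share of the occupied area filled by synchronization circuitry
    """
    o = o_s + o_f
    if o <= 0:
        raise ValueError("total occupancy must be positive")
    return o, o_f / o, o_s / o


# IFU averages from the text: O_s ~ 0.45 and O ~ 0.72.
# O_f ~ 0.27 is derived as O - O_s (an assumption, not a quoted value).
o, f_f, f_s = fill_fractions(o_s=0.45, o_f=0.27)
print(f"O = {o:.2f}, f_f = {f_f:.3f}, f_s = {f_s:.3f}")
```

With these inputs the sketch gives $f_s \approx 0.625$, which agrees with the rounded value of 63% stated for the *IFU* designs.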

#### IV. CIRCUITRY PARTITIONS

In the previous section, characteristics of *functional circuitry* and *synchronization circuitry* are described, and examples are provided. The purpose of this section is to describe a method to allocate the circuitry of *POWER4* ASIC-like control logic designs into two partitions after the functional circuitry and synchronization circuitry have been identified. The two partitions are: (a) a partition of synchronization circuitry and (b) a second partition of functional circuitry.

For the *POWER4* chip, the partitioning method relies on a project-wide naming convention that designates non-function signals early in the design development process. Such a naming convention is recommended; for this study it has enabled (1) an allocation of the signals and circuitry into two partitions, and (2) the extraction of modified design hypergraphs for the functional circuitry in each design. In this paper, a hypergraph file that corresponds to the design functional circuitry<sup>3</sup> is referred to as a *functional hypergraph file*.

The first step is to identify the project naming convention. A list of synchronization signals is then compiled. The names of the signals on the list are provided as inputs into a program that generates new hypergraph files; this program processes the complete design netlist and converts it from Cadence database format to an ASCII hypergraph text file while excluding the following signals and circuitry: (1) signals on the list and (2) circuitry associated with signals on the list. The new hypergraph files that are generated in this step are the functional hypergraph files.

<sup>3</sup>Each hypergraph file is a textual description of the design netlist and specifies the signals (hyperedges) that are connected to the logic books (vertices) [17], [24]. Correct generation [6], [7], [8], [14] of a hypergraph file should (1) exclude power, ground, and the global clock signal; (2) omit signals that are connected to only one logic book; (3) omit weights in order to obtain a gate count after partitioning; (4) include the external IO terminals and associated signals in the hypergraph file [17], [24]. Terminals are modelled as though they are connected to different cells [24].

The list is also used to obtain wirelength measurements for functional signals. In this step, each signal on the list is identified in each design, and the wirelength segments of the remaining signals are measured and summed.
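The partitioning and measurement steps above can be sketched as follows. This is a minimal illustration, not the program used in the study: the name prefixes, the function names, and the toy netlist are all hypothetical, and the rule that any gate touched by a listed signal counts as synchronization circuitry is an assumption. The output uses an unweighted layout in the style of hMeTIS input files (a first line with the hyperedge and vertex counts, then one line of 1-based vertex ids per hyperedge).

```python
import re

# Hypothetical naming convention: these prefixes mark synchronization signals.
SYNC_NAME = re.compile(r"^(clk|lcb|scan)_")


def is_sync(signal: str) -> bool:
    """True when the signal name matches the (assumed) naming convention."""
    return SYNC_NAME.match(signal) is not None


def functional_hypergraph(netlist: dict[str, list[str]]) -> str:
    """Build unweighted hypergraph text for the functional partition.

    netlist maps a signal name to the names of the gates it connects.
    Assumption: a gate touched by any synchronization signal is treated as
    synchronization circuitry and is excluded together with that signal.
    """
    sync_gates = {g for s, gates in netlist.items() if is_sync(s) for g in gates}
    edges = []
    for signal, gates in netlist.items():
        if is_sync(signal):
            continue
        kept = sorted(set(gates) - sync_gates)
        if len(kept) >= 2:  # omit signals connected to only one gate
            edges.append(kept)
    vertices = sorted({g for edge in edges for g in edge})
    index = {g: i + 1 for i, g in enumerate(vertices)}  # 1-based vertex ids
    lines = [f"{len(edges)} {len(vertices)}"]
    lines += [" ".join(str(index[g]) for g in edge) for edge in edges]
    return "\n".join(lines)


def functional_wirelength(lengths: dict[str, float]) -> float:
    """Sum the measured wirelength of every signal that is not on the list."""
    return sum(v for s, v in lengths.items() if not is_sync(s))


# Toy design (hypothetical): one clock net, three functional nets.
netlist = {
    "clk_l1": ["lcb0", "latch0", "latch1"],
    "sum_a": ["nand0", "nor0", "latch0"],
    "sum_b": ["nor0", "inv0"],
    "carry": ["inv0", "latch1"],
}
print(functional_hypergraph(netlist))
print(functional_wirelength(
    {"clk_l1": 900.0, "sum_a": 40.0, "sum_b": 12.5, "carry": 30.0}))
```

In this toy case the clock net dominates the total wirelength, so removing it changes the measured total far more than it changes the signal count, which is the effect the partitioning is meant to isolate.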

This process differs from that described in previous work [11] in which the entire group of signals and circuitry was considered for the wirelength measurements and hypergraph file generation. In this paper, the wirelength measurements, measured interconnection distribution functions, and functional hypergraph files are generated for only *functional* signals and *functional* circuitry. For example, the cumulative interconnection distribution function is obtained by plotting the total number of *functional* signals as a function of wirelength.
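A cumulative interconnection distribution of the kind described above can be tabulated with a few lines of code. The sketch below assumes the "number of signals with wirelength less than or equal to l" convention, which the text does not state explicitly; the function name and the sample lengths are hypothetical.

```python
from bisect import bisect_right


def cumulative_distribution(lengths, grid):
    """For each wirelength l in grid, count functional signals with length <= l."""
    ordered = sorted(lengths)
    return [bisect_right(ordered, l) for l in grid]


# Hypothetical functional-signal wirelengths, in gate pitches.
lengths = [1, 1, 2, 2, 2, 3, 5, 8, 13]
grid = [1, 2, 4, 8, 16]
print(cumulative_distribution(lengths, grid))
```

Only functional signals are passed in; synchronization signals are removed beforehand, so the counts at large wirelength are lower than they would be for the entire group of signals.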

#### V. Rent parameters for functional logic circuitry

We now describe methods to extract functional Rent parameter pairs from *ULSI* designs. There are two types of functional Rent parameters: (1) external functional Rent parameters and (2) topological functional Rent parameters. In this paper, the term external functional Rent parameters refers to external Rent parameters that are appropriate for functional circuitry; here, these parameters are described with the notation $\{k^f, p^f\}$. The term topological functional Rent parameters refers to topological Rent parameters that are appropriate for functional circuitry; here, these parameters are described with the notation $\{k^f *, p^f *\}$.

#### A. Extraction of external functional Rent parameters for functional circuitry

To extract external functional Rent parameters, a count is made of the number of input/output pins $T_{IO}(f)$ that are connected to functional signals and circuitry in each design<sup>4</sup>, where the parameter f represents functional signals and circuitry. A count is also made of the number of gates $N_g(f)$ that implement the logic function in each design. For each of the six functional units, a log-log plot of $T_{IO}(f)$ as a function of $N_g(f)$ is generated [3], [12], [10]. Each datapoint in this figure is obtained from a single design. For each plot, the data are fit by least squares to the expression

$$Log(T_{IO}(f)) = Log(k^f) + p^f \times Log(N_g(f)). \qquad (1)$$

<sup>4</sup>The term *input pin* (I) refers to a pin that connects an input signal from an external design and drives the functional logic in the design; the term *output pin* (O) refers to a pin that drives an output signal generated from within the functional logic to an external design.

From the data generated from the designs in the *POWER4*, six plots are generated, one for each of the six functional units; these plots are shown in Fig. 3. External functional Rent parameters $\{k^f, p^f\}$ are extracted from the plots with Eqn. 1, and the values of these parameters are shown in Table VI. This table shows a comparison of $\{k^f, p^f\}$ with the Rent parameter pair $\{k, p\}$, where the values for $\{k, p\}$ are derived for all circuitry in [11]. The values and ranges within one standard deviation for both parameter pairs are also shown in the table. The table shows that when the entire group of circuitry is considered, $0.79 \le k \le 23.3$ and $0.3 \le p \le 0.69$, whereas when only functional circuitry is considered, $0.68 \le k^f < 37.9$ and $0.21 \le p^f \le 0.72$. Note that the ranges spanned by $\{k^f, p^f\}$ are slightly larger than the ranges spanned by $\{k, p\}$. The value of $p^f$ is similar to that of p for three of the units (*IFU*, *FPU*, *ISU*). The value of $p^f$ exceeds that of p by $\ge 10\%$ for two of the units (*FXU*, *LSU*), and exceeds that of p for one of the units (*IDU*).
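The fit of Eqn. 1 is an ordinary least-squares line through the points $(Log(N_g(f)), Log(T_{IO}(f)))$, with slope $p^f$ and intercept $Log(k^f)$. The sketch below is one way to carry it out; the function name `fit_rent` is hypothetical and the data are synthetic, generated from an exact power law rather than taken from the *POWER4* designs.

```python
import math


def fit_rent(gate_counts, io_counts):
    """Least-squares line through (log N_g, log T_IO); returns (k, p)."""
    xs = [math.log10(n) for n in gate_counts]
    ys = [math.log10(t) for t in io_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if n < 2 or sxx == 0:
        raise ValueError("need at least two designs with different gate counts")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx                       # slope of the log-log line
    k = 10 ** (mean_y - p * mean_x)     # intercept, converted back from log10
    return k, p


# Synthetic designs that follow T_IO = 2.5 * N_g**0.6 exactly (not POWER4 data).
gates = [100, 400, 1600, 6400]
ios = [2.5 * n ** 0.6 for n in gates]
k_f, p_f = fit_rent(gates, ios)
print(f"k^f = {k_f:.3f}, p^f = {p_f:.3f}")
```

Each input pair corresponds to one design, so a unit with only a few designs yields a fit with few datapoints and a correspondingly wide uncertainty on $\{k^f, p^f\}$.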

#### B. Extraction of topological functional Rent parameters for functional circuitry

For functional circuitry, the topological functional Rent parameters $\{k^f *, p^f *\}$ are extracted from functional hypergraph files by computing a series of k-way partitions with multilevel recursive bisection with the hMeTIS [24] software package<sup>5</sup> developed at the University of Minnesota [24]. For these partitions, k is chosen from the list $\{2, 4, 8, 16, 32, \ldots\}$, with the highest value in the sequence chosen such that no partition contains zero gates. For these studies, the hMeTIS tool completes the partitioning tasks in less than 30 seconds with the default configurations (shMeTIS) [24], [26], [27] on an IBM RS6000 workstation with 2GB memory running AIX4.3. The inputs to the tool are: the functional hypergraph file; the number of desired partitions (that is, $k = \{2, 4, 8, \ldots\}$); and the minimum (1%) allowed imbalance between the partitions during recursive bisection; this choice of imbalance is the same as that in previous work [7], [14], [28]. The number of terminals T for each partition is plotted as a function of the number of gates G for all partitions on a log-log plot, where no averaging is used.<sup>6</sup> Values for the topological functional Rent parameters $\{k^f *, p^f *\}$ are obtained from least-squares recursive linear fits of this data to the expression,

$$Log(T) = Log(k^{f}*) + p^{f}* \times Log(G). \qquad (2)$$

<sup>5</sup>The hMeTIS package tries to directly minimize the number of the hyperedges that span multiple partitions.

For the *POWER4* chip, Figs. 4(a)-4(d) show recursive bipartitioning plots for four *POWER4 IFU* designs: *i*1, *i*3, *i*9, and *i*18; in these figures, the raw data are shown as open circles in Region I. Values for $\{k^f *, p^f *\}$ and $\{k *, p *\}$ for the four designs are shown in Table VII. This table shows that for three of the four designs $\{i1, i9, i18\}$, the values of $p^f *$ are less than the values of $p *$.
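Each datapoint in such a plot is a pair (G, T) for one block of one k-way partition. The sketch below shows one way to obtain these pairs once a partition is available. It does not call hMeTIS; the block assignment is taken as given, and the function name and the toy hypergraph are hypothetical. It also assumes that a cut hyperedge contributes one terminal to every block it touches, and it does not treat external IO vertices specially.

```python
def block_terminals(hyperedges, assignment):
    """Count gates G and terminals T for each block of a partition.

    hyperedges: list of vertex lists; assignment: vertex -> block id.
    Assumption: a hyperedge adds one terminal to every block it touches
    whenever it also touches at least one other block.
    """
    gates = {}
    for block in assignment.values():
        gates[block] = gates.get(block, 0) + 1
    terminals = {block: 0 for block in gates}
    for edge in hyperedges:
        touched = {assignment[v] for v in edge}
        if len(touched) > 1:  # the hyperedge is cut by the partition
            for block in touched:
                terminals[block] += 1
    return {block: (gates[block], terminals[block]) for block in gates}


# Toy hypergraph with four gates and a given 2-way partition.
edges = [[1, 2], [2, 3], [3, 4], [1, 4], [2, 4]]
two_way = {1: 0, 2: 0, 3: 1, 4: 1}
print(block_terminals(edges, two_way))
```

Repeating this for $k = 2, 4, 8, \ldots$ and pooling the resulting (G, T) pairs without averaging gives the raw data to which Eqn. 2 is fit.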

#### VI. Comparison of model estimates with wirelength measurements for functional logic circuitry

This section presents a comparison of wirelength estimates obtained with existing models with actual wirelength measurements for functional logic circuitry. Comparisons are provided for the following wirelength characteristics:

- (1) Interconnection distributions;
- (2) Average signal wirelength  $L_a(f)$ ;
- (3) Total wirelength requirement  $L_{tot}(p^f)$ .

Model estimates for wirelength requirements of functional circuitry in *ULSI* designs are obtained by evaluating the models reviewed in Section II as functions of the appropriate functional Rent parameters. Actual wirelength distributions and values for actual average wirelengths are measured for functional circuitry in real chip designs; examples are shown for the *POWER4* chip in Fig. 5 and Tables VIII-XIV. Table X summarizes estimates for the total wirelength in each of the six *POWER4* functional units.

#### A. Interconnection distributions

The interconnection distribution in the Donath (1979) model is obtained by evaluating Eqn. 8 in [4] as functions of $\{k^f *, p^f *\}$. The interconnection distribution in the Donath (1981) model is obtained by evaluating Eqn. 8 in [5] as functions of $\{k^f *, p^f *\}$. The wirelength distribution in the Davis (1998) model is obtained by evaluating Eqns. 3-5 in [3] as functions of $\{k^f, p^f\}$. Two expressions for wirelength distribution models for placement in a plane are obtained from the Christie (2000) model by evaluating Eqns. 23-24 in [6] as functions of $\{k^f *, p^f *\}$.

<sup>6</sup>Methods for intermediate averaging [6], [8] and geometric averaging [7] of the raw data (open circles) have also been described in the literature [6], [7], [14], [18].

Figure 5 shows a comparison of the measured interconnection density function (solid circles) and measured cumulative interconnection density function (hollow squares) with model distributions provided by the Donath (1979) model (solid line), Donath (1981) model (dashed line), Davis (1998) model (dotted lines), and Christie model (dashed red line) for four *POWER4 IFU* designs: (a) i1, (b) i3, (c) i9, and (d) i18. The three dotted lines represent the model distribution, lower bound, and upper bound provided by the Davis model.

Figure 5 also shows that distributions obtained with the models are improved compared with distributions obtained in previous work [11]. In particular, Figs. 5(c) and 5(d) show some of these qualitative improvements. For example, the slope, curvature, and relatively narrow spread of the distributions obtained with the Davis, Christie, and Donath (1981) models more closely approximate the measured distributions at larger wirelength for large designs. Another qualitative improvement shown in the figures is that the spread in the wirelength distributions provided by the models is much smaller than the spread observed in previous assessments [11], and in fact the estimates more closely approximate the measured wirelength distribution over the middle range of wirelength values. Moreover, the range in wirelength over which the distributions qualitatively agree with the measured distributions is also improved compared with previous work [11]. The figures also show that the distributions obtained with the Christie models (Eqn. 24 in [6] shown by the dashed red line, and Eqn. 23 in [6] shown by the green solid straight line) also more closely approximate the measured wirelength distributions.

#### B. Average wirelength

Table VIII shows a comparison of the measured average wirelength $L_a(f)$ for functional circuitry for four *POWER4* designs with average wirelength estimates for functional circuitry obtained with the Donath (1979) model (shown in upper half of table) and Christie (2000) model (shown in lower half of table). The estimates are obtained by evaluating the models as functions of the three sets of values of $\{k^f *, p^f *\}$. The values of $\overline{R}_{p^f *}$ are obtained by evaluating Eqn. 15 in [4]; values of $L_{avg}(p^f *)$ are obtained by evaluating Eqn. 3 in [3] as functions of $\{k^f *, p^f *\}$. The errors (in %) are given by the expressions, $E(\overline{R}_{p^f *}) = \frac{(\overline{R}_{p^f *} - L_a) \cdot 100}{L_a}$ and $E(L_{avg}(p^f *)) = \frac{(L_{avg}(p^f *) - L_a) \cdot 100}{L_a}$, where $E(\overline{R}_{p^f *})$ compares the Donath model estimate with $L_a$, and where $E(L_{avg}(p^f *))$ compares the Christie model estimates with $L_a$. The table shows that while the estimates for functional circuitry obtained for the Donath model less closely approximate the values of the measured average wirelength, the estimates obtained with the Christie (2000) model are similar to those of previous work [11], as shown on the left side of the table.

Tables IX and XI-XIV compare the measured average wirelength  $L_a(f)$  for functional circuitry with wirelength estimates  $L_{avg}(p^f)$  obtained for the same functional circuitry with the Davis (1998) model for 100 *POWER4* designs. The error (in %) is given by the expression  $E(L_{avg}(p^f)) = \frac{(L_{avg}(p^f)-L_a)\cdot 100}{L_a}$ . These tables show that the values of the average wirelength estimates obtained with the Davis model tend to approximate more closely the measured values compared with previous results [11], particularly for the largest designs with  $N_g \geq 2300$  gates. For large *IFU* designs with  $N_g \geq 2300$  gates, the errors for the estimates obtained with the Davis model range from -22% to -49%; for small designs with  $N_g \leq 231$  gates, the error ranges from -17% to -40%.

Average wirelength estimates are improved for 13 (72%) *IFU* designs, 3 (75%) *FXU* designs, 13 (81%) *ISU* designs, and 32 (100%) *LSU* designs. The wirelength estimates for the *FPU* and *IDU* designs tend to be relatively unchanged compared with previous work. Overall, the model estimates show improved agreement with measurements for 64 (64%) of the 100 *POWER4* designs, compared with previous work [11].

#### C. Total wirelength

Table X compares the total measured wirelength $L_T(f)$ for functional circuitry in each *POWER4* unit with an estimate of the total wirelength provided by the Davis (1998) model. The estimate for total wirelength for functional circuitry $L_{tot}(p^f)$ is obtained with the expression,

$$L_{tot}(p^f) = \sum_{i=1}^{N_{designs}} N_{signals} \times L_{avg}(p^f), \qquad (3)$$

where $N_{signals}$ is the number of functional signals in each design, and where the sum is taken over all the designs $N_{designs}$ in each unit. The error (in %) is given by the expression, $E(L_{tot}(p^f)) = \frac{(L_{tot}(p^f) - L_T) \cdot 100}{L_T}$. The table shows that these estimates more closely approximate the total wirelength measurements $L_T(f)$ compared with previous results [11].
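Eqn. 3 and the error expression can be illustrated with a short sketch. The function names and the three-design unit below are hypothetical; the numbers are chosen for easy arithmetic and are not *POWER4* data.

```python
def total_wirelength(designs):
    """Sum N_signals * L_avg over the designs of one unit, as in Eqn. 3."""
    return sum(n_signals * l_avg for n_signals, l_avg in designs)


def percent_error(estimate, measured):
    """Signed error in percent; a negative value means the model underestimates."""
    return (estimate - measured) * 100.0 / measured


# Hypothetical unit with three designs: (N_signals, L_avg estimate in gate pitches).
designs = [(1000, 4.0), (500, 3.0), (250, 2.0)]
l_tot = total_wirelength(designs)
print(l_tot, percent_error(l_tot, measured=8000.0))
```

The sign convention matches the error expressions used for the average wirelength, so the negative errors reported in the tables correspond to model underestimates.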

#### VII. DISCUSSION

The previous section presents an assessment of wirelength distribution models for functional circuitry. The results show that wirelength estimates provided by the models agree more closely with wirelength measurements in actual chip designs when circuitry type is correctly taken into account. Compared with previous work [11], the assessments are improved for 71 (71%) of the 89 *POWER4* chip designs that contain synchronization circuitry. The results are summarized in Tables IX-XIV; these tables show that the measured wirelengths are reduced by approximately 10% in each design when only functional circuitry is considered, since the wirelength contributions of the (typically lengthy) synchronization signals are omitted.

This paper also reports a qualitative improvement in the agreement between the curvature of the measured wirelength distributions and that of the distributions provided by the models. This improvement follows from omitting the synchronization signals from the measurements: in the measured distributions, the proportion of long signals is reduced and the proportion of short signals is increased.

A few reasons for the remaining differences between the model estimates and measurements are:

(1) The models assume square floorplans that are tiled with square blocks, whereas real chip designs are rectangular and are incompletely tiled (that is, O < 1) with rectangular logic gates. Future work is needed to compare designs with aspect ratios that deviate greatly from square with models that take into account the effects of rectangular design [20].

(2) The models assume signals with unity fanout. Future work is needed to compare chip designs with $\overline{f} \gg 1$ with models that take into account the effects of multi-terminal nets [19].

(3) Methods described in the literature (as, for example, in Ref. [3]) to extract  $\{k, p\}$  from linear fits to log-log plots of  $T_{IO}$  as a function of  $N_g$  are based on a 1971 interpretation [9] of two IBM internal (unpublished) memoranda written by E. F. Rent in 1960.

(4) The range of applicability of Rent's rule covers approximately one-half of the range of the gate partition sizes (in Region I of Fig. 4, for example), which is similar to that observed previously [11]. The effect of the remaining gate partition sizes in Region II on wirelength estimates has been modelled with a differential equation by Christie (2001) [29].
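
To make the fitting step mentioned in reason (3) concrete, the following sketch fits a straight line to $\log T_{IO}$ versus $\log N_g$ and reads off the Rent parameters. It assumes the usual power-law form of Rent's rule, $T_{IO} = k N_g^p$, and an ordinary unweighted least-squares fit; the function name `fit_rent_parameters` and the synthetic data are illustrative assumptions, and the procedure used to produce Tables VI and VII may differ in detail.

```python
import math
from typing import Sequence, Tuple


def fit_rent_parameters(n_gates: Sequence[float],
                        t_io: Sequence[float]) -> Tuple[float, float]:
    """Fit log(T) = log(k) + p * log(N) by least squares; return (k, p)."""
    if len(n_gates) != len(t_io) or len(n_gates) < 2:
        raise ValueError("need at least two (N_g, T_IO) pairs of equal length")
    xs = [math.log(n) for n in n_gates]
    ys = [math.log(t) for t in t_io]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all gate counts are identical; slope is undefined")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    p = sxy / sxx                       # slope on log-log axes
    k = math.exp(y_mean - p * x_mean)   # intercept mapped back from log space
    return k, p


# Synthetic check: points generated exactly from T = 2 * N**0.6,
# so the fit should recover k close to 2 and p close to 0.6.
sizes = [16, 64, 256, 1024]
terminals = [2.0 * n ** 0.6 for n in sizes]
k_fit, p_fit = fit_rent_parameters(sizes, terminals)
print(f"k = {k_fit:.3f}, p = {p_fit:.3f}")
```

In practice the same function could be applied to the $(N_g(f), T_{IO}(f))$ columns of Tables I-V, restricted to the partition sizes in Region I when recursive partitioning data are used.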

#### VIII. CONCLUSIONS

This paper presents a method to take the contribution of circuitry type correctly into account in new assessments of on-chip wirelength distribution models for ULSI chips. Improved comparisons of model estimates with measurements are obtained for the majority of ASIC-like control logic designs in a high-performance microprocessor. The results show that, after properly accounting for circuitry type, the model distributions exhibit a tighter and improved spread compared with previous work [11]. Average wirelength estimates provided by the Davis (1998) model show improved agreement with measurements for 64 of the 100 POWER4 designs that contain both functional circuitry and synchronization circuitry; model estimates show improved agreement with measurements for 8 (73%) of the 11 designs that contain only functional circuitry. Total wirelength estimates obtained with the Davis (1998) model are also improved.

#### IX. ACKNOWLEDGMENTS

We thank George Karypis and Selva Navaratnasothie at the University of Minnesota for discussions about hMeTIS. We thank Dirk Stroobandt and Joni Dambre for pointing us to copies of their Ph.D. theses on the internet. We thank Izzy Bendrihem and Kelvin Lewis for supporting a seamless computing environment.



Fig. 1. Schematic of clocking circuitry associated with a clock distribution. A schematic of the global clock distribution is shown on the left-hand side of the figure [16]. A schematic of the local clock wiring in control logic designs is shown on the right-hand side, where the global clock signal clk is input into local clock buffers that generate dc signals as well as two ac clock signals that drive the latches. In the example shown, the local clock buffer is driving 16 latches.



Fig. 2. Schematics of (a) a floorplan assumed by existing wire-length distribution models, (b) a floorplan for functional circuitry in today's ULSI designs, and (c) a floorplan for combined functional circuitry and synchronization circuitry in today's ULSI designs. In (b) and (c), the small empty regions between the rectangular gates indicate potential locations for decoupling capacitors and spare gates. In (c), the two large empty regions indicate the locations of the synchronization circuitry (not shown).



Fig. 3. (a) The number of input/output pins $T_{IO}$ as a function of used gates $N_{gates}$ for the six functional units in the *POWER4* core [11]; (b) the number of external input/output terminals $T_{IO}(f)$ for functional signals as a function of the number of gates $N_g(f)$ associated with these functional signals in six *POWER4* units. Least-squares linear fits of Eqn. 1 to each dataset provide the values of $\{k^f, p^f\}$ shown in Table VI.

June 8, 2004



Fig. 4. Number of terminals as a function of number of gates for functional signals in four *POWER4 IFU* designs: (a) *i1*, (b) *i3*, (c) *i9*, and (d) *i18*. Open circles indicate values obtained by recursively partitioning each design hypergraph with hMeTIS [24]. Least-squares linear fits of Eqn. 2 to the data points (open circles) in Region I provide the values of $\{k^{f*}, p^{f*}\}$ shown in Table VII.




Fig. 5. Comparison of the measured interconnection density function for functional circuitry (solid circles) and the measured cumulative interconnection density function (hollow squares) with the Donath (1979) model (solid line), Donath (1981) model (dashed line), Davis (1998) model (dotted line), and Christie model (dashed red line) for four *POWER4 IFU* designs: (a) *i1*, (b) *i3*, (c) *i9*, and (d) *i18*. The Donath and Christie models are evaluated as functions of $\{k^{f*}, p^{f*}\}$ shown in Table VII, and the Davis model is evaluated as a function of $\{k^f, p^f\}$ shown in Table VI.


#### TABLE I

Physical design characteristics for POWER4 IFU, where  $f_s = O_s/O$ , and  $f_f = O_f/O$ .

| IFU | All Circuitry [11] | | | | Functional Circuitry | | | | | Synchronization Circuitry | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Design | $N_g$ | $T_{IO}$ | $f$ | $O$ | $N_g(f)$ | $T_{IO}(f)$ | $f$ | $f_f$ | $O_f$ | $f_s$ | $O_s$ |
| i1 | 70 | 24 | 1.6 | 0.53 | 50 | 18 | 1.6 | 0.71 | 0.12 | 0.29 | 0.41 |
| i2 | 220 | 94 | 1.9 | 0.80 | 159 | 88 | 1.6 | 0.72 | 0.21 | 0.28 | 0.59 |
| i3 | 225 | 33 | 1.7 | 0.67 | 170 | 27 | 1.5 | 0.76 | 0.12 | 0.24 | 0.55 |
| i4 | 231 | 22 | 2.0 | 0.78 | 183 | 16 | 1.9 | 0.79 | 0.23 | 0.21 | 0.55 |
| i5 | 779 | 24 | 1.9 | 0.69 | 651 | 18 | 1.8 | 0.84 | 0.26 | 0.16 | 0.43 |
| i6 | 964 | 162 | 1.9 | 0.69 | 769 | 154 | 1.7 | 0.80 | 0.24 | 0.20 | 0.45 |
| i7 | 967 | 79 | 2.0 | 0.71 | 809 | 73 | 1.6 | 0.84 | 0.31 | 0.16 | 0.40 |
| i8 | 1042 | 36 | 1.9 | 0.77 | 860 | 30 | 1.7 | 0.83 | 0.27 | 0.17 | 0.50 |
| i9 | 1053 | 114 | 1.7 | 0.83 | 848 | 108 | 1.5 | 0.81 | 0.26 | 0.19 | 0.57 |
| i10 | 1118 | 35 | 1.9 | 0.83 | 921 | 29 | 1.7 | 0.82 | 0.29 | 0.18 | 0.54 |
| i11 | 1250 | 110 | 1.7 | 0.82 | 989 | 104 | 1.5 | 0.79 | 0.18 | 0.21 | 0.64 |
| i12 | 2323 | 173 | 2.0 | 0.31 | 1946 | 166 | 1.7 | 0.84 | 0.14 | 0.16 | 0.17 |
| i13 | 2561 | 607 | 2.3 | 0.88 | 2137 | 599 | 1.9 | 0.83 | 0.35 | 0.17 | 0.53 |
| i14 | 2691 | 128 | 2.0 | 0.76 | 2335 | 122 | 1.7 | 0.87 | 0.40 | 0.13 | 0.36 |
| i15 | 2746 | 223 | 2.0 | 0.72 | 2310 | 217 | 1.7 | 0.84 | 0.30 | 0.16 | 0.42 |
| i16 | 2871 | 379 | 2.1 | 0.81 | 2697 | 373 | 2.0 | 0.94 | 0.62 | 0.06 | 0.19 |
| i17 | 4934 | 282 | 2.0 | 0.64 | 4229 | 275 | 1.6 | 0.86 | 0.29 | 0.14 | 0.35 |
| i18 | 5459 | 407 | 1.9 | 0.72 | 4607 | 401 | 1.6 | 0.84 | 0.32 | 0.16 | 0.40 |

#### TABLE II

Physical design characteristics for *POWER4 FPU* (upper table) and *POWER4 FXU* (lower table).

| FPU | All Circuitry [11] | | | | Functional Circuitry | | | | | Synchronization Circuitry | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Design | $N_g$ | $T_{IO}$ | $f$ | $O$ | $N_g(f)$ | $T_{IO}(f)$ | $f$ | $f_f$ | $O_f$ | $f_s$ | $O_s$ |
| f1 | 41 | 23 | 1.4 | 0.24 | 41 | 23 | 1.4 | 1.0 | 0.24 | 0.0 | 0.0 |
| f2 | 46 | 29 | 1.3 | 0.39 | 46 | 29 | 1.3 | 1.0 | 0.39 | 0.0 | 0.0 |
| f3 | 133 | 58 | 1.6 | 0.40 | 133 | 58 | 1.6 | 1.0 | 0.40 | 0.0 | 0.0 |
| f4 | 201 | 76 | 1.9 | 0.68 | 201 | 76 | 1.9 | 1.0 | 0.68 | 0.0 | 0.0 |
| f5 | 219 | 92 | 1.6 | 0.74 | 176 | 86 | 1.6 | 0.80 | 0.18 | 0.20 | 0.56 |
| f6 | 236 | 102 | 1.6 | 0.51 | 236 | 102 | 1.6 | 1.0 | 0.51 | 0.0 | 0.0 |
| f7 | 257 | 45 | 2.1 | 0.45 | 257 | 45 | 2.1 | 1.0 | 0.45 | 0.0 | 0.0 |
| f8 | 300 | 68 | 1.9 | 0.64 | 300 | 68 | 1.9 | 1.0 | 0.64 | 0.0 | 0.0 |
| f9 | 356 | 184 | 1.6 | 0.52 | 356 | 184 | 1.6 | 1.0 | 0.52 | 0.0 | 0.0 |
| f10 | 398 | 169 | 1.7 | 0.73 | 398 | 169 | 1.7 | 1.0 | 0.73 | 0.0 | 0.0 |
| f11 | 497 | 156 | 1.8 | 0.47 | 484 | 154 | 1.8 | 0.97 | 0.45 | 0.03 | 0.02 |
| f12 | 555 | 82 | 2.1 | 0.55 | 555 | 82 | 2.1 | 1.0 | 0.55 | 0.0 | 0.0 |

| FXU | All Circuitry [11] | | | | Functional Circuitry | | | | | Synchronization Circuitry | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Design | $N_g$ | $T_{IO}$ | $f$ | $O$ | $N_g(f)$ | $T_{IO}(f)$ | $f$ | $f_f$ | $O_f$ | $f_s$ | $O_s$ |
| x1 | 5 | 14 | 1.0 | 0.48 | 5 | 14 | 1.0 | 1.0 | 0.46 | 0.0 | 0.0 |
| x2 | 33 | 24 | 1.6 | 0.75 | 10 | 9 | 1.5 | 0.30 | 0.01 | 0.70 | 0.74 |
| x3 | 304 | 190 | 1.1 | 0.65 | 283 | 182 | 1.1 | 0.93 | 0.56 | 0.07 | 0.09 |
| x4 | 2283 | 443 | 2.1 | 0.64 | 1868 | 435 | 2.0 | 0.82 | 0.24 | 0.18 | 0.40 |

DRAFT

#### TABLE III

Physical design characteristics for *POWER4 IDU*.

| IDU            | All Circuitry [11] | | | | Functional Circuitry | | | | | Synchronization Circuitry | |
|----------------|-------|----------|--------|------|----------|-------------|--------|--------|-------|-------|-----------------------|
| Design         | $N_g$ | $T_{IO}$ | $f$    | $O$  | $N_g(f)$ | $T_{IO}(f)$ | $f$    | $f_f$  | $O_f$ | $f_s$ | $O_s$                 |
| d1             | 86    | 78       | 1.0    | 0.57 | 86       | 78          | 1.0    | 1.0    | 0.22  | 0.0   | 0.35                  |
| d2             | 252   | 118      | 1.5    | 0.71 | 206      | 110         | 1.2    | 0.82   | 0.25  | 0.18  | 0.46                  |
| d3             | 513   | 227      | 1.6    | 0.54 | 423      | 217         | 1.3    | 0.82   | 0.19  | 0.18  | 0.35                  |
| d4             | 524   | 99       | 2.1    | 0.61 | 444      | 91          | 2.0    | 0.85   | 0.23  | 0.15  | 0.38                  |
| d5             | 585   | 99       | 1.7    | 0.78 | 493      | 88          | 1.5    | 0.84   | 0.32  | 0.16  | 0.46                  |
| d6             | 677   | 166      | 1.7    | 0.70 | 593      | 164         | 1.5    | 0.88   | 0.44  | 0.12  | 0.26                  |
| d7             | 755   | 56       | 1.9    | 0.58 | 755      | 56          | 1.9    | 1.0    | 0.49  | 0.0   | 0.09                  |
| d8             | 1238  | 298      | 1.9    | 0.53 | 1115     | 290         | 1.9    | 0.90   | 0.28  | 0.10  | 0.25                  |
| d9             | 1464  | 258      | 2.0    | 0.69 | 1290     | 250         | 1.7    | 0.88   | 0.29  | 0.12  | 0.40                  |
| d10            | 1497  | 143      | 2.1    | 0.86 | 1357     | 136         | 1.9    | 0.91   | 0.49  | 0.09  | 0.37                  |
| d11            | 1498  | 255      | 1.9    | 0.60 | 1301     | 244         | 1.7    | 0.87   | 0.26  | 0.13  | 0.34                  |
| d12            | 1500  | 260      | 1.9    | 0.60 | 1299     | 249         | 1.7    | 0.87   | 0.28  | 0.13  | 0.32                  |
| d13            | 1587  | 280      | 1.9    | 0.69 | 1385     | 269         | 1.7    | 0.87   | 0.31  | 0.13  | 0.38                  |
| d14            | 1697  | 130      | 2.0    | 0.46 | 1697     | 130         | 2.0    | 1.0    | 0.40  | 0.0   | 0.06                  |
| d15            | 2008  | 386      | 1.9    | 0.75 | 1791     | 375         | 1.7    | 0.89   | 0.38  | 0.11  | 0.37                  |
| d16            | 2082  | 136      | 2.0    | 0.54 | 2082     | 136         | 2.0    | 1.0    | 0.49  | 0.0   | 0.05                  |
| d17            | 2091  | 136      | 2.0    | 0.56 | 2091     | 136         | 2.0    | 1.0    | 0.51  | 0.0   | 0.05                  |
| d18            | 2685  | 147      | 2.0    | 0.57 | 2685     | 147         | 2.0    | 1.0    | 0.51  | 0.0   | 0.06                  |

#### TABLE IV

Physical design characteristics for *POWER4 ISU*.

| ISU | All Circuitry [11] | | | | Functional Circuitry | | | | | Synchronization Circuitry | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Design | $N_g$ | $T_{IO}$ | $f$ | $O$ | $N_g(f)$ | $T_{IO}(f)$ | $f$ | $f_f$ | $O_f$ | $f_s$ | $O_s$ |
| s1 | 323 | 73 | 1.7 | 0.83 | 246 | 66 | 1.5 | 0.76 | 0.17 | 0.24 | 0.66 |
| s2 | 331 | 83 | 2.0 | 0.69 | 269 | 76 | 1.7 | 0.81 | 0.30 | 0.19 | 0.39 |
| s3 | 360 | 150 | 1.6 | 0.90 | 281 | 142 | 1.2 | 0.78 | 0.27 | 0.22 | 0.63 |
| s4 | 656 | 281 | 1.6 | 0.78 | 533 | 274 | 1.5 | 0.81 | 0.20 | 0.19 | 0.58 |
| s5 | 1188 | 476 | 1.9 | 0.87 | 847 | 469 | 1.7 | 0.71 | 0.13 | 0.29 | 0.74 |
| s6 | 1277 | 137 | 1.9 | 0.69 | 1081 | 130 | 1.9 | 0.85 | 0.24 | 0.15 | 0.45 |
| s7 | 1299 | 326 | 1.8 | 0.75 | 1019 | 319 | 1.4 | 0.78 | 0.18 | 0.22 | 0.57 |
| s8 | 1649 | 226 | 1.9 | 0.77 | 1459 | 219 | 1.7 | 0.88 | 0.38 | 0.12 | 0.39 |
| s9 | 1719 | 592 | 1.7 | 0.71 | 1521 | 585 | 1.5 | 0.88 | 0.39 | 0.12 | 0.32 |
| s10 | 1798 | 252 | 2.1 | 0.91 | 1552 | 245 | 2.1 | 0.86 | 0.37 | 0.14 | 0.54 |
| s11 | 2347 | 254 | 2.0 | 0.84 | 2004 | 247 | 1.9 | 0.85 | 0.36 | 0.15 | 0.48 |
| s12 | 2485 | 245 | 2.1 | 0.82 | 2103 | 238 | 1.8 | 0.85 | 0.30 | 0.15 | 0.52 |
| s13 | 3207 | 337 | 2.1 | 0.54 | 2745 | 329 | 2.0 | 0.86 | 0.20 | 0.14 | 0.34 |
| s14 | 3766 | 155 | 1.7 | 0.73 | 3458 | 148 | 1.6 | 0.92 | 0.39 | 0.08 | 0.34 |
| s15 | 3962 | 410 | 2.2 | 0.69 | 3408 | 382 | 2.2 | 0.86 | 0.29 | 0.14 | 0.40 |
| s16 | 6578 | 173 | 2.5 | 0.62 | 5496 | 166 | 2.4 | 0.84 | 0.25 | 0.16 | 0.37 |

#### TABLE V

Physical design characteristics for *POWER4 LSU*.

| LSU | All Circuitry [11] | | | | Functional Circuitry | | | | | Synchronization Circuitry | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Design | $N_g$ | $T_{IO}$ | $f$ | $O$ | $N_g(f)$ | $T_{IO}(f)$ | $f$ | $f_f$ | $O_f$ | $f_s$ | $O_s$ |
| l1 | 117 | 25 | 1.3 | 0.75 | 90 | 16 | 1.0 | 0.77 | 0.19 | 0.23 | 0.56 |
| l2 | 259 | 152 | 1.5 | 0.43 | 205 | 133 | 1.1 | 0.79 | 0.10 | 0.21 | 0.33 |
| l3 | 294 | 155 | 1.5 | 0.50 | 235 | 128 | 1.1 | 0.80 | 0.12 | 0.20 | 0.38 |
| l4 | 506 | 95 | 2.0 | 0.46 | 401 | 86 | 1.7 | 0.79 | 0.14 | 0.21 | 0.32 |
| l5 | 567 | 67 | 2.2 | 0.91 | 457 | 59 | 2.1 | 0.81 | 0.37 | 0.19 | 0.54 |
| l6 | 641 | 167 | 1.8 | 0.77 | 502 | 158 | 1.6 | 0.78 | 0.25 | 0.22 | 0.52 |
| l7 | 687 | 100 | 2.0 | 0.68 | 584 | 91 | 1.6 | 0.85 | 0.21 | 0.15 | 0.47 |
| l8 | 1011 | 147 | 1.7 | 0.70 | 825 | 128 | 1.6 | 0.82 | 0.20 | 0.18 | 0.50 |
| l9 | 1024 | 124 | 1.8 | 0.80 | 835 | 115 | 1.7 | 0.82 | 0.25 | 0.18 | 0.55 |
| l10 | 1191 | 354 | 1.8 | 0.58 | 1014 | 345 | 1.5 | 0.85 | 0.23 | 0.15 | 0.35 |
| l11 | 1235 | 367 | 1.8 | 0.63 | 1055 | 358 | 1.5 | 0.85 | 0.31 | 0.15 | 0.32 |
| l12 | 1392 | 156 | 2.0 | 0.83 | 1200 | 130 | 1.6 | 0.86 | 0.38 | 0.14 | 0.45 |
| l13 | 1527 | 193 | 2.0 | 0.71 | 1291 | 184 | 1.8 | 0.85 | 0.27 | 0.15 | 0.44 |
| l14 | 1655 | 304 | 1.9 | 0.76 | 1396 | 296 | 1.9 | 0.84 | 0.28 | 0.16 | 0.48 |
| l15 | 1722 | 397 | 1.8 | 0.59 | 1375 | 388 | 1.6 | 0.80 | 0.16 | 0.20 | 0.43 |
| l16 | 1835 | 277 | 1.9 | 0.68 | 1516 | 268 | 1.5 | 0.83 | 0.22 | 0.17 | 0.46 |
| l17 | 1892 | 320 | 1.9 | 0.67 | 1466 | 293 | 1.8 | 0.77 | 0.22 | 0.23 | 0.45 |
| l18 | 1920 | 539 | 1.8 | 0.73 | 1637 | 530 | 1.5 | 0.85 | 0.29 | 0.15 | 0.44 |
| l19 | 1954 | 483 | 1.8 | 0.66 | 1600 | 474 | 1.7 | 0.82 | 0.18 | 0.18 | 0.48 |
| l20 | 2241 | 71 | 2.0 | 0.65 | 1847 | 64 | 2.0 | 0.82 | 0.23 | 0.18 | 0.42 |
| l21 | 2348 | 350 | 1.7 | 0.55 | 1874 | 342 | 1.6 | 0.80 | 0.22 | 0.20 | 0.33 |
| l22 | 2353 | 424 | 1.9 | 0.63 | 1947 | 415 | 1.5 | 0.83 | 0.24 | 0.17 | 0.39 |
| l23 | 2368 | 216 | 1.9 | 0.60 | 1988 | 207 | 1.7 | 0.84 | 0.25 | 0.16 | 0.35 |
| l24 | 2516 | 395 | 1.8 | 0.66 | 2027 | 386 | 1.7 | 0.81 | 0.19 | 0.19 | 0.47 |
| l25 | 2569 | 317 | 1.9 | 0.64 | 2180 | 309 | 1.8 | 0.85 | 0.24 | 0.15 | 0.40 |
| l26 | 3398 | 238 | 2.1 | 0.76 | 3000 | 230 | 2.0 | 0.88 | 0.38 | 0.12 | 0.38 |
| l27 | 3728 | 433 | 1.9 | 0.76 | 3143 | 425 | 1.5 | 0.84 | 0.33 | 0.16 | 0.43 |
| l28 | 3866 | 358 | 2.1 | 0.78 | 3429 | 351 | 1.8 | 0.89 | 0.44 | 0.11 | 0.34 |
| l29 | 4025 | 367 | 2.4 | 0.57 | 3061 | 358 | 1.8 | 0.76 | 0.17 | 0.24 | 0.40 |
| l30 | 4622 | 604 | 1.7 | 0.73 | 3744 | 595 | 1.6 | 0.81 | 0.19 | 0.19 | 0.54 |
| l31 | 4785 | 49 | 2.0 | 0.67 | 4053 | 40 | 1.9 | 0.85 | 0.27 | 0.15 | 0.40 |

#### TABLE VI

Comparison of Rent parameter pair $\{k, p\}$ with $\{k^f, p^f\}$ for the six *POWER4* functional units. Ranges indicate values within one standard deviation of the parameter pairs.

| POWER4 | All Signals [11] | | Functional Signals | |
|---|---|---|---|---|
| Unit (# designs) | $k$ [range] | $p$ [range] | $k^f$ [range] | $p^f$ [range] |
| *IFU* designs (18) | 0.79[0.29, 2.14] | 0.69[0.55, 0.84] | 0.68[0.24, 1.90] | 0.72[0.57, 0.87] |
| *FPU* designs (12) | 2.21[1.06, 4.58] | 0.66[0.52, 0.79] | 2.30[1.10, 4.81] | 0.65[0.52, 0.79] |
| *FXU* designs (4) | 4.36[2.81, 6.78] | 0.61[0.52, 0.69] | 3.29[1.98, 5.47] | 0.66[0.56, 0.77] |
| *IDU* designs (18) | 20.5[8.44, 49.8] | 0.30[0.17, 0.43] | 37.9[15.01, 95.6] | 0.21[0.07, 0.34] |
| *ISU* designs (16) | 23.3[7.79, 69.9] | 0.31[0.16, 0.46] | 25.2[8.64, 73.4] | 0.30[0.15, 0.45] |
| *LSU* designs (32) | 7.33[3.02, 17.8] | 0.46[0.34, 0.58] | 5.27[2.10, 13.2] | 0.51[0.38, 0.63] |

#### TABLE VII

Comparison of Rent parameter pairs $\{k^*, p^*\}$ and $\{k^{f*}, p^{f*}\}$. For $k^*$ and $k^{f*}$, the range indicates the values within one standard deviation. For $p^*$ and $p^{f*}$, the value is expressed as the value $\pm$ one standard deviation.

| IFU | All Signals [11] | | Functional Signals | |
|---|---|---|---|---|
| Design | $k^*$ [range] | $p^*$ | $k^{f*}$ [range] | $p^{f*}$ |
| i1 | 1.95[1.66, 2.29] | $0.71\pm0.12$ | 2.40[2.10, 2.75] | $0.53\pm0.09$ |
| i3 | 2.39[2.16, 2.65] | $0.59\pm0.06$ | 1.62[1.46, 1.80] | $0.75\pm0.06$ |
| i9 | 2.33[2.22, 2.44] | $0.63\pm0.02$ | 2.35[2.25, 2.45] | $0.57\pm0.02$ |
| i18 | 2.16[2.13, 2.19] | $0.73\pm0.01$ | 1.92[1.89, 1.94] | $0.67\pm0.01$ |

#### TABLE VIII

Comparison of average wirelength measurements (in gate pitches) in four *POWER4 IFU* designs with estimates provided by the Donath model and Christie model. Errors are shown in %.

| IFU | All Signals [11] | | | Functional Signals | | |
|---|---|---|---|---|---|---|
| | Data | Donath | | Data | Donath | |
| Design | $L_a$ | $\overline{R}_{p^*}$ | $E(\overline{R}_{p^*})$ | $L_a(f)$ | $\overline{R}_{p^{f*}}$ | $E(\overline{R}_{p^{f*}})$ |
| i1 | 3.3 | 2.9[2.7, 3.2] | -10 | 3.2 | 2.4[2.3, 2.6] | -24 |
| i3 | 4.2 | 3.4[3.0, 3.9] | -18 | 3.8 | 3.8[3.6, 4.1] | 0 |
| i9 | 4.8 | 4.9[3.9, 6.1] | 1 | 4.3 | 4.3[4.1, 4.4] | -2 |
| i18 | 9.5 | 8.8[6.3, 12.4] | -7 | 8.9 | 7.1[7.0, 7.3] | -20 |
| | Data | Christie | | Data | Christie | |
| Design | $L_a$ | $L_{avg}(p^*)$ | $E(L_{avg}(p^*))$ | $L_a(f)$ | $L_{avg}(p^{f*})$ | $E(L_{avg}(p^{f*}))$ |
| i1 | 3.3 | 2.4[2.2, 2.6] | -27 | 3.2 | 2.0[1.9, 2.1] | -36 |
| i3 | 4.2 | 2.7[2.4, 3.0] | -36 | 3.8 | 3.0[2.8, 3.2] | -23 |
| i9 | 4.8 | 3.7[3.1, 4.5] | -24 | 4.3 | 3.3[3.2, 3.4] | -25 |
| i18 | 9.5 | 6.2[4.7, 8.5] | -34 | 8.9 | 5.2[5.0, 5.3] | -42 |

#### TABLE IX

Comparison of average wirelength measurements (in gatepitches) for the POWER4 IFU designs with estimates provided by the Davis model. Errors are shown in %.

| IFU | All Signals [11] | | | Functional Signals | | |
|---|---|---|---|---|---|---|
| | Data | Davis | | Data | Davis | |
| Design | $L_a$ | $L_{avg}(p)$ [range] | $E(L_{avg}(p))$ | $L_a(f)$ | $L_{avg}(p^f)$ [range] | $E(L_{avg}(p^f))$ |
| i1 | 3.3 | 2.4[2.2, 2.6] | -28 | 3.2 | 2.3[2.1, 2.5] | -29 |
| i2 | 5.5 | 3.0[2.6, 3.5] | -46 | 4.8 | 2.9[2.5, 3.3] | -40 |
| i3 | 4.2 | 3.0[2.6, 3.5] | -29 | 3.8 | 2.9[2.5, 3.4] | -25 |
| i4 | 4.0 | 3.0[2.6, 3.5] | -24 | 3.5 | 2.9[2.5, 3.4] | -17 |
| i5 | 3.7 | 3.8[3.1, 4.8] | 2 | 3.3 | 3.8[3.1, 4.8] | 17 |
| i6 | 6.2 | 4.0[3.2, 5.1] | -36 | 5.6 | 4.0[3.2, 5.0] | -29 |
| i7 | 5.7 | 4.0[3.2, 5.1] | -29 | 4.8 | 4.0[3.2, 5.1] | -17 |
| i8 | 4.1 | 4.1[3.3, 5.2] | 0 | 3.5 | 4.1[3.2, 5.2] | 17 |
| i9 | 4.8 | 4.1[3.3, 5.2] | -16 | 4.3 | 4.1[3.3, 5.2] | -6 |
| i10 | 3.6 | 4.1[3.3, 5.3] | 14 | 3.1 | 4.1[3.3, 5.3] | 34 |
| i11 | 4.2 | 4.2[3.3, 5.4] | 1 | 3.7 | 4.2[3.3, 5.4] | 15 |
| i12 | 9.3 | 4.8[3.7, 6.4] | -49 | 8.7 | 4.9[3.7, 6.6] | -44 |
| i13 | 9.5 | 4.9[3.7, 6.6] | -49 | 8.8 | 5.0[3.7, 6.8] | -44 |
| i14 | 7.0 | 4.9[3.7, 6.7] | -30 | 6.5 | 5.0[3.8, 6.9] | -22 |
| i15 | 8.1 | 4.9[3.7, 6.7] | -39 | 7.6 | 5.0[3.8, 6.9] | -33 |
| i16 | 10.4 | 5.0[3.8, 6.8] | -52 | 10.1 | 5.2[3.8, 7.2] | -49 |
| i17 | 8.7 | 5.6[4.1, 7.9] | -36 | 7.8 | 5.7[4.1, 8.3] | -26 |
| i18 | 9.5 | 5.7[4.1, 8.2] | -40 | 8.9 | 5.9[4.2, 8.5] | -34 |


#### TABLE X

Comparison of total measured wirelength (in gatepitches) with estimates provided by the Davis model for the POWER4 chip. Errors are shown in %.

| Unit (#) | Characteristics | | Data | Davis | |
|---|---|---|---|---|---|
| All Signals [11] | $\overline{f}$ | $\overline{O}$ | $L_T$ | $L_{tot}(p)$ [range] | $E(L_{tot}(p))$ |
| *IFU* (18) | $1.9 \pm 0.2$ | $0.72\pm0.10$ | 226826.9 | 141256.9[106963.1, 192256.1] | -38 |
| *FPU* (12) | $1.7 \pm 0.2$ | $0.60\pm0.16$ | 21915.9 | 11886.8[10244.9, 13999.2] | -46 |
| *FXU* (4) | $1.4 \pm 0.5$ | $0.67\pm0.14$ | 26030.9 | 10380.8[8991.9, 12140.4] | -60 |
| *IDU* (18) | $1.8 \pm 0.3$ | $0.63\pm0.10$ | 148700.6 | 59240.1[52017.7, 69503.6] | -60 |
| *ISU* (16) | $1.9 \pm 0.2$ | $0.76\pm0.10$ | 309901.8 | 92190.1[77970.2, 114605.0] | -70 |
| *LSU* (32) | $1.9 \pm 0.2$ | $0.67\pm0.11$ | 553852.8 | 220556.9[183878.7, 274265.0] | -60 |
| POWER4 | $1.8 \pm 0.3$ | $0.68\pm0.13$ | 1287230.0 | 535511.6[440066.5, 676769.3] | -58 |
| Functional Signals | $\overline{f}(f)$ | $\overline{O}(f)$ | $L_T(f)$ | $L_{tot}(p^f)$ [range] | $E(L_{tot}(p^f))$ |
| *IFU* (18) | $1.7 \pm 0.1$ | $0.27\pm0.12$ | 200164.4 | 137568.6[102738.8, 189032.2] | -31 |
| *FPU* (12) | $1.7 \pm 0.2$ | $0.48\pm0.16$ | 21805.1 | 11731.6[10109.1, 13826.8] | -46 |
| *FXU* (4) | $1.4 \pm 0.4$ | $0.32\pm0.24$ | 24634.4 | 10991.4[9222.6, 13301.7] | -55 |
| *IDU* (18) | $1.7 \pm 0.3$ | $0.35\pm0.11$ | 140842.3 | 52723.0[47032.1, 60832.6] | -63 |
| *ISU* (16) | $1.8 \pm 0.3$ | $0.28\pm0.09$ | 286661.6 | 87765.5[74735.9, 108056.5] | -69 |
| *LSU* (32) | $1.6 \pm 0.3$ | $0.25\pm0.08$ | 502571.6 | 228186.2[186731.6, 289601.9] | -55 |
| POWER4 | $1.7 \pm 0.3$ | $0.31\pm0.13$ | 1176679.4 | 528966.2[430570.1, 674651.8] | -55 |

#### TABLE XI

Comparison of average wirelength measurements (in gatepitches) for POWER4 FPU designs (upper table) and FXU designs (lower table) with estimates provided by the Davis model. Errors are shown in %.

| FPU        |       | All Signals [11]     |                 |          | Functional Signals    |                   |
|------------|-------|----------------------|-----------------|----------|-----------------------|-------------------|
|            | Data  | Davis                |                 | Data     | Davis                 |                   |
|            | $L_a$ | $L_{avg}(p)$ [range] | $E(L_{avg}(p))$ | $L_a(f)$ | $L_{avg}(p^f)[range]$ | $E(L_{avg}(p^f))$ |
| <i>f1</i>  | 3.3   | 2.1[2.0, 2.3]        | -36             | 3.3      | 2.1[2.0, 2.3]         | -36               |
| f2         | 2.7   | 2.1[2.0, 2.3]        | -22             | 2.7      | 2.1[2.0, 2.3]         | -22               |
| f3         | 4.5   | 2.6[2.3, 2.9]        | -42             | 4.5      | 2.6[2.3, 2.9]         | -43               |
| f4         | 5.2   | 2.8[2.5, 3.2]        | -46             | 5.2      | 2.8[2.5, 3.2]         | -46               |
| f5         | 4.2   | 2.9[2.5, 3.3]        | -32             | 4.0      | 2.7[2.4, 3.1]         | -32               |
| f6         | 4.7   | 2.9[2.5, 3.4]        | -38             | 4.7      | 2.9[2.5, 3.3]         | -39               |
| f7         | 5.8   | 2.9[2.6, 3.4]        | -50             | 5.8      | 2.9[2.6, 3.4]         | -50               |
| f8         | 5.3   | 3.0[2.6, 3.6]        | -42             | 5.3      | 3.0[2.6, 3.5]         | -43               |
| f9         | 6.0   | 3.1[2.7, 3.7]        | -48             | 6.0      | 3.1[2.7, 3.7]         | -48               |
| <i>f10</i> | 7.1   | 3.2[2.7, 3.8]        | -55             | 7.1      | 3.2[2.7, 3.8]         | -55               |
| <i>f11</i> | 5.0   | 3.3[2.8, 4.0]        | -34             | 5.0      | 3.3[2.8, 4.0]         | -34               |
| f12        | 7.3   | 3.4[2.9, 4.1]        | -53             | 7.3      | 3.4[2.8, 4.1]         | -54               |
| FXU        | Data  | Davis                |                 | Data     | Davis                 |                   |
|            | $L_a$ | $L_{avg}(p)$ [range] | $E(L_{avg}(p))$ | $L_a(f)$ | $L_{avg}(p^f)[range]$ | $E(L_{avg}(p^f))$ |
| <i>x1</i>  | 1.4   | 1.4[1.4, 1.5]        | 5               | 1.4      | 1.5[1.4, 1.5]         | 6                 |
| x2         | 4.3   | 2.0[1.9, 2.1]        | -54             | 3.5      | 1.6[1.6, 1.7]         | -53               |
| <i>x3</i>  | 7.1   | 2.9[2.6, 3.2]        | -60             | 6.9      | 3.0[2.7, 3.4]         | -57               |
| <i>x4</i>  | 10.1  | 4.0[3.5, 4.7]        | -60             | 9.7      | 4.3[3.6, 5.3]         | -55               |


#### TABLE XII

Comparison of average wirelength measurements (in gatepitches) for the POWER4 IDU designs with estimates provided by the Davis model. Errors are shown in %.

| IDU        |       | All Signals [11]     |                 | Functional Signals |                       |                   |  |  |
|------------|-------|----------------------|-----------------|--------------------|-----------------------|-------------------|--|--|
|            | Data  | Davis                |                 | Data               | Davis                 |                   |  |  |
|            | $L_a$ | $L_{avg}(p)$ [range] | $E(L_{avg}(p))$ | $L_a(f)$           | $L_{avg}(p^f)[range]$ | $E(L_{avg}(p^f))$ |  |  |
| <i>d1</i>  | 2.8   | 1.9[1.8, 2.1]        | -32             | 2.8                | 1.8[1.7, 2.0]         | -35               |  |  |
| d2         | 5.0   | 2.1[1.9, 2.3]        | -57             | 4.4                | 2.0[1.8, 2.1]         | -56               |  |  |
| d3         | 6.7   | 2.2[2.0, 2.5]        | -66             | 6.3                | 2.1[1.9, 2.3]         | -68               |  |  |
| d4         | 6.1   | 2.2[2.0, 2.6]        | -63             | 5.8                | 2.1[1.9, 2.3]         | -65               |  |  |
| d5         | 3.3   | 2.3[2.0, 2.6]        | -32             | 2.8                | 2.1[1.9, 2.3]         | -25               |  |  |
| d6         | 4.5   | 2.3[2.1, 2.6]        | -49             | 4.3                | 2.1[1.9, 2.4]         | -51               |  |  |
| d7         | 5.1   | 2.3[2.1, 2.7]        | -55             | 5.1                | 2.2[2.0, 2.6]         | -58               |  |  |
| d8         | 8.0   | 2.4[2.1, 2.8]        | -70             | 7.8                | 2.2[1.9, 2.5]         | -72               |  |  |
| d9         | 7.6   | 2.4[2.1, 2.8]        | -68             | 7.2                | 2.2[2.0, 2.5]         | -70               |  |  |
| <i>d10</i> | 5.6   | 2.4[2.1, 2.8]        | -57             | 5.2                | 2.2[2.0, 2.5]         | -58               |  |  |
| <i>d11</i> | 7.4   | 2.4[2.1, 2.8]        | -67             | 6.8                | 2.2[2.0, 2.5]         | -68               |  |  |
| <i>d12</i> | 6.0   | 2.4[2.1, 2.8]        | -60             | 5.5                | 2.2[2.0, 2.5]         | -60               |  |  |
| d13        | 6.3   | 2.4[2.1, 2.9]        | -61             | 5.9                | 2.2[2.0, 2.5]         | -63               |  |  |
| <i>d14</i> | 5.2   | 2.4[2.1, 2.9]        | -53             | 5.2                | 2.2[2.0, 2.6]         | -57               |  |  |
| <i>d15</i> | 6.9   | 2.5[2.2, 2.9]        | -64             | 6.5                | 2.2[2.0, 2.6]         | -66               |  |  |
| <i>d16</i> | 5.8   | 2.5[2.2, 2.9]        | -57             | 5.8                | 2.3[2.0, 2.7]         | -62               |  |  |
| <i>d17</i> | 5.5   | 2.5[2.2, 2.9]        | -55             | 5.5                | 2.1[1.9, 2.4]         | -59               |  |  |
| d18        | 5.5   | 2.5[2.2, 3.0]        | -54             | 5.5                | 2.2[2.0, 2.6]         | -59               |  |  |

#### TABLE XIII

Comparison of average wirelength measurements (in gatepitches) for POWER4 ISU designs with estimates provided by the Davis model. Errors are shown in %.

| ISU        | All Signals [11] |                      | Functional Signals |          |                       |                   |
|------------|------------------|----------------------|--------------------|----------|-----------------------|-------------------|
|            | Data             | Davis                |                    | Data     | Davis                 |                   |
|            | $L_a$            | $L_{avg}(p)$ [range] | $E(L_{avg}(p))$    | $L_a(f)$ | $L_{avg}(p^f)[range]$ | $E(L_{avg}(p^f))$ |
| s1         | 4.2              | 2.2[2.0, 2.5]        | -48                | 3.7      | 2.1[1.9, 2.4]         | -43               |
| s2         | 6.4              | 2.2[2.0, 2.5]        | -66                | 5.8      | 2.1[1.9, 2.4]         | -63               |
| s3         | 3.5              | 2.2[2.0, 2.5]        | -38                | 2.8      | 2.1[1.9, 2.4]         | -24               |
| <i>s4</i>  | 5.2              | 2.3[2.0, 2.7]        | -56                | 4.8      | 2.3[2.0, 2.6]         | -53               |
| s5         | 7.9              | 2.4[2.1, 2.9]        | -69                | 7.4      | 2.3[2.0, 2.8]         | -69               |
| s6         | 6.1              | 2.4[2.1, 2.9]        | -60                | 6.1      | 2.4[2.1, 2.8]         | -61               |
| s7         | 7.3              | 2.4[2.1, 2.9]        | -67                | 6.5      | 2.4[2.1, 2.8]         | -64               |
| s8         | 5.9              | 2.5[2.1, 3.0]        | -58                | 5.5      | 2.4[2.1, 2.9]         | -55               |
| s9         | 8.2              | 2.5[2.1, 3.0]        | -70                | 7.9      | 2.4[2.1, 2.9]         | -69               |
| s10        | 7.4              | 2.5[2.1, 3.0]        | -66                | 7.1      | 2.4[2.1, 3.0]         | -65               |
| s11        | 7.0              | 2.5[2.2, 3.1]        | -64                | 6.6      | 2.5[2.1, 3.0]         | -63               |
| s12        | 8.2              | 2.5[2.2, 3.1]        | -69                | 7.6      | 2.5[2.1, 3.1]         | -67               |
| s13        | 9.2              | 2.6[2.2, 3.2]        | -72                | 8.6      | 2.5[2.1, 3.1]         | -71               |
| s14        | 4.3              | 2.6[2.2, 3.3]        | -39                | 4.1      | 2.6[2.2, 3.2]         | -37               |
| s15        | 11.2             | 2.6[2.2, 3.3]        | -77                | 10.9     | 2.6[2.2, 3.2]         | -76               |
| s16        | 13.1             | 2.7[2.2, 3.5]        | -79                | 12.4     | 2.6[2.2, 3.4]         | -79               |

#### TABLE XIV

Comparison of average wirelength measurements (in gatepitches) for POWER4 LSU designs with estimates provided by the Davis model. Errors are shown in %.

| LSU              | All Signals [11] |                      |                 | Functional Signals |                       |                    |
|------------------|------------------|----------------------|-----------------|--------------------|-----------------------|--------------------|
|                  | Data             | Davis                |                 | Data               | Davis                 |                    |
|                  | $L_a$            | $L_{avg}(p)$ [range] | $E(L_{avg}(p))$ | $L_a(f)$           | $L_{avg}(p^f)[range]$ | $E(L_{avg}(p^f))$  |
| <i>l1</i>        | 3.2              | 2.2[2.0, 2.4]        | -32             | 2.7                | 2.2[2.0, 2.4]         | -18                |
| l2               | 8.5              | 2.4[2.2, 2.7]        | -71             | 7.4                | 2.5[2.2, 2.8]         | -67                |
| <i>l3</i>        | 8.2              | 2.5[2.2, 2.8]        | -70             | 7.5                | 2.5[2.2, 2.8]         | -67                |
| l4               | 6.3              | 2.6[2.3, 3.0]        | -58             | 5.8                | 2.7[2.4, 3.1]         | -54                |
| l5               | 6.0              | 2.7[2.4, 3.1]        | -56             | 5.7                | 2.7[2.4, 3.2]         | -52                |
| <i>l6</i>        | 6.2              | 2.7[2.4, 3.1]        | -57             | 5.6                | 2.8[2.4, 3.2]         | -51                |
| <i>l7</i>        | 6.5              | 2.7[2.4, 3.2]        | -58             | 6.1                | 2.8[2.5, 3.3]         | -53                |
| <i>l8</i>        | 5.6              | 2.9[2.5, 3.4]        | -49             | 5.2                | 3.0[2.5, 3.5]         | -43                |
| l9               | 5.8              | 2.9[2.5, 3.4]        | -51             | 5.3                | 3.0[2.5, 3.5]         | -44                |
| <i>l10</i>       | 5.7              | 2.9[2.5, 3.5]        | -49             | 5.0                | 3.0[2.6, 3.7]         | -39                |
| l11              | 5.8              | 2.9[2.5, 3.5]        | -50             | 5.2                | 3.1[2.6, 3.7]         | -41                |
| l12              | 6.4              | 3.0[2.5, 3.5]        | -54             | 5.8                | 3.1[2.6, 3.8]         | -46                |
| l13              | 6.3              | 3.0[2.5, 3.6]        | -53             | 5.9                | 3.1[2.6, 3.8]         | -46                |
| l14              | 7.2              | 3.0[2.6, 3.6]        | -58             | 6.9                | 3.2[2.7, 3.9]         | -54                |
| l15              | 10.3             | 3.0[2.6, 3.7]        | -71             | 9.9                | 3.2[2.7, 3.9]         | -68                |
| l16              | 7.6              | 3.0[2.6, 3.7]        | -60             | 6.5                | 3.2[2.7, 3.9]         | -51                |
| l17              | 10.0             | 3.1[2.6, 3.7]        | -69             | 9.4                | 3.2[2.7, 3.9]         | -66                |
| l18              | 10.4             | 3.1[2.6, 3.7]        | -71             | 9.9                | 3.2[2.7, 4.0]         | -67                |
| l19              | 9.4              | 3.1[2.6, 3.7]        | -67             | 8.9                | 3.2[2.7, 4.0]         | -64                |
| l20              | 4.9              | 3.1[2.6, 3.8]        | -36             | 4.6                | 3.3[2.7, 4.1]         | -28                |
| l21              | 8.6              | 3.1[2.6, 3.8]        | -64             | 8.1                | 3.3[2.7, 4.1]         | -60                |
| l22              | 11.4             | 3.1[2.6, 3.8]        | -73             | 10.7               | 3.3[2.7, 4.1]         | -69                |
| l23              | 5.7              | 3.1[2.6, 3.8]        | -45             | 5.3                | 3.3[2.7, 4.1]         | -37                |
| l24              | 8.2              | 3.1[2.6, 3.9]        | -62             | 7.7                | 3.3[2.7, 4.1]         | -57                |
| l25              | 7.2              | 3.2[2.6, 3.9]        | -56             | 6.9                | 3.3[2.8, 4.2]         | -52                |
| l26              | 7.5              | 3.2[2.7, 4.1]        | -57             | 7.3                | 3.5[2.8, 4.4]         | -52                |
| l27              | 9.1              | 3.3[2.7, 4.1]        | -64             | 8.4                | 3.5[2.8, 4.5]         | -59                |
| l28              | 9.5              | 3.3[2.7, 4.1]        | -65             | 8.9                | 3.5[2.8, 4.5]         | -60                |
| l29              | 12.2             | 3.3[2.7, 4.2]        | -73             | 11.1               | 3.5[2.9, 4.6]         | -69                |

#### References

- [1] C. Anderson et al., "Physical design of a fourth-generation POWER GHz microprocessor," *Proc. ISSCC*, 2001.
- [2] J. D. Warnock, J. Keaty, J. Petrovick, J. Clabes, C. J. Kircher, B. Krauter, P. Restle, B. Zoric, and C. J. Anderson, "The circuit and physical design of the POWER4 microprocessor," *IBM J. Res. Dev.*, vol. 46, pp. 27-51, Jan. 2002.
- [3] J. A. Davis, V. K. De, J. D. Meindl, "A Stochastic Wire-Length Distribution for Gigascale Integration (GSI) - Part I: Derivation and Validation," *IEEE Trans. Electron Devices*, vol. 45, pp. 580-589, March 1998.
- [4] W. E. Donath, "Placement and Average Interconnection Lengths of Computer Logic," *IEEE Trans. Circuits and Systems*, vol. CAS-26, pp. 272-277, April 1979.
- [5] W. E. Donath, "Wire Length Distribution for Placements of Computer Logic," *IBM J. Res. Dev.*, vol. 25, pp. 152-155, May 1981.
- [6] P. Christie and D. Stroobandt, "The interpretation and application of Rent's rule," *IEEE Trans. VLSI*, vol. 8, pp. 639-648, 2000.
- [7] J. Dambre, P. Verplaetse, D. Stroobandt, J. V. Campenhout, "A comparison of various terminal-gate relationships for interconnect prediction in VLSI circuits," *IEEE Trans. VLSI*, vol. 11, pp. 24-34, February 2003.
- [8] P. Verplaetse, D. Stroobandt, and J. Van Campenhout, "A stochastic model for the interconnection topology of digital circuits," *IEEE Trans. VLSI Syst.*, vol. 9, pp. 938-942, Dec. 2001.
- [9] B. S. Landman and R. L. Russo, "On a Pin Versus Block Relationship For Partitions of Logic Graphs," *IEEE Trans. Computers*, vol. C-20, pp. 1469-1479, December 1971.
- [10] H. B. Bakoglu, *Circuits, Interconnections, and Packaging for VLSI*. New York: Addison-Wesley, 1990.
- [11] M. Y. Lanzerotti, G. Fiorenza, R. Rand, "Assessment of on-chip wirelength distribution models," *IEEE Trans. VLSI*, in press.
- [12] J. A. Davis, V. K. De, J. D. Meindl, "A Stochastic Wire-Length Distribution for Gigascale Integration (GSI) - Part II: Applications to Clock Frequency, Power Dissipation, and Chip Size Estimation," *IEEE Trans. Electron Devices*, vol. 45, pp. 590-597, March 1998.
- [13] J. A. Davis, Ph.D. Thesis, Georgia Institute of Technology, 1999.
- [14] J. Dambre, "Prediction of interconnect properties for digital circuit design and exploration," Ph.D. Dissertation, University of Ghent, Dept. of Electronics and Information Systems, July 2003.
- [15] L. Sigal, "Circuit design techniques for the high-performance CMOS IBM S/390 parallel enterprise server G4 microprocessor," *IBM J. Res. Develop.*, vol. 41, pp. 489-503, July/September 1997.
- [16] P. J. Restle et al., "The clock distribution of the Power4 microprocessor," *Proc. ISSCC*, 2002.
- [17] G. Karypis and S. Navaratnasothie, Personal Communication, 10/2003.
- [18] J. Dambre, personal communication, 2003.
- [19] D. Stroobandt, "A priori wirelength distribution models for multiterminal nets," *IEEE Trans. VLSI*, vol. 11, pp. 35-43, February 2003.
- [20] J. Dambre, P. Verplaetse, D. Stroobandt, and J. Van Campenhout, "On Rent's rule for rectangular regions," in *Proc. Int. Workshop on System-Level Interconnect Prediction*, Mar. 2001, pp. 49-56.
- [21] P. Verplaetse, J. Dambre, D. Stroobandt, and J. Van Campenhout, "On partitioning vs. placement Rent properties," in *Proc. Int. Workshop on System-Level Interconnect Prediction*, Mar. 2001, pp. 33-40.
- [22] D. Stroobandt, "Analytical methods for a priori wirelength estimates in computer systems," Ph.D. Dissertation, University of Ghent, Faculty of Applied Sciences, Nov. 1998. (translated from Dutch).
- [23] D. Stroobandt, *A Priori Wirelength Estimates for Digital Design*. Boston: Kluwer Academic Publishers, 2001.

- [24] G. Karypis and V. Kumar, (1998). hMetis: A Hypergraph Partitioning Package. [Online]. Available: http://www-users.cs.umn.edu/~karypis/metis/index.html.
- [25] J. Davis, G. Lopez, private communication, 2002.
- [26] I. P. Gent, S. A. Grant, E. MacIntyre, P. Prosser, P. Shaw, B. M. Smith, and T. Walsh, "How Not To Do It," Research Report 97-27, Univ. of Leeds School of Computer Studies, May 1997.
- [27] A. E. Caldwell, A. B. Kahng, A. A. Kennings, and I. L. Markov, "Hypergraph Partitioning for VLSI CAD: Methodology for Heuristic Development, Experimentation and Reporting," in *Proc. Design Automation Conf.*, 1999.
- [28] X. Yang, E. Bozorgzadeh, M. Sarrafzadeh, "Wirelength estimation based on Rent Exponents of Partitioning and Placement," in *Proc. SLIP*, 2001.
- [29] P. Christie, "A differential equation for placement analysis," *IEEE Trans. VLSI*, vol. 9, pp. 913-921, Dec. 2001.