On Approximating the Ideal Random Access Machine by Physical Machines

Copyright © (2009) by Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distrubuted for profit or commericial advantage. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.

The capability of the Random Access Machine (RAM) to execute any instruction in constant time is not realizable, due to fundamental physical constraints on the minimum size of devices and on the maximum speed of signals. This work explores how well the ideal RAM performance can be approximated, for significant classes of computations, by machines whose building blocks have constant size and are connected at a constant distance.

A novel memory structure is proposed, which is pipelined (can accept a new request at each cycle) and hierarchical, exhibiting optimal latency a(x) = O(x1/d) to address x, in d-dimensional realizations.

In spite of block-transfer or other memory-pipeline capabilities, a number of previous machine models do not achieve a full overlap of memory accesses. These are examples of machines with explicit data movement. It is shown that there are direct-flow computations (without branches and indirect accesses), which require time super-linear in number of instructions, on all such machines.

To circumvent the explicit-data-movement constraints, the Speculative Prefetcher (SP) and the Speculative Prefetcher and Evaluator (SPE) processors are developed. Both processors can execute any direct-flow program in linear time. The SPE also executes in linear time a class of loop programs that includes many significant algorithms. Even quicksort, a somewhat irregular, recursive algorithm admits a linear-time SPE implementation. A relation between instructions called address dependence is introduced, which limits memory-access overlap and can lead to superlinear time, as illustrated with the classical merging algorithm.

By: Gianfranco BIilardi; Kattamuri Ekanadham; Pratap Pattnaik

Published in: Journal of the ACM , volume 56, (no 5), pages Art No. 27 in 2009

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

rc24379.pdf

Questions about this service can be mailed to reports@us.ibm.com .