The Fused Multiply-Add Instruction Leads to Algorithms for Extended-Precision Floating Point: Applications to Java and High-Performance Computing

Portability in Java requires that all architectures produce the same result for a particular floating-point calculation. Conceptually, the best definition for this result is the correctly rounded value of the infinitely precise result. Although this is likely an unattainable goal in general, it should be striven for when attainable with reasonable performance. Although many architectures implement IEEE double-precision arithmetic in hardware, providing the correctly rounded result for the basic operations (+,-,*,/) with good performance, most do not provide hardware support for extended precision arithmetic. The performance and accuracy of the resulting software implementations can be poor.

Many machines provide a double-precision fused multiply-add (FMA) instruction that computes d = a * b + c with d correctly rounded. Using the FMA instruction, higher performance algorithms for extended-precision arithmetic can be devised. We present examples of such algorithms for the basic operations, +,-,*,/. The algorithms always compute the exact result for each of the operations. This contributes to the Java goal of portability by providing a large spectrum of architectures on which it is possible to efficiently implement correctly rounded extended precision floating point. The FMA is also valuable in large-scale computation of linear algebra problems, where the accumulation of dot products is a basic operation. Performing this accumulation in extended precision greatly increases the size of problem that may be solved before roundoff error becomes a limiting factor. It also produces a much faster way, over just using IEEE arithmetic, of doing the more accurate computation.

Given two integers a and b, there exist unique integers q and r such that a = bq + r with 0 <= r < b. In number theory, this lemma is the computational basis for proving the Fundamental Theorem of Arithmetic.

Note that a=bq+r has the same form as the FMA. The exact nature of the FMA and this lemma allow us to define multiplication and division of IEEE numbers exactly. Furthermore, the FMA instruction provides a much faster way to perform these exact computations. Thus, we will demonstrate a clear need for Java to accept the FMA as a standard instruction.

Finally, we demonstrate the accuracy of the algorithms on example problems (matrix multiplication, 2 x 2 determinant, complex multiplication, and triangle area calculation), for which existing computer arithmetic gives completely inaccurate results in certain instances.

By: Fred G. Gustavson, Jose E. Moreira, Robert F. Enenkel

Published in: RC21586 in 1999

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

RC21586.ps

Questions about this service can be mailed to reports@us.ibm.com .