A Hardware/Software Codesign Strategy For the Implementation of High-Speed Protocols

Advances in glass-fiber and network technologies have shifted the bottleneck in communication systems to the protocol processing unit. The literature shows two approaches to this problem. First, parallelism is employed at the stack, layer, entity and protocol-function levels; however, finer-grain parallelism has not been identified. Second, functions that form a bottleneck are off-loaded to a hardware coprocessor, but these functions are often selected on an ad hoc basis, so it is not clear what the optimal trade-off between hardware and software is.
The main goal of this thesis is to design an architecture for a general-purpose, programmable protocol processor capable of supporting very high bit-rate data streams (at least 1.2 Gbit/s, or at least 600k packets/s). The problem to be solved is managing the complexity caused by a large gap between specification and implementation, numerous constraints and tunable parameters, the complex functions to be performed, and the high-speed processing requirements. This thesis presents a design strategy for implementing high-speed complex systems in general (as characterized above) and demonstrates its feasibility by applying the strategy to the design of a general-purpose protocol processor.
In accordance with the design strategy, our focus is on architectural aspects rather than on the specification. Basic functions, which are operations visible in the architecture (and often not in the specification), are identified by means of a structured, detailed analysis of the protocol implementation. The basic functions are defined with two aims: to provide hardware support for the functions common to most protocols, and to process the fast path of a general protocol efficiently. Fine-grain parallelism is obtained by defining the basic functions as the atomic units of parallelism. Hardware/software trade-offs can be resolved with performance modeling techniques.
Several novel VLSI-based architectures are proposed. The observation that timers used in protocol processing may be inaccurate has led to an efficient VLSI implementation. A layered model for the implementation of memory management primitives is defined; its central part is an architecture for high-speed buffer management, for which a patent has been applied. The header processing task is divided into the pipelined operations of header separation, parameter extraction, address translation and state variable management. With this division the task becomes straightforward and can be supported by simple, yet still programmable, hardware modules. Extracted header parameters can be interpreted efficiently by a functional memory, for which two novel implementation methods, also being patented, are presented. Finally, a general rate control scheme that unifies all schemes known in the literature is presented. The method activates an algorithm only upon arrival of a packet to be transmitted, so no periodic calculations are necessary.
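One common software analogue of accepting inaccurate protocol timers is a timing wheel: timers are hashed into slots of a fixed tick granularity, so starting and stopping a timer take constant time, and a timeout is effectively rounded up to whole ticks. The sketch below assumes that technique; it is not the VLSI timer design of the thesis, and the class `TimingWheel`, its methods and the slot count are hypothetical.

```python
from collections import defaultdict


class TimingWheel:
    """Hypothetical coarse-grained timer wheel (illustration, not the thesis design).

    Time advances only in whole ticks, so a timer's resolution is one tick:
    a timeout that falls between ticks is effectively rounded up.
    """

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.now = 0                      # current tick count
        self.slots = defaultdict(dict)    # slot index -> {timer_id: deadline tick}
        self.where = {}                   # timer_id -> slot index, for O(1) stop

    def start(self, timer_id, delay_ticks):
        """Arm or re-arm a timer to expire delay_ticks from now (minimum 1)."""
        self.stop(timer_id)
        deadline = self.now + max(1, delay_ticks)
        slot = deadline % self.num_slots
        self.slots[slot][timer_id] = deadline
        self.where[timer_id] = slot

    def stop(self, timer_id):
        """Cancel a timer if it is armed; unknown ids are ignored."""
        slot = self.where.pop(timer_id, None)
        if slot is not None:
            del self.slots[slot][timer_id]

    def tick(self):
        """Advance one tick and return the ids of timers that expire on it."""
        self.now += 1
        slot = self.slots[self.now % self.num_slots]
        # Timers longer than one revolution share the slot but wait for their deadline.
        expired = [tid for tid, deadline in slot.items() if deadline == self.now]
        for tid in expired:
            del slot[tid]
            del self.where[tid]
        return expired


wheel = TimingWheel(num_slots=8)
wheel.start("retransmit", 3)
wheel.start("ack", 10)            # longer than one revolution of the 8-slot wheel
fired = [wheel.tick() for _ in range(10)]
print(fired[2], fired[9])         # ['retransmit'] ['ack']
```

The wheel trades timing precision for cheap bookkeeping: a coarser tick means fewer slot visits per second, at the cost of timeouts being quantized to the tick.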
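The property that the rate controller runs only when a packet arrives, with no periodic calculations, can be illustrated with a virtual-scheduling shaper in the style of the ATM Generic Cell Rate Algorithm, adapted here from policing to shaping. This is a well-known technique swapped in for illustration, not the unified scheme of the thesis; the class `ArrivalDrivenShaper` and its parameters are hypothetical.

```python
class ArrivalDrivenShaper:
    """Hypothetical shaper evaluated only when a packet arrives.

    Virtual-scheduling style (after the ATM GCRA, adapted to shaping): the only
    state is `tat`, the theoretical transmission time of the next packet.
    There is no periodic token refill; idle time is absorbed at the next arrival.
    """

    def __init__(self, rate_bytes_per_s, tolerance_s=0.0):
        self.rate = rate_bytes_per_s
        self.tolerance = tolerance_s   # how far ahead of schedule a packet may go
        self.tat = 0.0

    def schedule(self, arrival_time, length_bytes):
        """Return the earliest send time (in seconds) for this packet."""
        # If the flow was idle, the schedule restarts at the arrival time.
        tat = max(self.tat, arrival_time)
        # Never send before arrival; may run ahead of schedule by `tolerance`.
        send_time = max(arrival_time, tat - self.tolerance)
        # The next packet's slot moves on by this packet's time at the target rate.
        self.tat = tat + length_bytes / self.rate
        return send_time


shaper = ArrivalDrivenShaper(rate_bytes_per_s=1000)
print([shaper.schedule(0.0, 500) for _ in range(3)])   # [0.0, 0.5, 1.0]
print(shaper.schedule(3.0, 500))                        # 3.0
```

Because the state is updated only in `schedule`, an idle flow costs nothing; the time that passed while it was idle is accounted for by `max(self.tat, arrival_time)` when its next packet arrives.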
An architecture for a general-purpose protocol processor, composed of simple hardware components and enhanced by a simple von Neumann microprocessor, is presented. The estimated performance for LLC 8802-2.2, independent of the number of active timers and the number of active connections, is between 550K and 800K packets per second. With a short packet length of 256 bytes, this corresponds to data rates between 1.1 Gbit/s and 1.6 Gbit/s. For a programmable protocol processor, these high speeds can only be obtained if the von Neumann processor is designed specifically for its task and is integrated with the various hardware modules.
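The quoted data rates follow from multiplying the packet rate by the packet length in bits. A quick check, using a hypothetical helper `line_rate_gbit_s` and ignoring any framing overhead:

```python
def line_rate_gbit_s(packets_per_s, packet_bytes):
    """Bit rate (Gbit/s) implied by a packet rate and a fixed packet length."""
    return packets_per_s * packet_bytes * 8 / 1e9


for pps in (550_000, 800_000):
    rate = line_rate_gbit_s(pps, 256)
    print(f"{pps} packets/s x 256 bytes -> {rate:.2f} Gbit/s")
# 550000 packets/s x 256 bytes -> 1.13 Gbit/s
# 800000 packets/s x 256 bytes -> 1.64 Gbit/s
```

Rounded to one decimal these give the 1.1 and 1.6 Gbit/s figures; the same arithmetic puts 600k packets/s of 256-byte packets at about 1.23 Gbit/s.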
The design strategy provides a programming model in which the protocol implementer (or compiler) is aware only of the defined basic functions. The various implementation aspects, in particular header processing and generation and all data-flow-related issues, are invisible to the protocol implementer. The advantage is that parallelism is obtained through sequential programming.

By: M. Heddes

Published in: RZ2753 in 1998
