Exploiting Non-Determinism for Reliability of Mobile Agent Systems

Mobile agents are useful for certain classes of applications. One important technical hurdle blocking their adoption is their lack of reliability. Designing a reliable mobile agent system is especially challenging since a mobile agent is potentially affected by failure of any host that it visits, or failure of any communication link that it needs to traverse. Previous work in this domain has attempted techniques such as periodic checkpointing of mobile agent state and restarting upon machine or communication recovery. Such approaches render an agent unavailable until a machine or a communication link itself recovers. In this paper, we take an alternate approach based on the premise that a mobile agent can often complete its task in more than one way. We capture such redundancy in non-deterministic constructs in the agent language and maintain state about an agent's actual computational path in the possible computational tree implied by the non-deterministic operations used in an agent. We design a distributed recovery scheme that detects a failure, rolls back an agent's computation, and restarts the agent from a previous point in its computational tree down a different but equivalent computational path without waiting for the actual failure itself to be repaired. We implement the system and present preliminary measurements which indicate that our approach has reasonable overhead and scalability.
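The recovery idea in the abstract can be illustrated with a minimal sketch. It is not the authors' agent language or implementation: the names `HostFailure`, `visit`, and `run_with_alternatives` are hypothetical, hosts are simulated as strings, and a single choice point stands in for the full computational tree. The sketch only shows the general pattern of recording state at a non-deterministic choice, rolling back when a host on the chosen path fails, and continuing down an equivalent alternative without waiting for the failed host to recover.

```python
class HostFailure(Exception):
    """Raised when a (simulated) host or link is unreachable."""


def visit(host, down_hosts):
    # Hypothetical stand-in for migrating to a host and doing work there.
    if host in down_hosts:
        raise HostFailure(host)
    return f"result from {host}"


def run_with_alternatives(alternatives, down_hosts):
    """Try equivalent paths in order; roll back and move on when one fails."""
    trace = []
    for path in alternatives:
        checkpoint = len(trace)  # agent state at the choice point
        try:
            for host in path:
                trace.append(visit(host, down_hosts))
            return path, trace  # this branch completed the task
        except HostFailure:
            # Discard partial work done on the failed branch, then try the next one.
            del trace[checkpoint:]
    raise RuntimeError("all alternatives failed")


if __name__ == "__main__":
    # Two equivalent ways to finish the task; host "B" is currently down.
    path, trace = run_with_alternatives([["A", "B"], ["A", "C"]], down_hosts={"B"})
    print(path, trace)
```

In this sketch the agent abandons the branch through the failed host and completes the task on the second branch, which is the behavior the abstract contrasts with checkpoint-and-wait schemes.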

By: Ajay Mohindra, Apratim Purakayastha, Prasannaa Thati

Published in: DSN 2000: International Conference on Dependable Systems and Networks, Proceedings. Los Alamitos, CA, IEEE Computer Society, p. 144-57, 2000
