Recovering from Failures


SUSOPS experiments frequently run continuously over several days. In this environment it is critical that experiments are not jeopardized by hardware or software failures. SUSOPS can recover from both Workstation and Console failures and continue the experiment.

If a Workstation fails, the same machine or a replacement can be started and instructed to join the experiment already in progress. When it reconnects to the Console and identifies itself, the Console sends it the context information it needs to continue from the point of failure. The Console and the other Workstations are unaffected by the failure.
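
As an illustration only, the rejoin step might be sketched in Java as follows. Everything here is an assumption rather than SUSOPS code: the hypothetical Console keeps a per-Workstation list of completed tasks, the context it sends on rejoin is simply that list, and the Workstation is assumed to run its schedule in order, so it resumes at the first task not yet completed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical snapshot of what a Workstation needs to resume.
// The real SUSOPS context format is not described on this page.
final class WorkstationContext {
    final String workstationId;
    final List<String> completedTasks;

    WorkstationContext(String workstationId, List<String> completedTasks) {
        this.workstationId = workstationId;
        // Immutable copy so later Console updates cannot alter the snapshot.
        this.completedTasks = List.copyOf(completedTasks);
    }
}

// Hypothetical Console-side bookkeeping: progress is recorded per Workstation
// so a restarted (or replacement) Workstation can pick up where it left off.
final class Console {
    private final Map<String, List<String>> progress = new HashMap<>();

    // Called whenever a Workstation reports a finished task.
    void recordCompletion(String workstationId, String taskId) {
        progress.computeIfAbsent(workstationId, id -> new ArrayList<>()).add(taskId);
    }

    // Called when a Workstation (re)connects and identifies itself.
    // Only that Workstation's entry is read; the others are untouched.
    WorkstationContext handleJoin(String workstationId) {
        List<String> done = progress.getOrDefault(workstationId, List.of());
        return new WorkstationContext(workstationId, done);
    }
}

final class Workstation {
    // Assumes tasks run in schedule order, so the number of completed
    // tasks is the index of the next task to run.
    static List<String> remainingTasks(List<String> schedule, WorkstationContext ctx) {
        int next = Math.min(ctx.completedTasks.size(), schedule.size());
        return schedule.subList(next, schedule.size());
    }
}
```

Because the sketched Console looks up only the rejoining Workstation's entry, the state of the other Workstations is never touched, mirroring the behaviour described above.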



If the Console fails, the Workstations continue operating independently. When a new Console is started, it can be instructed to reconnect to each of the running Workstations. Each Workstation returns context information that allows the Console to reflect its status as though the failure had not occurred, including the history of task executions that took place while the Console was down.
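
The reverse direction can be sketched the same way. Again, this is a hypothetical Java illustration, not the SUSOPS implementation: each RunningWorkstation appends a TaskRecord to a local history every time it executes a task, whether or not a Console is connected, and a NewConsole rebuilds its status view by collecting every Workstation's history when it reconnects. All class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One entry in a Workstation's local execution history (hypothetical format).
final class TaskRecord {
    final String taskId;
    final long finishedAtMillis;

    TaskRecord(String taskId, long finishedAtMillis) {
        this.taskId = taskId;
        this.finishedAtMillis = finishedAtMillis;
    }
}

// A Workstation keeps running without a Console and logs every execution locally.
final class RunningWorkstation {
    final String id;
    private final List<TaskRecord> history = new ArrayList<>();

    RunningWorkstation(String id) {
        this.id = id;
    }

    void execute(String taskId, long nowMillis) {
        // Placeholder: the real task body would run here.
        history.add(new TaskRecord(taskId, nowMillis));
    }

    // Context returned to a (new) Console when it reconnects.
    List<TaskRecord> reportContext() {
        return List.copyOf(history);
    }
}

// A freshly started Console rebuilds its status view from the Workstations.
final class NewConsole {
    private final Map<String, List<TaskRecord>> status = new LinkedHashMap<>();

    void reconnectAll(List<RunningWorkstation> workstations) {
        for (RunningWorkstation ws : workstations) {
            status.put(ws.id, ws.reportContext());
        }
    }

    List<TaskRecord> historyOf(String workstationId) {
        return status.getOrDefault(workstationId, List.of());
    }
}
```

In this sketch, keeping the history on each Workstation is what lets a replacement Console recover executions it never observed.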


NTT Systems Inc.