DARMA » Introduction » Termination Detector

Detect termination of work

The termination component vt::term::TerminationDetector, accessed via vt::theTerm() detects the completion of the transitive closure of work by following the causal chain of messages/events across multiple nodes. It provides global termination to determine when all work is complete and the schedulers can stop running. Additionally, it enables the creation of epochs (which stamp message envelopes) to mark messages as part of a work grouping to detect termination of all events causally related to a subset of messages in the system.

The termination detector comes with two different detection algorithms: (1) 4-counter wave-based termination for large collective or large rooted epochs across the whole system; and, (2) Dijkstra-Scholten parental responsibility termination for rooted epochs. Epochs are allowed to have other epochs nested within them, thus forming a graph. The detector tracks the relation between epochs, only making progress on epochs that do not have a dependency on another epoch terminating first.

The termination detector also comes with hang detection to detect causes where no progress can be made due to bugs in an application's code or the runtime implementation. When a hang is detected, if configured as such by the user, the detector will dump a DOT graph of the live epochs and their dependencies.

Example of creating a collective epoch

using TestMsg = vt::Message;

vt::NodeType nextNode() {
  vt::NodeType this_node = vt::theContext()->getNode();
  vt::NodeType num_nodes = vt::theContext()->getNumNodes();
  return (this_node + 1) % num_nodes;
}

static void test_handler(TestMsg* msg) {
  static int num = 3;

  vt::NodeType this_node = vt::theContext()->getNode();

  auto epoch = vt::envelopeGetEpoch(msg->env);
  fmt::print("{}: test_handler: num={}, epoch={:x}\n", this_node, num, epoch);

  num--;
  if (num > 0) {
    auto msg_send = vt::makeMessage<TestMsg>();
    vt::theMsg()->sendMsg<test_handler>(nextNode(), msg_send);
  }
}

int main(int argc, char** argv) {
  vt::initialize(argc, argv);

  vt::NodeType this_node = vt::theContext()->getNode();
  vt::NodeType num_nodes = vt::theContext()->getNumNodes();

  if (num_nodes == 1) {
    return vt::rerror("requires at least 2 nodes");
  }

  auto epoch = vt::theTerm()->makeEpochCollective();

  // This action will not run until all messages originating from the
  // sends are completed
  vt::theTerm()->addAction(epoch, [=]{
    fmt::print("{}: finished epoch={:x}\n", this_node, epoch);
  });

  // Message must go out of scope before finalize
  {
    auto msg = vt::makeMessage<TestMsg>();
    vt::envelopeSetEpoch(msg->env, epoch);
    vt::theMsg()->sendMsg<test_handler>(nextNode(), msg);
  }

  vt::theTerm()->finishedEpoch(epoch);

  vt::finalize();

  return 0;
}

Example of creating a rooted epoch

using TestMsg = vt::Message;

vt::NodeType nextNode() {
  vt::NodeType this_node = vt::theContext()->getNode();
  vt::NodeType num_nodes = vt::theContext()->getNumNodes();
  return (this_node + 1) % num_nodes;
}

static void test_handler(TestMsg* msg) {
  static int num = 3;

  vt::NodeType this_node = vt::theContext()->getNode();

  auto epoch = vt::envelopeGetEpoch(msg->env);
  fmt::print("{}: test_handler: num={}, epoch={:x}\n", this_node, num, epoch);

  num--;
  if (num > 0) {
    auto msg_send = vt::makeMessage<TestMsg>();
    vt::theMsg()->sendMsg<test_handler>(nextNode(), msg_send);
  }
}

int main(int argc, char** argv) {
  vt::initialize(argc, argv);

  vt::NodeType this_node = vt::theContext()->getNode();
  vt::NodeType num_nodes = vt::theContext()->getNumNodes();

  if (num_nodes == 1) {
    return vt::rerror("requires at least 2 nodes");
  }

  if (this_node == 0) {
    auto epoch = vt::theTerm()->makeEpochRooted(vt::term::UseDS{true});

    // This action will not run until all messages originating from the
    // following send are completed
    vt::theTerm()->addAction(epoch, [=]{
      fmt::print("{}: finished epoch={:x}\n", this_node, epoch);
    });

    auto msg = vt::makeMessage<TestMsg>();
    vt::envelopeSetEpoch(msg->env, epoch);
    vt::theMsg()->sendMsg<test_handler>(nextNode(), msg);
    vt::theTerm()->finishedEpoch(epoch);
  }

  vt::finalize();

  return 0;
}