vt::term::TerminationDetector struct

Detect global termination and termination of subsets of work.

Implements distributed algorithms for termination detection across the entire VT runtime and for subsets of work, each encapsulated in an epoch. Ships with two algorithms: 4-counter wave-based termination for large collective epochs, and Dijkstra-Scholten parental responsibility termination for rooted epochs. Epochs may have other epochs nested within them, forming a graph.
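
As an illustration of the wave-based approach, the textbook four-counter rule can be sketched as follows. Each node counts the units of work it has produced and consumed; a wave sums those counts over all nodes, and termination is declared only when two consecutive waves report the same, balanced totals. This is a minimal sketch of the general technique, not VT's implementation: WaveResult, NodeSnapshot, runWave, and fourCounterTerminated are hypothetical names, and a plain loop stands in for a reduction over a spanning tree.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Counter = std::int64_t;  // hypothetical stand-in for TermCounterType

// Counters a node reports when a wave visits it.
struct NodeSnapshot {
  Counter produced;
  Counter consumed;
};

// Totals gathered by one wave over all nodes.
struct WaveResult {
  Counter produced = 0;
  Counter consumed = 0;
};

// Sum the per-node counters; a loop stands in for a tree reduction.
inline WaveResult runWave(std::vector<NodeSnapshot> const& nodes) {
  assert(!nodes.empty());
  WaveResult total;
  for (auto const& n : nodes) {
    total.produced += n.produced;
    total.consumed += n.consumed;
  }
  return total;
}

// Four-counter rule: both waves are balanced and report the same totals,
// so no work was produced or consumed between them.
inline bool fourCounterTerminated(
  WaveResult const& first, WaveResult const& second
) {
  return first.produced == first.consumed &&
         second.produced == second.consumed &&
         first.produced == second.produced;
}
```

A single balanced wave is not enough in this scheme, because the per-node counters are read at different times; the second wave confirms that nothing changed in between.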

The termination detector detects termination of the transitive closure of a piece of work—either starting collectively with all nodes or starting on a particular node (rooted).
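
For the rooted case, the classic Dijkstra-Scholten scheme can be sketched as a small single-process simulation. A node that receives work while outside the tree adopts the sender as its parent; it acknowledges that parent only once it is idle and all of its own messages have been acknowledged. The root detects termination when it is idle with nothing outstanding. This is a sketch of the textbook algorithm under simplifying assumptions (messages are delivered instantly, nodes are plain integers); DijkstraScholtenSim and its members are hypothetical and do not mirror VT's internal classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class DijkstraScholtenSim {
public:
  DijkstraScholtenSim(int num_nodes, int root)
    : nodes_(static_cast<std::size_t>(num_nodes)), root_(root) {
    nodes_[root_].engaged = true;  // the root starts out busy
    nodes_[root_].idle = false;
  }

  // `from` sends one unit of work to `to`, which becomes busy.
  void send(int from, int to) {
    assert(nodes_[from].engaged && !nodes_[from].idle);
    ++nodes_[from].deficit;
    Node& dst = nodes_[to];
    dst.idle = false;
    if (dst.engaged) {
      acknowledge(from);   // already in the tree: acknowledge right away
    } else {
      dst.engaged = true;
      dst.parent = from;   // `from` is now responsible for `to`
    }
  }

  // `node` has finished its local work.
  void finishWork(int node) {
    nodes_[node].idle = true;
    tryDisengage(node);
  }

  // The root is idle and every message it sent has been acknowledged.
  bool terminated() const {
    Node const& r = nodes_[root_];
    return r.idle && r.deficit == 0;
  }

private:
  struct Node {
    bool engaged = false;
    bool idle = true;
    int parent = -1;
    int deficit = 0;  // messages sent and not yet acknowledged
  };

  void acknowledge(int node) {
    --nodes_[node].deficit;
    tryDisengage(node);
  }

  // An idle non-root node with no outstanding messages leaves the tree
  // and acknowledges its parent, which may cascade toward the root.
  void tryDisengage(int node) {
    Node& n = nodes_[node];
    if (node == root_ || !n.engaged || !n.idle || n.deficit != 0) return;
    int const parent = n.parent;
    n.engaged = false;
    n.parent = -1;
    acknowledge(parent);
  }

  std::vector<Node> nodes_;
  int root_;
};
```

In this model the acknowledgements flow back along the same edges that spread the work, which is why the approach suits work that starts on one node.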

To track work across the distributed system, work is "produced" and "consumed". Produce and consume are separate counters that are tracked on each node for each epoch. When the global produce and consume counts (summed across all nodes) are equal, termination is reached.
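
The counting scheme can be modeled in a few lines. The sketch below is a standalone model, not VT code: EpochId, Counter, NodeCounters, and globallyBalanced are hypothetical names, and the check reads all nodes at once, which a real distributed runtime cannot do.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

using EpochId = std::uint64_t;  // hypothetical stand-in for EpochType
using Counter = std::int64_t;   // hypothetical stand-in for TermCounterType

// Per-node produce and consume counters, kept separately for each epoch.
struct NodeCounters {
  struct Counts {
    Counter produced = 0;
    Counter consumed = 0;
  };
  std::unordered_map<EpochId, Counts> by_epoch;

  void produce(EpochId epoch, Counter num_units = 1) {
    assert(num_units > 0);
    by_epoch[epoch].produced += num_units;
  }

  void consume(EpochId epoch, Counter num_units = 1) {
    assert(num_units > 0);
    by_epoch[epoch].consumed += num_units;
  }
};

// True when produce and consume totals, summed over all nodes, are equal.
inline bool globallyBalanced(
  std::vector<NodeCounters> const& nodes, EpochId epoch
) {
  Counter produced = 0;
  Counter consumed = 0;
  for (auto const& node : nodes) {
    auto it = node.by_epoch.find(epoch);
    if (it != node.by_epoch.end()) {
      produced += it->second.produced;
      consumed += it->second.consumed;
    }
  }
  return produced == consumed;
}
```

Note that an individual node need not be balanced: a unit produced on one node is typically consumed on another, so only the global sums are meaningful.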

Base classes

template<typename T>
struct vt::runtime::component::Component<TerminationDetector>
Component class for a generic VT runtime module, CRTP'ed over the component's actual type
struct TermAction
struct vt::collective::tree::Tree
General interface for storing a spanning tree.
struct StateDS
struct TermInterface

Public types

template<typename T>
using EpochContainerType = std::unordered_map<EpochType, T>
using TermStateType = TermState
using TermStateDSType = term::ds::StateDS::TerminatorType
using SuccessorBagType = EpochDependency::SuccessorBagType
using EpochGraph = termination::graph::EpochGraph
using EpochGraphMsg = termination::graph::EpochGraphMsg<EpochGraph>
using EpochStackType = EpochStack

Constructors, destructors, conversion operators

TerminationDetector()
Construct a termination detector.
~TerminationDetector() virtual

Public functions

auto name() -> std::string override
Get the name of the component.
void produce(EpochType epoch = any_epoch_sentinel, TermCounterType num_units = 1, NodeType node = uninitialized_destination)
Produce on an epoch—increase the produce counter.
void consume(EpochType epoch = any_epoch_sentinel, TermCounterType num_units = 1, NodeType node = uninitialized_destination)
Consume on an epoch—increase the consume counter.
void hangDetectSend()
Special produce for hang detection.
void hangDetectRecv()
Special consume for hang detection.
auto isRooted(EpochType epoch) -> bool
Check if an epoch is rooted.
auto isDS(EpochType epoch) -> bool
Check if the algorithm behind an epoch is Dijkstra-Scholten parental responsibility.
auto getDSTerm(EpochType epoch, bool is_root = false) -> TermStateDSType*
Get or create the DS terminator for an epoch.
void resetGlobalTerm()
Reset global termination to start producing/consuming again.
void freeEpoch(EpochType const& epoch)
Free an epoch after termination.
auto makeEpochRooted(UseDS use_ds = UseDS{true}, ParentEpochCapture parent = ParentEpochCapture{}) -> EpochType
Create a new rooted epoch.
auto makeEpochCollective(ParentEpochCapture parent = ParentEpochCapture{}) -> EpochType
Create a new collective epoch.
auto makeEpochRooted(std::string const& label, UseDS use_ds = UseDS{true}, ParentEpochCapture parent = ParentEpochCapture{}) -> EpochType
Create a new rooted epoch with a label.
auto makeEpochCollective(std::string const& label, ParentEpochCapture parent = ParentEpochCapture{}) -> EpochType
Create a collective epoch with a label.
auto makeEpoch(std::string const& label, bool is_coll, UseDS use_ds = UseDS{false}, ParentEpochCapture parent = ParentEpochCapture{}) -> EpochType
Create a new rooted or collective epoch with a label.
void initializeCollectiveEpoch(EpochType const epoch, std::string const& label, ParentEpochCapture parent = ParentEpochCapture{})
Set up a collective epoch with the epoch already generated.
void initializeRootedEpoch(EpochType const epoch, std::string const& label, UseDS use_ds = UseDS{false}, ParentEpochCapture parent = ParentEpochCapture{})
Set up a new rooted epoch with the epoch already generated.
void finishedEpoch(EpochType const& epoch)
Tell the termination detector that all initial work has been enqueued for a given epoch on this node.
void activateEpoch(EpochType const& epoch)
Activate an epoch; start detecting on it.
void finishNoActivateEpoch(EpochType const& epoch)
Finish an epoch without activating it, that is, without starting the work of detecting its termination.
auto makeEpochRootedWave(ParentEpochCapture parent, std::string const& label = "") -> EpochType
Create a new rooted epoch that uses the 4-counter wave algorithm.
auto makeEpochRootedDS(ParentEpochCapture parent, std::string const& label = "") -> EpochType
Create a new rooted epoch that uses the DS algorithm.
void initializeRootedWaveEpoch(EpochType const epoch, ParentEpochCapture parent, std::string const& label = "")
Set up a new rooted epoch that uses the 4-counter wave algorithm with the epoch already generated.
void initializeRootedDSEpoch(EpochType const epoch, ParentEpochCapture parent, std::string const& label = "")
Set up a new rooted epoch that uses the DS algorithm with the epoch already generated.
void startEpochGraphBuild()
Build the epoch graph. Typically called to report the graph to the user after a failure.
void setLocalTerminated(bool const terminated, bool const no_propagate = true)
Set whether the scheduler has locally terminated.
void maybePropagate()
Progress function to move state forward.
auto getNumUnits() const -> TermCounterType
Get number of units produced on global epoch.
auto getNumTerminatedCollectiveEpochs() const -> std::size_t
Get number of collective epochs that have terminated.
auto testEpochTerminated(EpochType epoch) -> TermStatusEnum override
Test if an epoch has terminated or not.
auto isEpochTerminated(EpochType epoch) -> bool
Check if an epoch has terminated.
auto makeGraph() -> std::shared_ptr<EpochGraph>
Make the local epoch graph.
void addLocalDependency(EpochType epoch)
Add a local work dependency on an epoch to stop propagation.
void releaseLocalDependency(EpochType epoch)
Release a local work dependency on an epoch to resume propagation.
void addDependency(EpochType predecessor, EpochType successor)
Make a dependency between two epochs.
void disableTD(EpochType in_epoch = any_epoch_sentinel)
Disable termination detection on an epoch. Local counting is still enabled, but any non-local progress is halted until detection is re-enabled.
void enableTD(EpochType in_epoch = any_epoch_sentinel)
Enable termination detection on an epoch.
auto getEpochState() -> EpochContainerType<TermStateType> const &
auto getEpochReadySet() -> std::unordered_set<EpochType> const &
auto getEpochWaitSet() -> std::unordered_set<EpochType> const &
template<typename SerializerT>
void serialize(SerializerT& s)
auto getEpoch() const -> EpochType
void pushEpoch(EpochType epoch)
auto popEpoch(EpochType epoch = no_epoch) -> EpochType
void pushEpochFast(EpochType epoch)
void popEpochFast()
auto getEpochStack() -> EpochStackType&

Public variables

TermStateType any_epoch_state_
TermStateType hang_

Function documentation

void vt::term::TerminationDetector::produce(EpochType epoch = any_epoch_sentinel, TermCounterType num_units = 1, NodeType node = uninitialized_destination)

Produce on an epoch—increase the produce counter.

Parameters
epoch in the epoch to produce on; if omitted, produce on the global epoch
num_units in number of units to produce
node in the node where this unit will be consumed (optional)

void vt::term::TerminationDetector::consume(EpochType epoch = any_epoch_sentinel, TermCounterType num_units = 1, NodeType node = uninitialized_destination)

Consume on an epoch—increase the consume counter.

Parameters
epoch in the epoch to consume on; if omitted, consume on the global epoch
num_units in number of units to consume
node in the node where this unit was produced (optional)

bool vt::term::TerminationDetector::isRooted(EpochType epoch)

Check if an epoch is rooted.

Parameters
epoch in the epoch to check
Returns whether it is rooted

bool vt::term::TerminationDetector::isDS(EpochType epoch)

Check if the algorithm behind an epoch is Dijkstra-Scholten parental responsibility.

Parameters
epoch in the epoch to check
Returns whether it is DS

TermStateDSType* vt::term::TerminationDetector::getDSTerm(EpochType epoch, bool is_root = false)

Get or create the DS terminator for an epoch.

Parameters
epoch in the epoch
is_root in whether this is the root (relevant when creating)
Returns the DS terminator manager

void vt::term::TerminationDetector::freeEpoch(EpochType const& epoch)

Free an epoch after termination.

Parameters
epoch in the epoch

EpochType vt::term::TerminationDetector::makeEpochRooted(UseDS use_ds = UseDS{true}, ParentEpochCapture parent = ParentEpochCapture{})

Create a new rooted epoch.

Parameters
use_ds in whether to use the Dijkstra-Scholten algorithm
parent in parent epoch that waits for this new epoch
Returns the new epoch

EpochType vt::term::TerminationDetector::makeEpochCollective(ParentEpochCapture parent = ParentEpochCapture{})

Create a new collective epoch.

Parameters
parent in parent epoch that waits for this new epoch
Returns the new epoch

EpochType vt::term::TerminationDetector::makeEpochRooted(std::string const& label, UseDS use_ds = UseDS{true}, ParentEpochCapture parent = ParentEpochCapture{})

Create a new rooted epoch with a label.

Parameters
label in epoch label for debugging purposes
use_ds in whether to use the Dijkstra-Scholten algorithm
parent in parent epoch that waits for this new epoch
Returns the new epoch

EpochType vt::term::TerminationDetector::makeEpochCollective(std::string const& label, ParentEpochCapture parent = ParentEpochCapture{})

Create a collective epoch with a label.

Parameters
label in epoch label for debugging purposes
parent in parent epoch that waits for this new epoch
Returns the new epoch

EpochType vt::term::TerminationDetector::makeEpoch(std::string const& label, bool is_coll, UseDS use_ds = UseDS{false}, ParentEpochCapture parent = ParentEpochCapture{})

Create a new rooted or collective epoch with a label.

Parameters
label in epoch label for debugging purposes
is_coll in whether to create a collective or rooted epoch
use_ds in whether to use the Dijkstra-Scholten algorithm
parent in parent epoch that waits for this new epoch
Returns the new epoch

void vt::term::TerminationDetector::initializeCollectiveEpoch(EpochType const epoch, std::string const& label, ParentEpochCapture parent = ParentEpochCapture{})

Set up a collective epoch with the epoch already generated.

Parameters
epoch in the collective epoch already generated
label in epoch label for debugging purposes
parent in parent epoch that waits for this new epoch

void vt::term::TerminationDetector::initializeRootedEpoch(EpochType const epoch, std::string const& label, UseDS use_ds = UseDS{false}, ParentEpochCapture parent = ParentEpochCapture{})

Set up a new rooted epoch with the epoch already generated.

Parameters
epoch in the rooted epoch already generated
label in epoch label for debugging purposes
use_ds in whether to use the Dijkstra-Scholten algorithm
parent in parent epoch that waits for this new epoch

void vt::term::TerminationDetector::finishedEpoch(EpochType const& epoch)

Tell the termination detector that all initial work has been enqueued for a given epoch on this node.

Parameters
epoch in the finished epoch
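
One way to picture why this call matters is a guard unit: the epoch holds one extra produced unit on each node until that node declares its initial work enqueued, so the counters cannot balance prematurely. The sketch below models that idea only. Whether VT uses exactly this mechanism is an assumption, and GuardedEpoch is a hypothetical class.

```cpp
#include <cassert>
#include <cstdint>

using Counter = std::int64_t;  // hypothetical stand-in for TermCounterType

// One node's view of one epoch, with a guard unit held until the node
// reports that all of its initial work has been enqueued.
class GuardedEpoch {
public:
  GuardedEpoch() : produced_(1) {}  // the guard unit (assumed mechanism)

  void produce(Counter n = 1) { produced_ += n; }
  void consume(Counter n = 1) { consumed_ += n; }

  // Release the guard unit; may be called once per node.
  void finishedEpoch() {
    assert(!finished_);
    finished_ = true;
    consumed_ += 1;
  }

  bool locallyBalanced() const { return produced_ == consumed_; }

private:
  Counter produced_ = 0;
  Counter consumed_ = 0;
  bool finished_ = false;
};
```

In this model, work that is produced and consumed before finishedEpoch() is called still leaves the epoch unbalanced, which prevents it from appearing terminated while more initial work may follow.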

void vt::term::TerminationDetector::activateEpoch(EpochType const& epoch)

Activate an epoch; start detecting on it.

Parameters
epoch in the epoch to activate

void vt::term::TerminationDetector::finishNoActivateEpoch(EpochType const& epoch)

Finish an epoch without activating it, that is, without starting the work of detecting its termination.

Parameters
epoch in the epoch that is finished

EpochType vt::term::TerminationDetector::makeEpochRootedWave(ParentEpochCapture parent, std::string const& label = "")

Create a new rooted epoch that uses the 4-counter wave algorithm.

Parameters
parent in parent epoch that waits for this new epoch
label in epoch label for debugging purposes
Returns the new epoch

EpochType vt::term::TerminationDetector::makeEpochRootedDS(ParentEpochCapture parent, std::string const& label = "")

Create a new rooted epoch that uses the DS algorithm.

Parameters
parent in parent epoch that waits for this new epoch
label in epoch label for debugging purposes
Returns the new epoch

void vt::term::TerminationDetector::initializeRootedWaveEpoch(EpochType const epoch, ParentEpochCapture parent, std::string const& label = "")

Set up a new rooted epoch that uses the 4-counter wave algorithm with the epoch already generated.

Parameters
epoch in the wave epoch already generated
parent in parent epoch that waits for this new epoch
label in epoch label for debugging purposes

void vt::term::TerminationDetector::initializeRootedDSEpoch(EpochType const epoch, ParentEpochCapture parent, std::string const& label = "")

Set up a new rooted epoch that uses the DS algorithm with the epoch already generated.

Parameters
epoch in the DS epoch already generated
parent in parent epoch that waits for this new epoch
label in epoch label for debugging purposes

void vt::term::TerminationDetector::setLocalTerminated(bool const terminated, bool const no_propagate = true)

Set whether the scheduler has locally terminated.

Parameters
terminated in whether it has terminated
no_propagate in whether to skip propagating state remotely

TermCounterType vt::term::TerminationDetector::getNumUnits() const

Get number of units produced on global epoch.

Returns number of produced units

std::size_t vt::term::TerminationDetector::getNumTerminatedCollectiveEpochs() const

Get number of collective epochs that have terminated.

Returns number of epochs

TermStatusEnum vt::term::TerminationDetector::testEpochTerminated(EpochType epoch) override

Test if an epoch has terminated or not.

Parameters
epoch in the epoch to test
Returns status enum indicating the known state

bool vt::term::TerminationDetector::isEpochTerminated(EpochType epoch)

Check if an epoch has terminated.

Parameters
epoch in the epoch to test
Returns whether it is known to be terminated

std::shared_ptr<EpochGraph> vt::term::TerminationDetector::makeGraph()

Make the local epoch graph.

Returns shared pointer to epoch graph

void vt::term::TerminationDetector::addLocalDependency(EpochType epoch)

Add a local work dependency on an epoch to stop propagation.

Parameters
epoch in the epoch

void vt::term::TerminationDetector::releaseLocalDependency(EpochType epoch)

Release a local work dependency on an epoch to resume propagation.

Parameters
epoch in the epoch
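
The add/release pair can be pictured as a per-epoch hold count that gates propagation. The following is a minimal model, not VT's implementation: LocalDependencyTracker and canPropagate are hypothetical, and the assumption that dependencies nest (each add needs a matching release) is not stated in this reference.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

using EpochId = std::uint64_t;  // hypothetical stand-in for EpochType

// Tracks how many local dependencies are outstanding for each epoch.
class LocalDependencyTracker {
public:
  void addLocalDependency(EpochId epoch) { ++deps_[epoch]; }

  void releaseLocalDependency(EpochId epoch) {
    auto it = deps_.find(epoch);
    assert(it != deps_.end() && "release without a matching add");
    if (it != deps_.end() && --it->second == 0) {
      deps_.erase(it);
    }
  }

  // Propagation may proceed only while no local dependency is held.
  bool canPropagate(EpochId epoch) const { return deps_.count(epoch) == 0; }

private:
  std::unordered_map<EpochId, std::uint32_t> deps_;
};
```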

void vt::term::TerminationDetector::addDependency(EpochType predecessor, EpochType successor)

Make a dependency between two epochs.

Parameters
predecessor in the predecessor epoch
successor in the successor epoch
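
A plausible reading of this call, consistent with the parent parameter described elsewhere as an epoch "that waits for this new epoch", is that the successor cannot terminate until the predecessor has. The sketch below models that reading. It is an assumption about the semantics rather than a description of VT's internals, and EpochDependencyModel with its members is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using EpochId = std::uint64_t;  // hypothetical stand-in for EpochType

class EpochDependencyModel {
public:
  // Record that `successor` must wait for `predecessor`.
  void addDependency(EpochId predecessor, EpochId successor) {
    if (terminated_.count(predecessor) != 0) return;  // nothing to wait for
    successors_[predecessor].push_back(successor);
    ++pending_[successor];
  }

  // Mark an epoch terminated and release every epoch waiting on it.
  void markTerminated(EpochId epoch) {
    assert(!isBlocked(epoch));
    terminated_.insert(epoch);
    auto it = successors_.find(epoch);
    if (it == successors_.end()) return;
    for (EpochId succ : it->second) {
      auto p = pending_.find(succ);
      if (p != pending_.end() && --p->second == 0) {
        pending_.erase(p);
      }
    }
    successors_.erase(it);
  }

  // An epoch is blocked while any of its predecessors is still running.
  bool isBlocked(EpochId epoch) const { return pending_.count(epoch) != 0; }
  bool isTerminated(EpochId epoch) const {
    return terminated_.count(epoch) != 0;
  }

private:
  std::unordered_map<EpochId, std::vector<EpochId>> successors_;
  std::unordered_map<EpochId, std::uint32_t> pending_;
  std::unordered_set<EpochId> terminated_;
};
```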

void vt::term::TerminationDetector::disableTD(EpochType in_epoch = any_epoch_sentinel)

Disable termination detection on an epoch. Local counting is still enabled, but any non-local progress is halted until detection is re-enabled.

Parameters
in_epoch in the epoch

void vt::term::TerminationDetector::enableTD(EpochType in_epoch = any_epoch_sentinel)

Enable termination detection on an epoch.

Parameters
in_epoch in the epoch
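
The interplay of disableTD and enableTD can be modeled as a switch that gates propagation while leaving the local counters untouched. This is a standalone sketch: GatedEpochState is hypothetical, and "sending" is reduced to copying the counters into a last-sent snapshot.

```cpp
#include <cassert>
#include <cstdint>

using Counter = std::int64_t;  // hypothetical stand-in for TermCounterType

// One epoch's state on one node, with a switch that gates propagation.
class GatedEpochState {
public:
  // Local counting works regardless of the switch.
  void produce(Counter n = 1) { assert(n > 0); produced_ += n; }
  void consume(Counter n = 1) { assert(n > 0); consumed_ += n; }

  void disableTD() { enabled_ = false; }
  void enableTD() { enabled_ = true; }

  // Attempt non-local progress; returns true if the counters were sent.
  bool maybePropagate() {
    if (!enabled_) return false;
    sent_produced_ = produced_;
    sent_consumed_ = consumed_;
    return true;
  }

  Counter produced() const { return produced_; }
  Counter consumed() const { return consumed_; }
  Counter sentProduced() const { return sent_produced_; }
  Counter sentConsumed() const { return sent_consumed_; }

private:
  bool enabled_ = true;
  Counter produced_ = 0;
  Counter consumed_ = 0;
  Counter sent_produced_ = 0;
  Counter sent_consumed_ = 0;
};
```

While detection is disabled in this model, the counters keep accumulating, so nothing is lost; the first propagation after enableTD() carries the up-to-date totals.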