TOP-C stands for Task-Oriented Parallel C Coo96. The ``TOP-C
model'' is the specific master slave model implemented here. That model
has been adapted for use in ParGAP. The implementation is in
masslave.g
in ParGAP's lib
directory. Note that the functions and
variables with names TOPC...
are intended as internal functions only,
and should not be used by the GAP programmer.
For the impatient, you may type MSexample();
in a ParGAP session
now. If you prefer further hands-on learning in a tutorial style, you may
wish to next read Chapter MasterSlave Tutorial. Eventually, if you wish
a deeper understanding of the TOP-C model, you will need to read this
current section and those that follow.
The initial GAP process is the master process, and all others are slave processes. It allows most of the CPU-intensive computations to be carried out on slave processes, which typically reside on remote processors. A well-developed TOP-C application should find that the master process is almost never busy when a slave process is idle, waiting for a new computation to carry out. This provides a natural way of maximizing utilization and load balancing.
The TOP-C model depends on three concepts:
UpdateSharedData()
.
The task input is defined to be the argument of the task (considered as a function), and the task output is the return value of the task.
3.1 Basic TOP-C (Master-Slave) commands
There is only one core TOP-C command, a utility function, and several constants. A TOP-C command must be evaluated on the master and on all slaves. We shall describe the commands in detail in the following sections, but a short list of the essentials and a small example will be helpful to set the context.
MasterSlave(
SubmitTaskInput,
DoTask[,
CheckTaskResult[,
UpdateSharedData[,
taskAgglomCount]]] ) F
See Section Other TOP-C Commands for a description of MasterSlave
.
NOTASK V
NO_ACTION V
UPDATE_ACTION V
REDO_ACTION V
CONTINUATION_ACTION(
taskContinuation ) F
CONTINUATION_ACTION()
is described in Section The GOTO statement of the TOP-C model.
IsUpToDate() F
ParInstallTOPCGlobalFunction(
string,
function ) F
ParInstallTOPCGlobalFunction(
gvar,
function ) F
A short example shows one possible implementation of ParList()
.
gap> ParInstallTOPCGlobalFunction( "MyParListWithAgglom", > function( list, fnc ) > local result, i; > result := []; i := 0; > MasterSlave( function() if i >= Length(list) then return NOTASK; > else i := i+1; return i; fi; end, > fnc, > function(input,output) result[input] := output; > return NO_ACTION; end, > Error > ); > return result; > end );
(Of course rather than type such code in a ParGAP session it's
generally more convenient to have it in a file and Read
it in.)
A master-slave computation is invoked when a GAP program issues the
command MasterSlave()
. As given earlier, the typical form is:
MasterSlave(
SubmitTaskInput,
DoTask [,
CheckTaskResult[,
UpdateSharedData[,
taskAgglom]]] )
where the first four arguments of MasterSlave()
are also functions, but
they must be defined by the application writer. Their calling syntax is
defined by the following GAP code, which also provides a simplified
description of how a sequential (non-parallel) MasterSlave()
would
invoke these functions if there were only a single process. (A more
sophisticated version of this routine is provided in ParGAP to allow
one to debug within a single process first.) The use of the fifth
argument, taskAgglom, is deferred until section Agglomerating tasks for efficiency (ParSemiEchelonMat revisited again).
In this section, we define MasterSlave()
and describe the use of its
four arguments in a purely sequential environment. The issues of
parallelism and passing of messages between processes is covered in the
next section. The call to MasterSlave()
in ParGAP, above, will have
the same result as if MasterSlave()
were defined equivalently to
SeqMasterSlave()
below, and then run in a standard, sequential GAP
(a single process). The next section describes the multi-process
implementation of MasterSlave()
in ParGAP, in which taskInput
is
computed on the master process and sent as a message to a slave process,
while taskOutput
is computed on a slave process and sent as a message
to the master process.
SeqMasterSlave := function(SubmitTaskInput, DoTask, CheckTaskResult, UpdateSharedData) local taskInput, taskOutput, action; while true do taskInput := SubmitTaskInput(); if taskInput = NOTASK then break; fi; repeat taskOutput := DoTask( taskInput ); action := CheckTaskResult( taskOutput, taskInput ); until action <> REDO_ACTION; if action = UPDATE_ACTION then # Modify the shared data (global data structures) here # Called on all processes, master and slaves UpdateSharedData( taskOutput, taskInput ); fi; od; end;
One can also follow the life of a single task in a multi-processing environment through the diagram below.
MASTER | SLAVE _________________________________________________________________________ | +---------------------+ | | GenerateTaskInput() | | +---------------------+ | \input | \______________ | | | v | +---------------+ | | DoTask(input) | | +---------------+ |output/ ^ _________/ | | | | v | | +--------------------------------+ | | | CheckTaskResult(input, output) |___________| +--------------------------------+ (if action == REDO) | | | (if action == UPDATE) v | +---------------------------------+ | UpdateSharedData(input, output) | +---------------------------------+ | TOP-C Programmers' Model (Life Cycle of a Task)
Although not explicit in the code, the application writer should add
comments to define the shared data. The shared data is defined as a
global data structure that is treated as ``read-write'' by
UpdateSharedData
()
, while being treated as ``read-only'' by
SubmitTaskInput
()
, DoTask
()
, and CheckTaskResult
()
. Note also
that an application writer may use different names for the four functions
SubmitTaskInput
()
, etc. It is only a convention within this manual to
give those functions the names, above. Similarly, taskInput,
taskOutput and action are the conventional names used in this manual,
and a given application may use different names.
In a correct ParGAP application, the shared data should be initialized
to the same value on all processes before the application calls
MasterSlave()
. MasterSlave()
is then called on all processes. After
that, the shared data can be modified only by a call to
UpdateSharedData
()
, and MasterSlave()
arranges for each call to
UpdateSharedData
()
to be executed on all processes. Further,
UpdateSharedData
()
has access only to taskInput, taskOutput, and
the previous value of the shared data. Thus, MasterSlave()
maintains
the same shared data uniformly on all processes.
3.3 Simple Usage of MasterSlave()
This section is concerned with formal definitions for the routines
associated with ParGAP. It is important to keep in mind the
pseudo-code of Chapter Basic Concepts for the TOP-C model (MasterSlave). Since MasterSlave()
uses all the ParGAP processes,
the user must invoke it on all processes. This is typically done through
some function provided by the slave listener layer, such as ParEval()
(see ParEval). It may be instructive for the reader to run ParGAP
and type MSexample();
now, or else to look at some examples of
ParGAP applications in the section MasterSlave Tutorial. This
demonstrates the use of MasterSlave()
in a typical session.
The four functions written by the application writer are:
SubmitTaskInput
()
, DoTask
()
, CheckTaskResult
()
, and
UpdateSharedData
()
. DoTask
()
is executed on a slave.
SubmitTaskInput
()
and CheckTaskResult
()
are executed on the
master, where a taskInput is generated and a corresponding taskOutput
is received. Finally, UpdateSharedData
()
is executed on all
processes. ParGAP arranges to automatically pass taskInput and
taskOutput between the master and a slave.
Since the single master process is responsible for generating all
taskInputs and receiving all taskOutputs, it is critical that
computation on the master process should not become a bottleneck for a
well-designed ParGAP application. Accordingly, the application writer
should arrange for SubmitTaskInput
()
and CheckTaskResult
()
to
execute quickly, even if this means additional computation by
DoTask
()
or UpdateSharedData
()
.
As seen in the examples, SubmitTaskInput
()
may use global variables on
the master to ``remember'' the last taskInput or other state
information. Note that such global variables cannot be part of the
shared data, since they are modified outside of UpdateSharedData
()
.
3.4 Efficient Parallelism in MasterSlave() using CheckTaskResult()
It is instructive to review the logic for the lifetime of a task, as
described by the pseudo-code for SeqMasterSlave
in Section Other TOP-C commands. Initially, MasterSlave()
calls SubmitTaskInput
()
on the
master, which returns an application-defined GAP object, taskInput.
MasterSlave()
then copies taskInput to an arbitrary slave process,
and MasterSlave()
then calls DoTask
(
taskInput )
on the slave.
This returns an application-defined GAP object, taskOutput, which
MasterSlave()
copies to the master process. On the master,
MasterSlave()
then calls CheckTaskResult
(
taskInput,
taskOutput
)
, which returns an action. (Recall that taskInput, taskOutput and
CheckTaskResult
()
are defined by the application writer, and so an
application program may give them different names.)
There are four possible actions (ParGAP constants): NO_ACTION
,
UPDATE_ACTION
, REDO_ACTION
, CONTINUATION_ACTION(
taskContinuation
)
. A standard language idiom in ParGAP is to define
CheckTaskResult
()
as the ParGAP function
DefaultCheckTaskResult()
, whose code is as follows:
DefaultCheckTaskResult := function( taskOutput, taskInput ) if taskOutput = false then return NO_ACTION; elif IsUpToDate() then return UPDATE_ACTION; else return REDO_ACTION; fi; end;
In the simplest case, CheckTaskResult
()
returns NO_ACTION
, in which
case there is no further computation related to the original taskInput.
CheckTaskResult
()
may record global information on the master
process, based on the taskOutput, but the shared data, and hence the
state of the slave processes, will not be modified.
In the second most common case, CheckTaskResult
()
returns
UPDATE_ACTION
. This action causes MasterSlave()
to call
UpdateSharedData
(
taskOutput,
taskInput )
on all processes
(master and slaves). This is the only way in which the shared data can
be modified by a correct ParGAP program.
In the third most common case, CheckTaskResult
()
returns REDO_ACTION
.
When a REDO_ACTION
action is generated, the value of taskInput is
re-sent to the same slave that executed DoTask
(
taskInput )
for the
current task. An application will typically invoke REDO_ACTION
if the
shared data has changed, and this changed shared data will produce a new
taskOutput. As before, DoTask
()
then returns a new value of
taskOutput. Then, taskInput and the new taskOutput are again passed
to CheckTaskResult
()
.
Note that MasterSlave()
guarantees that REDO_ACTION
causes the task
to be re-sent to the same slave process. This allows the application to
cache in a global variable some information computed by the first
invocation of DoTask
()
. A second invocation of DoTask
()
caused by
the REDO_ACTION
allows the task to test if the taskInput is the same
as the last invocation. In that case, the application-defined
DoTask
()
routine can recognize that this is a REDO_ACTION
, and it
can take advantage of the cached global variable to avoid re-computing
certain quantities that would not be changed by the altered shared data.
In order to make this strategy possible, MasterSlave()
also guarantees
that in the case of REDO_ACTION
, the slave process will not have seen
any intervening calls to DoTask
()
with values of taskInput
other
than the current value.
In typical usage, the application-defined routine, CheckTaskResult
()
,
will first call IsUpToDate()
. IsUpToDate()
tests if the shared data
has been modified since the current taskInput corresponding to
CheckTaskResult
()
was originally generated by SubmitTaskInput
()
.
The times of the relevant events are recorded as when seen on the master
process. It is an error to call IsUpToDate()
outside of a call to
CheckTaskResult
()
by MasterSlave()
. IsUpToDate()
returns a
boolean value, true
or false
.
The last possible action, CONTINUATION_ACTION(
taskContinuation )
, is
provided for unusual cases. As with advice about the use of ``goto'', it
is recommended to avoid CONTINUATION_ACTION()
where possible.
A favorite aphorism of this author is, ``The source code is the ultimate
documentation''. With this in mind, the reader may also wish to read
lib/masslave.g
, for which readability of the code was one of the design
criteria.
3.5 Modifying Task Output or Input (a dirty trick)
At this point, it should be noted that it explicitly is allowed to
modify the input or output of a task from within CheckTaskResult
()
.
This is not recommended in general, but there may be times when
CheckTaskResult
()
returns an UPDATE_ACTION
and must also be used to
pass additional information to UpdateSharedData
()
. In order to modify
a previous input or output, it is important that the application has
chosen a representation of the input or output as a list or record, which
can be modified in place, such that the code excerpt succeeds without
error.
oldOutput := taskOutput; # Modify taskOutput here if ( IsIdenticalObj( oldOutput, taskOutput ) ) = false then Error( "MasterSlave() will see only oldOutput, not current taskOutput" ); fi; return UPDATE_ACTION;
In principle, a dirty trick like this would also work in the case of
returning a REDO_ACTION
. However, this is not recommended. For that
functionality, the code will be clearer if an explicit
CONTINUATION_ACTION(
modifiedTaskOutput )
is returned. See
Section The GOTO statement of the TOP-C model for further discussion on
the use of CONTINUATION_ACTION()
.
CONTINUATION_ACTION(
taskContinuation )
The CONTINUATION_ACTION()
, like the goto statement, is not
recommended for ordinary programs, but it may be useful in unusual
circumstances. This is a parametrized action. When the application
routine CheckTaskResult
()
returns this action, MasterSlave()
guarantees to invoke DoTask
()
on the same slave process as for the
original task. There will have been no intervening calls to DoTask
()
on that slave, although there may have been an intervening call to
UpdateSharedData
()
on that slave.
This action allows arbitrary, repeated communication between the master
and a single slave process. The slave process executes DoTask
(
taskInput )
and communicates with the master by returning a
taskOutput. The master process executes CheckTaskResult
(
taskInput,
taskOutput )
and returns a taskContinuation. The
original slave process then receives another call to DoTask
(
taskInput )
, this time with taskInput bound to taskContinuation.
3.7 Being nice to other users (Nice, Alarm and LimitRss)
When you are running a long job on a network of workstations, you will often be sharing it with others. Making your parallel job as unintrusive as possible will leave you with a warmer welcome the next time that you want to use that network of workstations. Accordingly, three useful functions are provided.
UNIX_Nice(
priority ) F
This is similar to the nice
command of many UNIX shells. UNIX
priorities are in a range from -20 to 20 with -20 being the highest.
Users typically start at priority 0. You can give yourself a lower
priority by specifying a priority of 5, for example. Usually, priorities
19 and 20 are absolute priorities. Any process with a priority higher
than 19 that wishes to run will always have precedence. Other priorities
are relative priorities. Your process will still receive some CPU time
even if other processes with higher priorities are running. You can set
your priority lower, but you cannot raise it back to its original value
after that. The return value is the previous priority of your process.
UNIX_Alarm(
seconds ) F
This causes the process to kill itself after that many seconds. This is a
useful safety measure, since it is unfortunately too easy for a runaway
slave process to continue if the master process is killed without the
normal quit;
. You might consider adding something like UNIX_Alarm(
25000 );
(about 6 hours) to your .gaprc
file. Executing UNIX_Alarm( 0
);
cancels any previous alarm. The return value is the number of seconds
remaining under the previous setting of the alarm.
UNIX_LimitRss(
size ) [ = setrlimit(RLIMIT_RSS, ...) ] F
Many dialects of UNIX (and their shells) offer a limit
or ulimit
command to limit the resources available to the shell. This command
limits the size of the RSS (resident set size), or the amount of physical
RAM used by your process. The size limit is in bytes. Unfortunately, some
UNIX dialects may not allow or even silently ignore this request to limit
the RSS. A UNIX command such as top
can show you if your process RSS is
staying below your requested limit.
3.8 Converting legacy sequential code to the TOP-C model
The (tutorial contains a section Raw MasterSlave (ParMultMat revisited),
about raw version of MasterSlave()
that is useful for converting
legacy sequential code to the TOP-C model. However, that model is not
recommended for writing new code, for stylistic reasons.
[Up] [Previous] [Next] [Index]
ParGAP manual