3 Basic Concepts for the TOP-C model (MasterSlave)

Sections

TOP-C stands for Task-Oriented Parallel C Coo96. The ``TOP-C model'' is the specific master slave model implemented here. That model has been adapted for use in ParGAP. The implementation is in masslave.g in ParGAP's lib directory. Note that the functions and variables with names TOPC... are intended as internal functions only, and should not be used by the GAP programmer.

For the impatient, you may type MSexample(); in a ParGAP session now. If you prefer further hands-on learning in a tutorial style, you may wish to next read Chapter MasterSlave Tutorial. Eventually, if you wish a deeper understanding of the TOP-C model, you will need to read this current section and those that follow.

The initial GAP process is the master process, and all others are slave processes. It allows most of the CPU-intensive computations to be carried out on slave processes, which typically reside on remote processors. A well-developed TOP-C application should find that the master process is almost never busy when a slave process is idle, waiting for a new computation to carry out. This provides a natural way of maximizing utilization and load balancing.

The TOP-C model depends on three concepts:

the task:: a function that takes an arbitrary object as its single argument, reads some or all of the global shared data, and then returns an arbitrary object as its value. The task typically corresponds to the inner loop of a typical application.
the shared data:: global data, shared among all processes. This data can be read as part of the computation of a task. However, after initialization of the shared data, this data must be written (modified) only by a particular user-provided application routine, UpdateSharedData().
the action:: After the output of a task has been produced, an application routine must choose one of four actions to determine how the output is used.

The task input is defined to be the argument of the task (considered as a function), and the task output is the return value of the task.

3.1 Basic TOP-C (Master-Slave) commands

There is only one core TOP-C command, a utility function, and several constants. A TOP-C command must be evaluated on the master and on all slaves. We shall describe the commands in detail in the following sections, but a short list of the essentials and a small example will be helpful to set the context.

MasterSlave( SubmitTaskInput,DoTask[,CheckTaskResult[,UpdateSharedData[,taskAgglomCount]]] ) F

See Section Other TOP-C Commands for a description of MasterSlave.

NOTASK V

NO_ACTION V

UPDATE_ACTION V

REDO_ACTION V

CONTINUATION_ACTION( taskContinuation ) F

CONTINUATION_ACTION() is described in Section The GOTO statement of the TOP-C model.

IsUpToDate() F

ParInstallTOPCGlobalFunction( string, function ) F

ParInstallTOPCGlobalFunction( gvar, function ) F

A short example shows one possible implementation of ParList().

gap> ParInstallTOPCGlobalFunction( "MyParListWithAgglom",
> function( list, fnc )
>   local result, i;
>   result := []; i := 0;
>   MasterSlave( function() if i >= Length(list) then return NOTASK;
>                           else i := i+1; return i; fi; end,
>                fnc,
>                function(input,output) result[input] := output;
>                                       return NO_ACTION; end,
>                Error
>              );
>   return result;
> end );

(Of course rather than type such code in a ParGAP session it's generally more convenient to have it in a file and Read it in.)

3.2 Other TOP-C Commands

A master-slave computation is invoked when a GAP program issues the command MasterSlave(). As given earlier, the typical form is:

MasterSlave( SubmitTaskInput, DoTask [, CheckTaskResult[, UpdateSharedData[, taskAgglom]]] )

where the first four arguments of MasterSlave() are also functions, but they must be defined by the application writer. Their calling syntax is defined by the following GAP code, which also provides a simplified description of how a sequential (non-parallel) MasterSlave() would invoke these functions if there were only a single process. (A more sophisticated version of this routine is provided in ParGAP to allow one to debug within a single process first.) The use of the fifth argument, taskAgglom, is deferred until section Agglomerating tasks for efficiency (ParSemiEchelonMat revisited again).

In this section, we define MasterSlave() and describe the use of its four arguments in a purely sequential environment. The issues of parallelism and passing of messages between processes is covered in the next section. The call to MasterSlave() in ParGAP, above, will have the same result as if MasterSlave() were defined equivalently to SeqMasterSlave() below, and then run in a standard, sequential GAP (a single process). The next section describes the multi-process implementation of MasterSlave() in ParGAP, in which taskInput is computed on the master process and sent as a message to a slave process, while taskOutput is computed on a slave process and sent as a message to the master process.

SeqMasterSlave :=
  function(SubmitTaskInput, DoTask, CheckTaskResult, UpdateSharedData)
  local taskInput, taskOutput, action;
  while true do
    taskInput := SubmitTaskInput();
    if taskInput = NOTASK then break; fi;
    repeat
      taskOutput := DoTask( taskInput );
      action := CheckTaskResult( taskOutput, taskInput );
    until action <> REDO_ACTION;
    if action = UPDATE_ACTION then
      # Modify the shared data (global data structures) here
      # Called on all processes, master and slaves
      UpdateSharedData( taskOutput, taskInput );
    fi;
  od;
end;

One can also follow the life of a single task in a multi-processing environment through the diagram below.

               MASTER               |               SLAVE
_________________________________________________________________________
                                    |  
      +---------------------+       |
      | GenerateTaskInput() |       |
      +---------------------+       |
                             \input |
                              \______________
                                    |        |
                                    |        v
                                    |       +---------------+
                                    |       | DoTask(input) |
                                    |       +---------------+
                                    |output/  ^
                                 _________/   |
                                |   |         |
                                v   |         |
 +--------------------------------+ |         |
 | CheckTaskResult(input, output) |___________|
 +--------------------------------+ (if action == REDO)
                         |          |
                         | (if action == UPDATE)
                         v          |
                    +---------------------------------+
                    | UpdateSharedData(input, output) |
                    +---------------------------------+
                                    |
                         TOP-C Programmers' Model
                          (Life Cycle of a Task)

Although not explicit in the code, the application writer should add comments to define the shared data. The shared data is defined as a global data structure that is treated as ``read-write'' by UpdateSharedData(), while being treated as ``read-only'' by SubmitTaskInput(), DoTask(), and CheckTaskResult(). Note also that an application writer may use different names for the four functions SubmitTaskInput(), etc. It is only a convention within this manual to give those functions the names, above. Similarly, taskInput, taskOutput and action are the conventional names used in this manual, and a given application may use different names.

In a correct ParGAP application, the shared data should be initialized to the same value on all processes before the application calls MasterSlave(). MasterSlave() is then called on all processes. After that, the shared data can be modified only by a call to UpdateSharedData(), and MasterSlave() arranges for each call to UpdateSharedData() to be executed on all processes. Further, UpdateSharedData() has access only to taskInput, taskOutput, and the previous value of the shared data. Thus, MasterSlave() maintains the same shared data uniformly on all processes.

3.3 Simple Usage of MasterSlave()

This section is concerned with formal definitions for the routines associated with ParGAP. It is important to keep in mind the pseudo-code of Chapter Basic Concepts for the TOP-C model (MasterSlave). Since MasterSlave() uses all the ParGAP processes, the user must invoke it on all processes. This is typically done through some function provided by the slave listener layer, such as ParEval() (see ParEval). It may be instructive for the reader to run ParGAP and type MSexample(); now, or else to look at some examples of ParGAP applications in the section MasterSlave Tutorial. This demonstrates the use of MasterSlave() in a typical session.

The four functions written by the application writer are: SubmitTaskInput(), DoTask(), CheckTaskResult(), and UpdateSharedData(). DoTask() is executed on a slave. SubmitTaskInput() and CheckTaskResult() are executed on the master, where a taskInput is generated and a corresponding taskOutput is received. Finally, UpdateSharedData() is executed on all processes. ParGAP arranges to automatically pass taskInput and taskOutput between the master and a slave.

Since the single master process is responsible for generating all taskInputs and receiving all taskOutputs, it is critical that computation on the master process should not become a bottleneck for a well-designed ParGAP application. Accordingly, the application writer should arrange for SubmitTaskInput() and CheckTaskResult() to execute quickly, even if this means additional computation by DoTask() or UpdateSharedData().

As seen in the examples, SubmitTaskInput() may use global variables on the master to ``remember'' the last taskInput or other state information. Note that such global variables cannot be part of the shared data, since they are modified outside of UpdateSharedData().

3.4 Efficient Parallelism in MasterSlave() using CheckTaskResult()

It is instructive to review the logic for the lifetime of a task, as described by the pseudo-code for SeqMasterSlave in Section Other TOP-C commands. Initially, MasterSlave() calls SubmitTaskInput() on the master, which returns an application-defined GAP object, taskInput. MasterSlave() then copies taskInput to an arbitrary slave process, and MasterSlave() then calls DoTask( taskInput ) on the slave. This returns an application-defined GAP object, taskOutput, which MasterSlave() copies to the master process. On the master, MasterSlave() then calls CheckTaskResult( taskInput, taskOutput), which returns an action. (Recall that taskInput, taskOutput and CheckTaskResult() are defined by the application writer, and so an application program may give them different names.)

There are four possible actions (ParGAP constants): NO_ACTION, UPDATE_ACTION, REDO_ACTION, CONTINUATION_ACTION( taskContinuation). A standard language idiom in ParGAP is to define CheckTaskResult() as the ParGAP function DefaultCheckTaskResult(), whose code is as follows:

DefaultCheckTaskResult := function( taskOutput, taskInput )
  if taskOutput = false then return NO_ACTION;
  elif IsUpToDate() then return UPDATE_ACTION;
  else return REDO_ACTION;
  fi;
end;

In the simplest case, CheckTaskResult() returns NO_ACTION, in which case there is no further computation related to the original taskInput. CheckTaskResult() may record global information on the master process, based on the taskOutput, but the shared data, and hence the state of the slave processes, will not be modified.

In the second most common case, CheckTaskResult() returns UPDATE_ACTION. This action causes MasterSlave() to call UpdateSharedData( taskOutput, taskInput ) on all processes (master and slaves). This is the only way in which the shared data can be modified by a correct ParGAP program.

In the third most common case, CheckTaskResult() returns REDO_ACTION. When a REDO_ACTION action is generated, the value of taskInput is re-sent to the same slave that executed DoTask( taskInput ) for the current task. An application will typically invoke REDO_ACTION if the shared data has changed, and this changed shared data will produce a new taskOutput. As before, DoTask() then returns a new value of taskOutput. Then, taskInput and the new taskOutput are again passed to CheckTaskResult().

Note that MasterSlave() guarantees that REDO_ACTION causes the task to be re-sent to the same slave process. This allows the application to cache in a global variable some information computed by the first invocation of DoTask(). A second invocation of DoTask() caused by the REDO_ACTION allows the task to test if the taskInput is the same as the last invocation. In that case, the application-defined DoTask() routine can recognize that this is a REDO_ACTION, and it can take advantage of the cached global variable to avoid re-computing certain quantities that would not be changed by the altered shared data. In order to make this strategy possible, MasterSlave() also guarantees that in the case of REDO_ACTION, the slave process will not have seen any intervening calls to DoTask() with values of taskInput other than the current value.

In typical usage, the application-defined routine, CheckTaskResult(), will first call IsUpToDate(). IsUpToDate() tests if the shared data has been modified since the current taskInput corresponding to CheckTaskResult() was originally generated by SubmitTaskInput(). The times of the relevant events are recorded as when seen on the master process. It is an error to call IsUpToDate() outside of a call to CheckTaskResult() by MasterSlave(). IsUpToDate() returns a boolean value, true or false.

The last possible action, CONTINUATION_ACTION( taskContinuation ), is provided for unusual cases. As with advice about the use of ``goto'', it is recommended to avoid CONTINUATION_ACTION() where possible.

A favorite aphorism of this author is, ``The source code is the ultimate documentation''. With this in mind, the reader may also wish to read lib/masslave.g, for which readability of the code was one of the design criteria.

3.5 Modifying Task Output or Input (a dirty trick)

At this point, it should be noted that it explicitly is allowed to modify the input or output of a task from within CheckTaskResult(). This is not recommended in general, but there may be times when CheckTaskResult() returns an UPDATE_ACTION and must also be used to pass additional information to UpdateSharedData(). In order to modify a previous input or output, it is important that the application has chosen a representation of the input or output as a list or record, which can be modified in place, such that the code excerpt succeeds without error.

oldOutput := taskOutput;
# Modify taskOutput here
if ( IsIdenticalObj( oldOutput, taskOutput ) ) = false then
  Error( "MasterSlave() will see only oldOutput, not current taskOutput" );
fi;
return UPDATE_ACTION;

In principle, a dirty trick like this would also work in the case of returning a REDO_ACTION. However, this is not recommended. For that functionality, the code will be clearer if an explicit CONTINUATION_ACTION( modifiedTaskOutput ) is returned. See Section The GOTO statement of the TOP-C model for further discussion on the use of CONTINUATION_ACTION().

3.6 The GOTO statement of the TOP-C model

CONTINUATION_ACTION( taskContinuation )

The CONTINUATION_ACTION(), like the goto statement, is not recommended for ordinary programs, but it may be useful in unusual circumstances. This is a parametrized action. When the application routine CheckTaskResult() returns this action, MasterSlave() guarantees to invoke DoTask() on the same slave process as for the original task. There will have been no intervening calls to DoTask() on that slave, although there may have been an intervening call to UpdateSharedData() on that slave.

This action allows arbitrary, repeated communication between the master and a single slave process. The slave process executes DoTask(taskInput ) and communicates with the master by returning a taskOutput. The master process executes CheckTaskResult(taskInput, taskOutput ) and returns a taskContinuation. The original slave process then receives another call to DoTask(taskInput ), this time with taskInput bound to taskContinuation.

3.7 Being nice to other users (Nice, Alarm and LimitRss)

When you are running a long job on a network of workstations, you will often be sharing it with others. Making your parallel job as unintrusive as possible will leave you with a warmer welcome the next time that you want to use that network of workstations. Accordingly, three useful functions are provided.

UNIX_Nice( priority ) F

This is similar to the nice command of many UNIX shells. UNIX priorities are in a range from -20 to 20 with -20 being the highest. Users typically start at priority 0. You can give yourself a lower priority by specifying a priority of 5, for example. Usually, priorities 19 and 20 are absolute priorities. Any process with a priority higher than 19 that wishes to run will always have precedence. Other priorities are relative priorities. Your process will still receive some CPU time even if other processes with higher priorities are running. You can set your priority lower, but you cannot raise it back to its original value after that. The return value is the previous priority of your process.

UNIX_Alarm( seconds ) F

This causes the process to kill itself after that many seconds. This is a useful safety measure, since it is unfortunately too easy for a runaway slave process to continue if the master process is killed without the normal quit;. You might consider adding something like UNIX_Alarm( 25000 ); (about 6 hours) to your .gaprc file. Executing UNIX_Alarm( 0 ); cancels any previous alarm. The return value is the number of seconds remaining under the previous setting of the alarm.

UNIX_LimitRss( size ) [ = setrlimit(RLIMIT_RSS, ...) ] F

Many dialects of UNIX (and their shells) offer a limit or ulimit command to limit the resources available to the shell. This command limits the size of the RSS (resident set size), or the amount of physical RAM used by your process. The size limit is in bytes. Unfortunately, some UNIX dialects may not allow or even silently ignore this request to limit the RSS. A UNIX command such as top can show you if your process RSS is staying below your requested limit.

3.8 Converting legacy sequential code to the TOP-C model

The (tutorial contains a section Raw MasterSlave (ParMultMat revisited), about raw version of MasterSlave() that is useful for converting legacy sequential code to the TOP-C model. However, that model is not recommended for writing new code, for stylistic reasons.

[Up] [Previous] [Next] [Index]

ParGAP manual
May 2002