The GAMMA communication library provides functions for process grouping, point-to-point communication, and collective communications at the application level. Both C and FORTRAN calls are provided. Here we describe only the C interface.
This is a list of the GAMMA library functions and variables.
Initiate/terminate parallel section of a job:
gamma_init() | gamma_exit()
Set up communication ports:
gamma_set_active_port() | gamma_set_passive_port()
gamma_post_recv()
Send routines, blocking:
gamma_send() | gamma_send_flowctl()
gamma_send_2p() | gamma_send_2p_flowctl()
Send routines, non-blocking:
gamma_isend() | gamma_isend_flowctl()
gamma_isend_2p() | gamma_isend_2p_flowctl()
gamma_wsend() | gamma_tsend()
Synchronize on message arrivals:
gamma_signal() | gamma_sigerr()
gamma_wait() | gamma_test()
Miscellaneous:
gamma_atomic() | gamma_sync()
gamma_my_par_pid() | gamma_my_node()
gamma_how_many_nodes() | gamma_mlock()
gamma_munlock() | gamma_munlockall()
gamma_time() | gamma_time_diff()
gamma_active_port | gamma_msglen
GAMMA functions are built on top of a small set of custom system calls, invoked through trap address 0x81, which enter the kernel in the GAMMA device driver through a short and fast code path.
Each library function, with the exception of gamma_time() and gamma_time_diff(), returns a negative integer value in case of error, and a non-negative integer value in case of successful completion.
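The return-value convention above lends itself to a small checking helper. The following is only a sketch: gamma_check() and its what parameter are hypothetical names, not part of the GAMMA library, and the return code is passed in as a plain int so that the fragment compiles without the GAMMA headers.

```c
#include <stdio.h>

/* Hypothetical helper (not part of GAMMA): applies the documented
 * convention that a negative return value signals an error and a
 * non-negative one signals success.
 * Returns 1 on success, 0 on error. */
int gamma_check(int rc, const char *what)
{
    if (rc < 0) {
        fprintf(stderr, "%s failed with code %d\n", what, rc);
        return 0;
    }
    return 1;
}
```

A call site could then read, for example, gamma_check(gamma_sync(), "gamma_sync"). The helper does not apply to gamma_time() and gamma_time_diff(), which do not follow this convention.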
The programming interface is currently defined as follows:
int gamma_init ( unsigned char num_nodes, int argc, char **argv );
When a sequential user process P invokes it, a process group called a virtual GAMMA is activated. The group is composed of process P, running on the local workstation, plus num_nodes-1 additional processes identical to P, launched via the ``rsh'' command on num_nodes-1 distinct remote workstations (chosen among those listed in the file /etc/gamma.conf).
Hence, after invoking gamma_init(), the invoking user process P is replicated on num_nodes workstations in the cluster, thus forming a running SPMD parallel application.
The process replicas themselves invoke gamma_init() in turn, but this time the effect is to register them with the already created group, without creating a new one.
A positive number called ``parallel pid'' uniquely identifies the newly created process group in the cluster.
Note that nothing prevents two independent user processes P and Q from invoking gamma_init() separately from one another. This results in the creation of two distinct GAMMA process groups in the same cluster, each with a distinct ``parallel pid''. The two groups may share some or even all of the available workstations in the cluster, but cannot share processes.
Currently, invoking gamma_init() with num_nodes less than or equal to zero, or greater than the total number of workstations connected to the cluster, has the same effect as if num_nodes were equal to the total number of workstations connected to the cluster.
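The fallback rule for num_nodes can be modelled in a few lines. This is a sketch of the documented behaviour only, not GAMMA code: effective_num_nodes() and its cluster_size parameter (standing for the number of workstations connected to the cluster) are hypothetical names.

```c
/* Models the documented fallback: a request outside 1..cluster_size
 * behaves as if the whole cluster had been requested.
 * Both the function and `cluster_size` are hypothetical. */
int effective_num_nodes(int requested, int cluster_size)
{
    if (requested <= 0 || requested > cluster_size)
        return cluster_size;
    return requested;
}
```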
int gamma_exit (void);
int gamma_set_active_port ( unsigned short port, unsigned short dest_node, unsigned char dest_par_pid, unsigned short dest_port, void (*receiver_handler)(void), unsigned short semaphore, unsigned char buffer_kind, void *destination_buffer, unsigned long buffer_len );
int gamma_set_passive_port ( unsigned short port, unsigned short dest_node, unsigned char dest_par_pid, unsigned short dest_port, unsigned short semaphore, unsigned char buffer_kind, void *destination_buffer, unsigned long buffer_len );
The communication port may be programmed for output, input, or both.
An output port must be bound to an input port of the remote receiver process to which outgoing messages are to be delivered. The remote port is fully specified by the triple dest_node (instance number of the receiver process), dest_par_pid (``parallel pid'' of the process group to which the receiver process belongs), and dest_port (a specific input port of the receiver process). Note that inter-group communication is allowed. A process is not allowed to connect a port to itself for output.
Parameter dest_node may be set to the constant BROADCAST. In this case, each message transmitted through the port will be broadcast to each process in the group specified by dest_par_pid (excluding the sender itself). Each receiver process will get the message through its local port specified by dest_port.
An input port must be bound to a destination buffer, a notification semaphore, and a receiver handler (active ports only).
The destination buffer is a contiguous virtual memory region in application space; its size in bytes is specified by buffer_len. Any non-empty message arriving at the port will be stored in this buffer. Specifying a destination buffer is mandatory only if non-empty messages are to be received.
Many common data structures (for instance, arrays) span contiguous regions in virtual memory space, so in most cases there is no need to provide separate buffers for incoming messages.
If the current message fits the destination buffer exactly, the next message hitting the same port will be stored at the beginning of the same buffer, thus overwriting the current one (unless the port has been bound to a different destination buffer meanwhile).
If a message arrives which is larger than the destination buffer, then the message is truncated to fit the buffer.
If the current message is shorter than the destination buffer, and the port has not been bound to a different destination buffer before a new message hits the port, then the next message will be stored in the same destination buffer: either immediately after the previous message (if buffer_kind is set to GO_AHEAD), or at the beginning of the buffer (if buffer_kind is set to GO_BACK). The former mode helps build gather-like communication patterns; in that case, however, if the new message is larger than the remaining room in the destination buffer, it is truncated to fit.
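The storage rules above can be illustrated with a small user-space model. It is a sketch, not the GAMMA implementation: struct model_port, model_deliver() and the MODEL_GO_BACK / MODEL_GO_AHEAD values are hypothetical stand-ins (the real GO_AHEAD and GO_BACK constants come from the GAMMA headers). The model also assumes that once the buffer is completely full, the next message starts again at the beginning; the text above states this only for a message that fits the buffer exactly.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in values for the model only. */
enum model_buffer_kind { MODEL_GO_BACK, MODEL_GO_AHEAD };

struct model_port {
    unsigned char *buf;           /* destination buffer */
    size_t len;                   /* buffer_len */
    size_t offset;                /* where the next message will be stored */
    enum model_buffer_kind kind;  /* models buffer_kind */
};

/* Stores one incoming message and returns the number of bytes kept. */
size_t model_deliver(struct model_port *p, const void *msg, size_t msg_len)
{
    size_t room = p->len - p->offset;
    size_t kept = msg_len < room ? msg_len : room;   /* truncate to fit */

    if (kept > 0)
        memcpy(p->buf + p->offset, msg, kept);
    p->offset += kept;

    /* GO_BACK mode, or a full buffer, sends the next message to the start. */
    if (p->kind == MODEL_GO_BACK || p->offset == p->len)
        p->offset = 0;
    return kept;
}
```

With an 8-byte buffer in the GO_AHEAD-like mode, a 3-byte message followed by an 8-byte message leaves the first 3 bytes intact and keeps only the first 5 bytes of the second message.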
The receiver handler is an application-defined function, which will be executed each time a new message hits a port, provided the port has been set up by invoking the gamma_set_active_port() routine. Empty messages hitting the port will trigger the receiver handler as well.
The receiver handler will run after the message body (if any) has been copied to the destination buffer (if any). New messages hitting the port will not be stored into the destination buffer before the receiver handler has run to completion.
A receiver handler should not loop forever. It may in turn invoke any GAMMA call; however, invoking the GAMMA flow-controlled send routines from a handler may lead to a deadlock.
In order to allow a receiver process to synchronize with input events (message arrivals, handler activity) in a safe way, GAMMA provides 1025 per-process notification semaphores numbered from 0 to 1024. Semaphores 1023 and 1024 are reserved for the GAMMA collective routines (broadcast and barrier synchronization respectively). Each port being used for input must be associated with one such semaphore. Each time a message hits the port, its semaphore gets incremented by one. Additionally, receiver handlers may increment other semaphores, if programmed to do so, by invoking gamma_signal(). A receiver process can wait upon message arrivals or handler activity by invoking gamma_wait() or gamma_test(). Semaphores are initialized to zero by gamma_init().
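The semaphore numbering can be captured in a short model. This is an illustrative sketch only: all MODEL_ and model_ names are hypothetical, and the idea that applications should keep to semaphores 0 to 1022 is an inference from the reservation described above, not a rule stated by the library.

```c
#define MODEL_NUM_SEMS      1025   /* semaphores 0..1024 */
#define MODEL_SEM_BROADCAST 1023   /* reserved: broadcast */
#define MODEL_SEM_BARRIER   1024   /* reserved: barrier synchronization */

/* Per-process counters, all zero after initialization. */
struct model_sems {
    unsigned long count[MODEL_NUM_SEMS];
};

/* Returns 1 if `sem` lies outside the range reserved for collectives. */
int model_sem_is_user(unsigned sem)
{
    return sem < MODEL_SEM_BROADCAST;
}

/* A message arrival (or a gamma_signal() call) adds one to the semaphore.
 * Returns 0 on success, -1 if `sem` is out of range. */
int model_sem_increment(struct model_sems *s, unsigned sem)
{
    if (sem >= MODEL_NUM_SEMS)
        return -1;
    s->count[sem] += 1;
    return 0;
}
```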
Recall that any GAMMA port can be programmed for output and input simultaneously, provided the correct parameters are passed to the gamma_set_active_port() or gamma_set_passive_port() routine. Whether a port actually serves as an input or an output port depends on how the application uses it.
int gamma_post_recv ( unsigned short input_port, void *destination_buffer, unsigned long buffer_len );
This is a low-overhead alternative to the gamma_set_active_port() and gamma_set_passive_port() functions. It does not require invoking any system call, as the buffer address and size are actually kept in the user data segment. Its intended use is within receiver handlers, in order to prepare a fresh application-space buffer for incoming messages after having consumed the previous one.
int gamma_send ( unsigned short output_port, void *data, unsigned long len );
int gamma_send_flowctl ( unsigned short output_port, void *data, unsigned long len );
int gamma_send_2p ( unsigned short output_port, void *data1, unsigned long len1, void *data2, unsigned long len2 );
int gamma_send_2p_flowctl ( unsigned short output_port, void *data1, unsigned long len1, void *data2, unsigned long len2 );
int gamma_isend ( unsigned short output_port, void *data, unsigned long len );
int gamma_isend_flowctl ( unsigned short output_port, void *data, unsigned long len );
int gamma_isend_2p ( unsigned short output_port, void *data1, unsigned long len1, void *data2, unsigned long len2 );
int gamma_isend_2p_flowctl ( unsigned short output_port, void *data1, unsigned long len1, void *data2, unsigned long len2 );
The memory region(s) referred to by these non-blocking send routines should have previously been locked and prefetched in physical RAM (see gamma_mlock()).
int gamma_wsend ( unsigned long handle );
int gamma_tsend ( unsigned long handle );
int gamma_signal ( unsigned short sem );
Semaphores are initialized to zero by gamma_init().
gamma_signal(sem) causes semaphore sem to be atomically incremented by one.
Typically this function is called by a receiver handler in order to notify the main thread of the receiver process that a message has arrived.
int gamma_sigerr ( unsigned short sem );
Error semaphores are initialized to zero by gamma_init().
gamma_sigerr(sem) causes error semaphore sem to be atomically incremented by one.
Typically this function is called by a receiver handler in order to notify the main thread of a process of a receive anomaly.
int gamma_wait ( unsigned short sem, unsigned long n );
Typically this function is invoked by a process waiting for message arrivals. Semaphore sem is typically incremented by some receiver handler calling gamma_signal(). While busy-waiting, the NIC is polled for incoming frames so as to speed up message arrivals by avoiding IRQ overheads. This is only an optimization, however, and does not change the semantics.
On return, gamma_wait() yields zero if no receive errors were encountered; otherwise it yields a negative number whose absolute value is the number of times gamma_sigerr() has been called on error semaphore sem since the last call to gamma_wait().
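The return-value rule of gamma_wait() can be modelled as follows. This sketch covers only the error count, not the waiting itself; struct model_errsem, model_sigerr() and model_wait_result() are hypothetical names, and resetting the count after each wait is an assumption drawn from the phrase "since the last call".

```c
/* Model of one error semaphore. */
struct model_errsem {
    int pending_errors;   /* gamma_sigerr() calls since the last wait */
};

/* Models the effect of gamma_sigerr(sem): one more pending error. */
void model_sigerr(struct model_errsem *e)
{
    e->pending_errors += 1;
}

/* Models the documented return value of gamma_wait(): zero if no
 * receive error was signalled, otherwise minus the number of errors. */
int model_wait_result(struct model_errsem *e)
{
    int errors = e->pending_errors;
    e->pending_errors = 0;   /* assumption: the count restarts after a wait */
    return -errors;
}
```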
int gamma_test ( unsigned short sem );
int gamma_atomic ( void (*funct)(void) );
int gamma_sync (void);
By exploiting a two-token synchronization mechanism, the GAMMA implementation of this collective communication primitive achieves best performance over shared Fast Ethernet channels.
int gamma_my_par_pid (void);
int gamma_my_node (void);
The programming paradigm supported by GAMMA is Single Program Multiple Data (SPMD). In this paradigm, each process may differentiate its behaviour by testing its own instance number.
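As an illustration of SPMD branching, the sketch below derives a role and a share of the work from an instance number. It does not call GAMMA: my_node and num_nodes stand for the values an application would obtain from gamma_my_node() and gamma_how_many_nodes(), and the roles, the function names, and the assumption that instance numbers start at 0 are all hypothetical.

```c
/* Hypothetical roles, for illustration only. */
enum role { ROLE_COORDINATOR, ROLE_WORKER };

/* Treating instance 0 as the coordinator is an assumption of this sketch. */
enum role role_for_node(int my_node)
{
    return my_node == 0 ? ROLE_COORDINATOR : ROLE_WORKER;
}

/* Splits `total` work items as evenly as possible: instance `my_node`
 * out of `num_nodes` gets the half-open range [*first, *last). */
void work_range(int my_node, int num_nodes, int total, int *first, int *last)
{
    int base = total / num_nodes;    /* items every instance gets */
    int extra = total % num_nodes;   /* leftover items, one each for the
                                        lowest-numbered instances */

    *first = my_node * base + (my_node < extra ? my_node : extra);
    *last = *first + base + (my_node < extra ? 1 : 0);
}
```

Every replica runs the same code; only the instance number passed in differs, which is what makes the program SPMD.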
int gamma_how_many_nodes (void);
int gamma_mlock ( void *buffer, unsigned long len );
Usually such a contiguous memory region holds outgoing messages to be sent by a non-blocking, zero-copy send routine. It must be pre-fetched and locked into physical RAM so that the DMA engine of the network adapter does not upload nonexistent pages on transmission.
gamma_mlock() adds the pre-fetch functionality to the standard UNIX mlock() function.
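The idea of combining pre-fetch with locking can be sketched with standard POSIX calls. This is not the GAMMA implementation: prefetch_and_lock() is a hypothetical stand-in that touches each page and then calls mlock(), and how gamma_mlock() performs its pre-fetch internally is not described here.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in, not the GAMMA implementation: touch every page
 * so that it is resident, then lock the region with POSIX mlock().
 * Returns 0 on success, -1 on failure (as mlock() does). */
int prefetch_and_lock(void *buffer, size_t len)
{
    volatile unsigned char *p = buffer;
    long page = sysconf(_SC_PAGESIZE);
    size_t i;

    if (len == 0 || page <= 0)
        return -1;
    for (i = 0; i < len; i += (size_t)page)
        p[i] = p[i];            /* read and write back: faults the page in */
    p[len - 1] = p[len - 1];    /* make sure the last page is covered too */
    return mlock(buffer, len);
}
```

Note that mlock() can fail when the process exceeds its locked-memory limit, so the return value should always be checked.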
int gamma_munlock ( void *buffer, unsigned long len );
int gamma_munlockall (void);
void gamma_time(time_586 t);
The TSC register is incremented by one at each CPU clock tick, so this function is useful for the time measurements needed in performance evaluations.
double gamma_time_diff(time_586 b, time_586 a);
Currently the conversion from CPU clock ticks to microseconds requires a constant named CLOCK to be set to the CPU clock frequency in MHz before compiling the GAMMA library. More information is available in the README file enclosed with the GAMMA source code.
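The tick-to-microsecond conversion is a single division: a clock running at f MHz ticks f times per microsecond. The sketch below makes two assumptions: a TSC sample is modelled as a 64-bit tick count (the real time_586 type is defined by GAMMA), and MODEL_CLOCK_MHZ plays the role of the CLOCK constant, with 200.0 chosen purely as an example value.

```c
#include <stdint.h>

/* Stand-in for the CLOCK constant; 200.0 MHz is only an example value. */
#define MODEL_CLOCK_MHZ 200.0

/* Elapsed microseconds between two tick counts, `after` taken later
 * than `before`. Hypothetical helper, not the GAMMA implementation. */
double model_time_diff_us(uint64_t after, uint64_t before)
{
    return (double)(after - before) / MODEL_CLOCK_MHZ;
}
```

For instance, with a 200 MHz clock a difference of 2000 ticks corresponds to 10 microseconds.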
int gamma_active_port;
int gamma_msglen;