Execution of Communication Operations¶
Communication operations are executed by CCL worker threads (workers). The number of workers is controlled by the CCL_WORKER_COUNT environment variable.
Workers affinity is controlled by CCL_WORKER_AFFINITY.
By setting workers affinity you can specify which CPU cores are used to host CCL workers. The general rule of thumb is to use different CPU cores for compute (e.g. by specifying KMP_AFFINITY
) and for communication.
There are two ways to set workers affinity: explicit and automatic.
Explicit setup¶
To set affinity explicitly, pass ID of the cores to be bound to to the CCL_WORKER_AFFINITY
environment variable.
Example
In the example below, oneCCL creates 4 threads and pins them to cores with numbers 3, 4, 5, and 6, respectively:
export CCL_WORKER_COUNT=4
export CCL_WORKER_AFFINITY=3,4,5,6
Automatic setup¶
Note
Automatic pinning only works if application is launched using mpirun
provided by the oneCCL distribution package.
To set affinity automatically, set CCL_WORKER_AFFINITY
to auto
.
Example
In the example below, oneCCL creates four threads and pins them to the last four cores available for the process launched:
export CCL_WORKER_COUNT=4
export CCL_WORKER_AFFINITY=auto
Note
The exact IDs of CPU cores depend on the parameters passed to mpirun
.