Distributed Processing
This mode assumes that the data set is split into nblocks blocks across computation nodes.
Parameters
Centroid initialization for KMeans clustering in the distributed processing mode has the following parameters:
Parameter | Method | Default Value | Description
| any | Not applicable | The parameter required to initialize the algorithm. Can be:
| any | | The floating-point type that the algorithm uses for intermediate computations. Can be
| Not applicable | | Available initialization methods for KMeans clustering: For more details, see the algorithm description.
| any | Not applicable | The number of centroids. Required.
| any | \(0\) | The total number of rows in all input data sets on all nodes. Required in the distributed processing mode in the first step.
| any | Not applicable | Offset in the total data set specifying the start of a block stored on a given local node. Required.
| | \(0.5\) | A fraction of
| | \(5\) | The number of rounds for parallel KMeans++. \(L * \mathrm{nRounds}\) must be greater than
| | | Set to true if
| | | Set to true if
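The block offsets and the total number of rows are both determined by the block sizes: block \(i\) starts right after the rows of blocks \(0, \dots, i-1\). The Python sketch below shows how a driver script might compute both values before running Step 1, assuming blocks are numbered in the order their rows appear in the total data set. The helper name `block_offsets` is hypothetical and is not part of the library API.

```python
from itertools import accumulate
from typing import List, Tuple


def block_offsets(block_sizes: List[int]) -> Tuple[List[int], int]:
    """Hypothetical helper: per-block offsets and the total row count.

    Block i starts right after the rows of blocks 0..i-1, so the offsets
    are an exclusive prefix sum of the block sizes.
    """
    if any(n < 0 for n in block_sizes):
        raise ValueError("block sizes must be non-negative")
    # accumulate(..., initial=0) yields 0, n0, n0+n1, ...; drop the grand total.
    offsets = list(accumulate(block_sizes, initial=0))[:-1]
    return offsets, sum(block_sizes)


if __name__ == "__main__":
    offsets, n_rows_total = block_offsets([100, 250, 75])
    print(offsets, n_rows_total)  # [0, 100, 350] 425
```

Each node passes its own entry of `offsets` together with the shared total; `initial=` requires Python 3.8 or later.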
Centroid initialization for KMeans clustering follows the general schema described in Algorithms.
Step 1 on Local Nodes (deterministic, random, plusPlus, and parallelPlus methods)
In this step, centroid initialization for KMeans clustering accepts the input described below.
Pass the Input ID
as a parameter to the methods that provide input for your algorithm.
For more details, see Algorithms.
Input ID | Input
| Pointer to the \(n_i \times p\) numeric table that represents the \(i\)th data block on the local node. Note: While the input for
In this step, centroid initialization for KMeans clustering calculates the results described below.
Pass the Result ID
as a parameter to the methods that access the results of your algorithm.
For more details, see Algorithms.
Result ID | Result
| Pointer to the \(\mathrm{nClusters} \times p\) numeric table with the centroids computed on the local node. Note: By default, this result is an object of the
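To illustrate why Step 1 can use both the block offset and the total number of rows, the following Python sketch assumes that the deterministic method selects the first nClusters rows of the total data set and that the random method selects nClusters distinct rows uniformly at random, with every node drawing the same global sample from a shared seed. Each node then keeps only the selected rows that fall inside its own block. The function `step1_local`, the `seed` argument, and the list-of-lists row format are hypothetical; the library's internal selection scheme may differ.

```python
import random
from typing import List, Sequence

Row = List[float]


def step1_local(block: Sequence[Row], offset: int, n_rows_total: int,
                n_clusters: int, method: str, seed: int = 0) -> List[Row]:
    """Hypothetical Step 1: return the selected rows that live in this block.

    Every node computes the same set of global row indices and keeps only
    those inside its own range [offset, offset + len(block)).
    """
    k = min(n_clusters, n_rows_total)
    if method == "deterministic":
        # The first k rows of the total data set.
        selected = range(k)
    elif method == "random":
        # A shared seed makes every node draw the same k distinct global indices.
        selected = random.Random(seed).sample(range(n_rows_total), k)
    else:
        raise ValueError(f"unsupported method: {method!r}")
    lo, hi = offset, offset + len(block)
    # Sorting keeps global order, so concatenating node results in node
    # order reproduces the global order of the selected rows.
    return [list(block[g - offset]) for g in sorted(selected) if lo <= g < hi]


if __name__ == "__main__":
    node0 = [[0.0, 0.0], [1.0, 1.0]]              # rows 0-1 of the total data set
    node1 = [[2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]  # rows 2-4
    print(step1_local(node0, 0, 5, 3, "deterministic"))  # [[0.0, 0.0], [1.0, 1.0]]
    print(step1_local(node1, 2, 5, 3, "deterministic"))  # [[2.0, 2.0]]
```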
Step 2 on Master Node (deterministic and random methods)
This step is applicable for deterministic and random methods only.
Centroid initialization for KMeans clustering accepts the input from each local node described below.
Pass the Input ID
as a parameter to the methods that provide input for your algorithm.
For more details, see Algorithms.
Input ID | Input
| A collection that contains results computed in Step 1 on local nodes (two numeric tables from each local node).
In this step, centroid initialization for KMeans clustering calculates the results described below.
Pass the Result ID
as a parameter to the methods that access the results of your algorithm.
For more details, see Algorithms.
Result ID | Result
| Pointer to the \(\mathrm{nClusters} \times p\) numeric table with centroids. Note: By default, this result is an object of the
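A minimal sketch of the merge on the master node, assuming the per-node results are concatenated in node order and the first nClusters rows become the centroids. For brevity, each node's result is modeled as a plain list of rows, so the number of rows a node contributes is implied by the list length rather than passed as a separate table. The function name `step2_master` is hypothetical.

```python
from typing import List, Sequence

Row = List[float]


def step2_master(partials: Sequence[Sequence[Row]], n_clusters: int) -> List[Row]:
    """Hypothetical Step 2: merge per-node Step 1 results into nClusters centroids.

    partials[i] holds the rows produced by Step 1 on node i. The rows are
    concatenated in node order and the first n_clusters rows are kept.
    """
    merged = [list(row) for part in partials for row in part]
    if len(merged) < n_clusters:
        raise ValueError(f"expected at least {n_clusters} rows, got {len(merged)}")
    return merged[:n_clusters]


if __name__ == "__main__":
    partials = [[[0.0, 0.0], [1.0, 1.0]], [[2.0, 2.0]]]
    print(step2_master(partials, 3))  # [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
```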
Step 2 on Local Nodes (plusPlus and parallelPlus methods)
This step is applicable for plusPlus and parallelPlus methods only.
Centroid initialization for KMeans clustering accepts the input from each local node described below.
Pass the Input ID
as a parameter to the methods that provide input for your algorithm.
For more details, see Algorithms.
Input ID | Input
| Pointer to the \(n_i \times p\) numeric table that represents the \(i\)th data block on the local node. Note: While the input for
| Pointer to the \(m \times p\) numeric table with the centroids calculated in the previous steps (Step 1 or Step 4). The value of \(m\) is defined by the method and iteration of the algorithm: This input can be an object of any class derived from
| Pointer to the
In this step, centroid initialization for KMeans clustering calculates the results described below.
Pass the Result ID
as a parameter to the methods that access the results of your algorithm.
For more details, see Algorithms.
Result ID | Result
| Pointer to the \(1 \times 1\) numeric table that contains the overall error accumulated on the node. For a description of the overall error, see KMeans Clustering Details.
| Applicable for
Note
By default, these results are objects of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
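Assuming the overall error is the usual KMeans++ potential, that is, the sum over the rows of the block of the squared Euclidean distance from each row to its nearest centroid (see KMeans Clustering Details for the exact definition the library uses), the \(1 \times 1\) value produced in this step can be sketched in Python as follows. The helper names are hypothetical.

```python
from typing import List, Sequence

Row = Sequence[float]


def squared_distance(x: Row, c: Row) -> float:
    """Squared Euclidean distance between two rows of equal length."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def min_squared_distances(block: Sequence[Row], centroids: Sequence[Row]) -> List[float]:
    """Squared distance from each row of the block to its nearest centroid."""
    if not centroids:
        raise ValueError("at least one centroid is required")
    return [min(squared_distance(x, c) for c in centroids) for x in block]


def local_overall_error(block: Sequence[Row], centroids: Sequence[Row]) -> float:
    """Value stored in the 1 x 1 table reported by this node (assumed definition)."""
    return sum(min_squared_distances(block, centroids))


if __name__ == "__main__":
    block = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
    print(local_overall_error(block, [[0.0, 0.0]]))              # 10.0
    print(local_overall_error(block, [[0.0, 0.0], [3.0, 0.0]]))  # 1.0
```

Only this scalar leaves the node, which is what lets the master node sample proportionally to the error without moving the data.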
Step 3 on Master Node (plusPlus and parallelPlus methods)
This step is applicable for plusPlus and parallelPlus methods only.
Centroid initialization for KMeans clustering accepts the input from each local node described below.
Pass the Input ID
as a parameter to the methods that provide input for your algorithm.
For more details, see Algorithms.
Input ID | Input
| A key-value data collection that maps parts of the accumulated error to the local nodes: the \(i\)th element of this collection is a numeric table that contains the overall error accumulated on the \(i\)th node.
In this step, centroid initialization for KMeans clustering calculates the results described below.
Pass the Result ID
as a parameter to the methods that access the results of your algorithm.
For more details, see Algorithms.
Result ID | Result
| A key-value data collection that maps the input for Step 4 to the local nodes: the \(i\)th element of this collection is a numeric table that contains the input for Step 4 on the \(i\)th node. Step 3 may produce no input for Step 4 on some local nodes, in which case the collection does not contain an entry for that node. The single element of this numeric table is a value \(v \leq \Phi_X(C)\), where \(\Phi_X(C)\) is the overall error calculated on the node. For a description of the overall error, see KMeans Clustering Details. This value defines the probability of sampling a new centroid on the \(i\)th node.
| Applicable for the parallelPlus method only. Pointer to the service data to be used in Step 5.
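The role of \(v\) can be illustrated with a sketch of Step 3 for the plusPlus method, which adds one centroid per iteration. Assuming the master draws \(u\) uniformly from \([0, \sum_i \Phi_i)\), where \(\Phi_i\) is the error reported by node \(i\), the node whose cumulative error range contains \(u\) is selected and receives \(v = u - \sum_{j<i} \Phi_j\), so \(0 \leq v \leq \Phi_i\); nodes that are not selected get no entry. This is the standard way to sample a point with probability proportional to its squared distance without moving the data. The parallelPlus method samples several candidates per round and is not covered here. The function `step3_master` and its `rng` argument are hypothetical.

```python
import random
from typing import Dict, Optional


def step3_master(node_errors: Dict[int, float],
                 rng: Optional[random.Random] = None) -> Dict[int, float]:
    """Hypothetical plusPlus Step 3: pick one node with probability proportional
    to its overall error and give it a threshold v with 0 <= v <= its error.

    Nodes that are not picked get no entry in the returned collection.
    """
    rng = rng or random.Random()
    # Nodes whose error is zero cannot contribute a new centroid.
    positive = [(node, err) for node, err in sorted(node_errors.items()) if err > 0]
    if not positive:
        raise ValueError("the total error must be positive")
    total = sum(err for _, err in positive)
    u = rng.random() * total  # uniform in [0, total), up to rounding
    prefix = 0.0
    for i, (node, err) in enumerate(positive):
        # The last node absorbs any floating-point spill-over at the upper end.
        if u < prefix + err or i == len(positive) - 1:
            return {node: min(u - prefix, err)}
        prefix += err
    raise AssertionError("unreachable")


if __name__ == "__main__":
    class FixedRng:
        """Stand-in generator that always returns the same number."""
        def __init__(self, value: float) -> None:
            self.value = value

        def random(self) -> float:
            return self.value

    errors = {0: 1.0, 1: 3.0, 2: 0.0}
    print(step3_master(errors, FixedRng(0.5)))  # {1: 1.0}
    print(step3_master(errors, FixedRng(0.1)))  # {0: 0.4}
```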
Step 4 on Local Nodes (plusPlus and parallelPlus methods)
This step is applicable for plusPlus and parallelPlus methods only.
Centroid initialization for KMeans clustering accepts the input from each local node described below.
Pass the Input ID
as a parameter to the methods that provide input for your algorithm.
For more details, see Algorithms.
Input ID | Input
| Pointer to the \(n_i \times p\) numeric table that represents the \(i\)th data block on the local node. Note: While the input for
| Pointer to the \(l \times m\) numeric table with the values calculated in Step 3. The value of \(m\) is defined by the method of the algorithm: This input can be an object of any class derived from
| Pointer to the
In this step, centroid initialization for KMeans clustering calculates the results described below.
Pass the Result ID
as a parameter to the methods that access the results of your algorithm.
For more details, see Algorithms.
Result ID | Result
| Pointer to the \(m \times p\) numeric table that contains centroids computed on this local node, where \(m\) equals the one in Note: By default, this result is an object of the
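For the plusPlus method, a local node that receives a value \(v\) from Step 3 can select its new centroid by walking through its rows in order, accumulating each row's squared distance to the nearest current centroid, and taking the first row whose running total exceeds \(v\). Because \(v\) is uniform within the node's error, this picks each row with probability proportional to its squared distance. The Python sketch below assumes that squared-distance definition of the overall error; the function name `step4_local` is hypothetical.

```python
from typing import List, Sequence

Row = Sequence[float]


def squared_distance(x: Row, c: Row) -> float:
    """Squared Euclidean distance between two rows of equal length."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def step4_local(block: Sequence[Row], centroids: Sequence[Row], v: float) -> List[List[float]]:
    """Hypothetical plusPlus Step 4: return a 1 x p table with the row selected by v.

    Rows that coincide with a current centroid (distance 0) are never chosen.
    If v reaches the node's total error, the last row with positive distance is kept.
    """
    running = 0.0
    chosen = None
    for x in block:
        d = min(squared_distance(x, c) for c in centroids)
        if d <= 0:
            continue
        running += d
        chosen = x
        if v < running:
            break
    if chosen is None:
        raise ValueError("every row coincides with an existing centroid")
    return [list(chosen)]


if __name__ == "__main__":
    block = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
    centroids = [[0.0, 0.0]]
    print(step4_local(block, centroids, 0.5))  # [[1.0, 0.0]]
    print(step4_local(block, centroids, 5.0))  # [[3.0, 0.0]]
```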
Step 5 on Master Node (parallelPlus method)
This step is applicable for the parallelPlus method only.
Centroid initialization for KMeans clustering accepts the input from each local node described below.
Pass the Input ID
as a parameter to the methods that provide input for your algorithm.
For more details, see Algorithms.
Input ID | Input
inputCentroids | A data collection with the centroids calculated in Step 1 or Step 4. Each item in the collection is a pointer to an \(m \times p\) numeric table, where the value of \(m\) is defined by the method and the iteration of the algorithm: Each numeric table can be an object of any class derived from
| A data collection with the items calculated in Step 2 on local nodes. For a detailed definition, see
| Pointer to the service data generated as the output of Step 3 on the master node. For a detailed definition, see
In this step, centroid initialization for KMeans clustering calculates the results described below.
Pass the Result ID
as a parameter to the methods that access the results of your algorithm.
For more details, see Algorithms.
Result ID | Result
| Pointer to the \(\mathrm{nClusters} \times p\) numeric table with centroids. Note: By default, this result is an object of the
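For the parallelPlus method, the candidate centroids gathered over all rounds usually outnumber nClusters, so Step 5 has to reduce them. The standard approach for parallel KMeans++ (k-means||) is to weight each candidate, for example by the number of data rows closest to it, and run weighted KMeans++ seeding over the candidates. The sketch below assumes the Step 2 items have already been summed into one weight per candidate; the function name `reduce_candidates` and this weighting scheme are assumptions, not a description of the library's implementation.

```python
import random
from typing import List, Optional, Sequence

Row = Sequence[float]


def squared_distance(x: Row, c: Row) -> float:
    """Squared Euclidean distance between two rows of equal length."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def reduce_candidates(candidates: Sequence[Row], weights: Sequence[float],
                      n_clusters: int, rng: Optional[random.Random] = None) -> List[List[float]]:
    """Hypothetical parallelPlus Step 5: weighted KMeans++ seeding over candidates."""
    if not 0 < n_clusters <= len(candidates):
        raise ValueError("n_clusters must be between 1 and the number of candidates")
    if len(weights) != len(candidates) or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("need one non-negative weight per candidate, with a positive sum")
    rng = rng or random.Random()
    indices = range(len(candidates))
    # First centroid: sampled with probability proportional to its weight.
    chosen = [rng.choices(indices, weights=weights, k=1)[0]]
    # Squared distance from every candidate to its nearest chosen centroid.
    d2 = [squared_distance(c, candidates[chosen[0]]) for c in candidates]
    while len(chosen) < n_clusters:
        scores = [w * d for w, d in zip(weights, d2)]
        if sum(scores) > 0:
            nxt = rng.choices(indices, weights=scores, k=1)[0]
        else:
            # No remaining mass (e.g. duplicate candidates): take any unused index.
            nxt = next(i for i in indices if i not in chosen)
        chosen.append(nxt)
        d2 = [min(d, squared_distance(c, candidates[nxt])) for d, c in zip(d2, candidates)]
    return [list(candidates[i]) for i in chosen]


if __name__ == "__main__":
    cands = [[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [10.1, 0.0]]
    print(reduce_candidates(cands, [5.0, 1.0, 5.0, 1.0], 2, random.Random(0)))
```

Weighting by the number of closest rows lets a small candidate set stand in for the full data when the final nClusters centroids are chosen.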