
KMeansModel train


train(self, frame, observation_columns, column_scalings, k=None, max_iterations=None, epsilon=None, initialization_mode=None)

[BETA] Creates a k-means model by training on a frame.

Parameters:

frame : Frame

A frame to train the model on.

observation_columns : list

Columns containing the observations.

column_scalings : list

Scaling factors, one per observation column, given in the same order as observation_columns. Each value in an observation column is multiplied by that column's scaling factor.

k : int32 (default=None)

Desired number of clusters. Default is 2.

max_iterations : int32 (default=None)

Number of iterations for which the algorithm should run. Default is 20.

epsilon : float64 (default=None)

Distance threshold within which we consider k-means to have converged. Default is 1e-4.

initialization_mode : unicode (default=None)

The initialization technique for the algorithm. Either “random” or “k-means||”. Default is “k-means||”.
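As a rough illustration of what column_scalings and epsilon mean, the sketch below applies scalings to a single observation and checks whether cluster centers have stopped moving. The helper names scale_observation and has_converged are hypothetical and are not part of the trustedanalytics API. The convergence rule shown (no center moves farther than epsilon in Euclidean distance) is an assumption about how the threshold is applied.

```python
import math


def scale_observation(row, column_scalings):
    """Multiply each observation value by the scaling of its column."""
    if len(row) != len(column_scalings):
        raise ValueError("need exactly one scaling per observation column")
    return [value * scale for value, scale in zip(row, column_scalings)]


def has_converged(old_centers, new_centers, epsilon=1e-4):
    """Assumed rule: converged when no center moved farther than epsilon."""
    for old, new in zip(old_centers, new_centers):
        moved = math.sqrt(sum((a - b) ** 2 for a, b in zip(old, new)))
        if moved > epsilon:
            return False
    return True


# With scalings [1.0, 2.0], the second column counts twice as much.
scaled = scale_observation([2.0, 3.0], [1.0, 2.0])  # [2.0, 6.0]
```

A larger scaling makes differences in that column weigh more heavily in the distance computation, so it effectively acts as a per-column weight.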

Returns:

: dict

The data returned is composed of multiple components:

dict : cluster_size
    Cluster size.
    int : ClusterId
        Number of elements in the cluster ‘ClusterId’.
double : within_set_sum_of_squared_error
    Sum of squared error for the model.

Upon training, the ‘k’ cluster centers are computed.
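To make the two returned quantities concrete, here is a minimal sketch that computes cluster sizes and the within-set sum of squared error for a set of points and already-known centers. It is written in plain Python and is not the library's implementation. The function name summarize_clustering and the use of integer center indexes as cluster ids are assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def summarize_clustering(points, centers):
    """Assign each point to its nearest center and total the squared errors."""
    cluster_size = {}
    wssse = 0.0
    for point in points:
        distances = [squared_distance(point, center) for center in centers]
        nearest = min(range(len(centers)), key=distances.__getitem__)
        cluster_size[nearest] = cluster_size.get(nearest, 0) + 1
        wssse += distances[nearest]
    return {
        "cluster_size": cluster_size,
        "within_set_sum_of_squared_error": wssse,
    }


summary = summarize_clustering(
    points=[[0.0, 0.0], [0.0, 2.0], [10.0, 10.0]],
    centers=[[0.0, 1.0], [10.0, 10.0]],
)
```

A lower within-set sum of squared error means the points lie closer to their assigned centers, which is why the value is commonly compared across different choices of k.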

Examples

>>> my_model = ta.KMeansModel(name='MyKMeansModel')
>>> my_model.train(train_frame, ['name_of_observation_column1', 'name_of_observation_column2'], [1.0, 2.0], 3, 10, 0.0002, "random")
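Because every optional argument defaults to None, it can help to check the arguments and fill in the documented defaults before calling train. The sketch below does this in plain Python. The function resolve_train_args is hypothetical and not part of trustedanalytics, and the validation rules are assumptions drawn from the parameter descriptions above.

```python
def resolve_train_args(observation_columns, column_scalings, k=None,
                       max_iterations=None, epsilon=None,
                       initialization_mode=None):
    """Validate train() arguments and substitute the documented defaults."""
    if len(observation_columns) != len(column_scalings):
        raise ValueError("need exactly one scaling per observation column")
    # Defaults as stated in the parameter descriptions.
    k = 2 if k is None else k
    max_iterations = 20 if max_iterations is None else max_iterations
    epsilon = 1e-4 if epsilon is None else epsilon
    if initialization_mode is None:
        initialization_mode = "k-means||"
    if initialization_mode not in ("random", "k-means||"):
        raise ValueError('initialization_mode must be "random" or "k-means||"')
    if k < 1:
        raise ValueError("k must be a positive integer")
    return {
        "observation_columns": list(observation_columns),
        "column_scalings": list(column_scalings),
        "k": k,
        "max_iterations": max_iterations,
        "epsilon": epsilon,
        "initialization_mode": initialization_mode,
    }


args = resolve_train_args(["col1", "col2"], [1.0, 2.0], k=3)
```

The resulting dictionary could then be unpacked into the call, for example my_model.train(train_frame, **args), assuming train accepts these names as keyword arguments as its signature suggests.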