KMeansModel train

train(self, frame, observation_columns, column_scalings, k=None, max_iterations=None, epsilon=None, initialization_mode=None)

[BETA] Creates a k-means model by training it on a frame.

Parameters:

frame : Frame

A frame to train the model on.

observation_columns : list

Names of the columns containing the observations.

column_scalings : list

One scaling factor per observation column, given in the same order as observation_columns. Each value in an observation column is multiplied by its corresponding scaling factor.

k : int32 (default=None)

Desired number of clusters. If None, defaults to 2.

max_iterations : int32 (default=None)

Maximum number of iterations the algorithm runs. If None, defaults to 20.

epsilon : float64 (default=None)

Distance threshold within which the cluster centers are considered to have converged. If None, defaults to 1e-4.

initialization_mode : unicode (default=None)

The initialization technique for the algorithm: either “random” or “k-means||”. If None, defaults to “k-means||”.
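The effect of column_scalings can be illustrated with a small plain-Python sketch. This is not the library's code: `scale_observations` is a hypothetical helper, and each row is assumed to be a tuple holding one value per observation column, in the same order as the scalings.

```python
def scale_observations(rows, column_scalings):
    """Multiply each observation value by its column's scaling factor.

    Hypothetical helper for illustration only.
    rows: list of tuples, one value per observation column.
    column_scalings: one factor per observation column, same order.
    """
    if any(len(row) != len(column_scalings) for row in rows):
        raise ValueError("need exactly one scaling per observation column")
    return [tuple(v * s for v, s in zip(row, column_scalings)) for row in rows]


# Two observation columns, scaled by 1.0 and 2.0 as in the example call.
rows = [(1.0, 3.0), (4.0, 0.5)]
print(scale_observations(rows, [1.0, 2.0]))  # [(1.0, 6.0), (4.0, 1.0)]
```

Assuming the usual Euclidean distance, multiplying a column by s multiplies its contribution to squared distances by s², so a larger scaling gives that column more influence on which cluster a point is assigned to.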
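The two initialization modes can be sketched in plain Python. Here "random" is shown as picking k distinct input points uniformly at random. "k-means||" is a parallel (scalable) variant of k-means++ seeding, so the sketch substitutes sequential k-means++ to convey the idea. Neither function is the library's implementation, and both function names are hypothetical.

```python
import math
import random


def random_init(points, k, rng):
    """'random' mode (sketch): choose k distinct input points uniformly."""
    return rng.sample(points, k)


def kmeans_plus_plus_init(points, k, rng):
    """Sequential k-means++ seeding, standing in for the parallel k-means||.

    Assumes points contains at least k distinct points.
    """
    # The first center is a uniformly random input point.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest chosen
        # center; points already chosen get weight 0 and cannot be re-picked.
        weights = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


rng = random.Random(42)
points = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
print(random_init(points, 2, rng))
print(kmeans_plus_plus_init(points, 2, rng))
```

The distance weighting makes far-apart starting centers likely, which is the motivation for k-means++-style seeding over purely random starts.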

Returns:

dict

The returned dict contains the following keys:

cluster_size : dict

Maps each cluster ID to the number of elements in that cluster (int).

within_set_sum_of_squared_error : double

Within-set sum of squared errors (WSSSE) for the model: the sum of the squared distances from each point to its nearest cluster center.
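To show how the two returned values relate to a set of trained centers, the sketch below computes a dict of the same shape in plain Python. `summarize` is a hypothetical helper, and the cluster IDs here are simply 0-based center indexes, which is an assumption; the library's own cluster ID format may differ.

```python
import math
from collections import Counter


def summarize(points, centers):
    """Assign each point to its nearest center and return a dict shaped like
    the documented return value (hypothetical helper for illustration)."""
    sizes = Counter()
    wssse = 0.0
    for p in points:
        # Index of the nearest center, and the squared distance to it.
        cid, sq = min(
            ((i, math.dist(p, c) ** 2) for i, c in enumerate(centers)),
            key=lambda t: t[1],
        )
        sizes[cid] += 1
        wssse += sq
    return {"cluster_size": dict(sizes), "within_set_sum_of_squared_error": wssse}


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0)]
centers = [(0.0, 1.0), (10.0, 0.0)]
print(summarize(points, centers))
# {'cluster_size': {0: 2, 1: 1}, 'within_set_sum_of_squared_error': 2.0}
```

A lower WSSSE means points sit closer to their centers, although it always shrinks as k grows, so it is most useful for comparing runs with the same k.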

Training computes the k cluster centers.
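As a rough picture of what training computes, here is a minimal plain-Python sketch of Lloyd's k-means showing how k, max_iterations and epsilon interact. It is not the library's distributed implementation: `train_kmeans` and its `seed` argument are hypothetical, initialization uses the “random” mode for brevity, and the convergence test (stop once every center moves less than epsilon) is an assumed reading of the epsilon parameter.

```python
import math
import random


def train_kmeans(points, k=2, max_iterations=20, epsilon=1e-4, seed=0):
    """Plain Lloyd's k-means (sketch): returns the k cluster centers.

    Defaults mirror the documented defaults for k, max_iterations and
    epsilon; the seed argument is hypothetical.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # 'random' initialization
    for _ in range(max_iterations):
        # Assignment step: group points by their nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its points
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        # Largest distance any center moved during this iteration.
        moved = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if moved < epsilon:
            break  # converged before reaching max_iterations
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(sorted(train_kmeans(points, k=2)))  # [(0.0, 0.5), (10.0, 10.5)]
```

max_iterations caps the work even if the centers never settle within epsilon, so a run can end either by convergence or by hitting the iteration limit.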

Examples

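>>> import trustedanalytics as ta
>>> my_model = ta.KMeansModel(name='MyKMeansModel')
>>> # train_frame is an existing frame that contains both observation columns
>>> my_model.train(train_frame, ['name_of_observation_column1', 'name_of_observation_column2'], [1.0, 2.0], k=3, max_iterations=10, epsilon=0.0002, initialization_mode="random")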
