Models DaalKMeansModel

class DaalKMeansModel

Entity DaalKMeansModel
Attributes

last_read_date
    Read-only property - Last time this model’s data was accessed.
name
    Set or get the name of the model object.
status
    Read-only property - Current model life cycle status.

Methods

__init__(self[, name, _info])
    [BETA] Create a ‘new’ instance of a DAAL k-means model.
predict(self, frame[, observation_columns, label_column])
    [BETA] Predict the cluster assignments for the data points.
publish(self)
    [BETA] Creates a tar file that will be used as input to the scoring engine.
train(self, frame, observation_columns[, column_scalings, k, max_iterations, ...])
    [ALPHA] Creates DAAL KMeans Model from train frame.

__init__(self, name=None)
    [BETA] Create a ‘new’ instance of a DAAL k-means model.
Parameters:
    name : unicode (default=None)
        User supplied name.
Returns:
    Model
        A new instance of DaalKMeansModel
k-means [R1] is an unsupervised algorithm that partitions the data into ‘k’ clusters. Each observation belongs to exactly one cluster: the cluster with the nearest mean. The k-means model is initialized, trained on columns of a frame, and used to predict cluster assignments for a frame.
This model runs the DAAL implementation of k-means [R2]. The k-means clustering algorithm computes centroids using the Lloyd method [R3].
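The Lloyd method alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The following is a minimal plain-Python sketch for one-dimensional data, not the DAAL implementation. The function name `lloyd_1d` and the explicit initial centroids are assumptions made for illustration. DAAL's own initialization strategy is not described here.

```python
def lloyd_1d(points, centroids, max_iterations=20):
    """Illustrative Lloyd iteration for one-dimensional data (hypothetical helper)."""
    centroids = list(centroids)
    sizes = [0] * len(centroids)
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster with the nearest mean.
        clusters = [[] for _ in centroids]
        for x in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: (x - centroids[i]) ** 2)
            clusters[nearest].append(x)
        sizes = [len(c) for c in clusters]
        # Update step: each centroid moves to the mean of its members.
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, sizes


# Same values as the "data" column in the example frame on this page.
data = [2.0, 1.0, 7.0, 1.0, 9.0, 2.0, 0.0, 6.0, 5.0, 120.0]
# The starting centroids are an arbitrary choice for this sketch.
centroids, sizes = lloyd_1d(data, centroids=[120.0, 0.0])
print(centroids, sizes)
```

With this particular starting point the sketch isolates the outlier in its own cluster. Lloyd's method only finds a local optimum, so other starting centroids can in general lead to a different partition.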
Footnotes

[R1] https://en.wikipedia.org/wiki/K-means_clustering
[R2] https://software.intel.com/en-us/daal
[R3] https://en.wikipedia.org/wiki/Lloyd%27s_algorithm

Examples
Consider the following frame, named ‘frame’, containing two columns. The model below is trained and tested on this sample data set.
>>> frame.inspect()
[#]  data   name
===================
[0]    2.0  ab
[1]    1.0  cd
[2]    7.0  ef
[3]    1.0  gh
[4]    9.0  ij
[5]    2.0  kl
[6]    0.0  mn
[7]    6.0  op
[8]    5.0  qr
[9]  120.0  outlier
>>> model = ta.DaalKMeansModel()
[===Job Progress===]
>>> train_output = model.train(frame, ["data"], k=2, max_iterations=20)
[===Job Progress===]
>>> train_output
{u'centroids': {u'Cluster:0': [120.0], u'Cluster:1': [3.6666666666666665]},
 u'cluster_size': {u'Cluster:0': 1, u'Cluster:1': 9}}
>>> predicted_frame = model.predict(frame, ["data"])
[===Job Progress===]
>>> predicted_frame.inspect()
[#]  data   name     distance_from_cluster_0  distance_from_cluster_1  predicted_cluster
========================================================================================
[0]    2.0  ab                       13924.0            2.77777777778                  1
[1]    1.0  cd                       14161.0            7.11111111111                  1
[2]    7.0  ef                       12769.0            11.1111111111                  1
[3]    1.0  gh                       14161.0            7.11111111111                  1
[4]    9.0  ij                       12321.0            28.4444444444                  1
[5]    2.0  kl                       13924.0            2.77777777778                  1
[6]    0.0  mn                       14400.0            13.4444444444                  1
[7]    6.0  op                       12996.0            5.44444444444                  1
[8]    5.0  qr                       13225.0            1.77777777778                  1
[9]  120.0  outlier                      0.0            13533.4444444                  0
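
The values in the distance_from_cluster_N columns above appear to be squared Euclidean distances to each centroid, not plain distances. For example, (2.0 - 120.0)^2 = 13924.0. The following plain-Python check recomputes the columns from the centroids reported in train_output. It does not call the model API, and the helper name `squared_distances` is hypothetical.

```python
# Centroids copied from train_output in the example: Cluster:0 and Cluster:1.
centroids = [120.0, 33.0 / 9.0]
data = [2.0, 1.0, 7.0, 1.0, 9.0, 2.0, 0.0, 6.0, 5.0, 120.0]


def squared_distances(x, centroids):
    """Squared Euclidean distance from a 1-D point to each centroid."""
    return [(x - c) ** 2 for c in centroids]


rows = []
for x in data:
    d = squared_distances(x, centroids)
    # The predicted cluster is the index of the smallest distance.
    rows.append((x, d[0], d[1], d.index(min(d))))

for row in rows:
    print(row)
```

Each printed tuple corresponds to one row of the predicted frame: the data value, the two squared distances, and the index of the nearer centroid.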