

class DaalKMeansModel

Entity DaalKMeansModel

Attributes

last_read_date Read-only property - Last time this model’s data was accessed.
name Set or get the name of the model object.
status Read-only property - Current model life cycle status.

Methods

__init__(self[, name, _info]) [BETA] Create a ‘new’ instance of a DAAL k-means model.
predict(self, frame[, observation_columns, label_column]) [BETA] Predict the cluster assignments for the data points.
publish(self) [BETA] Creates a tar file that will be used as input to the scoring engine.
train(self, frame, observation_columns[, column_scalings, k, max_iterations, ...]) [ALPHA] Creates a DAAL k-means model from the training frame.
__init__(self, name=None)

[BETA] Create a ‘new’ instance of a DAAL k-means model.

Parameters:

name : unicode (default=None)

User supplied name.

Returns:

: Model

A new instance of DaalKMeansModel.

k-means [R1] is an unsupervised algorithm that partitions the data into ‘k’ clusters. Each observation belongs to exactly one cluster: the one with the nearest mean. The k-means model is initialized, trained on columns of a frame, and then used to predict cluster assignments for a frame.

This model runs the DAAL implementation of k-means [R2]. The k-means clustering algorithm computes centroids using Lloyd's method [R3].
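Lloyd's method alternates two steps until the assignments stop changing: an assignment step, where each observation goes to the nearest centroid, and an update step, where each centroid moves to the mean of its observations. The following is a minimal pure-Python sketch of that loop, not the DAAL implementation. The helpers squared_distance and lloyd_kmeans are hypothetical, and seeding with the first k points is an assumption made for simplicity; DAAL's own initialization may differ.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_kmeans(points, k, max_iterations=20):
    """Toy Lloyd's-algorithm k-means; NOT the DAAL implementation.

    points: list of equal-length tuples of floats.
    Initialization (an assumption for this sketch): the first k points.
    Returns (centroids, labels).
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster whose centroid is
        # nearest (ties go to the lower cluster index).
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Usage on the "data" column from the example frame.
data = [2.0, 1.0, 7.0, 1.0, 9.0, 2.0, 0.0, 6.0, 5.0, 120.0]
centroids, labels = lloyd_kmeans([(x,) for x in data], k=2)
print(centroids)
print(labels)
```

On the example data, this particular seeding happens to converge to the same two centroids as the documented train output (120.0 alone in cluster 0, and 33/9 ≈ 3.667 for the other nine points in cluster 1). Other seedings can converge to different local optima, which is a general property of Lloyd's method.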

Footnotes

[R1] https://en.wikipedia.org/wiki/K-means_clustering
[R2] https://software.intel.com/en-us/daal
[R3] https://en.wikipedia.org/wiki/Lloyd%27s_algorithm

Examples

The following example trains and tests a model on the sample frame ‘frame’, which contains two columns:

>>> frame.inspect()
[#]  data   name
===================
[0]    2.0  ab
[1]    1.0  cd
[2]    7.0  ef
[3]    1.0  gh
[4]    9.0  ij
[5]    2.0  kl
[6]    0.0  mn
[7]    6.0  op
[8]    5.0  qr
[9]  120.0  outlier
>>> model = ta.DaalKMeansModel()
[===Job Progress===]
>>> train_output = model.train(frame, ["data"], k=2, max_iterations=20)
[===Job Progress===]
>>> train_output
{u'centroids': {u'Cluster:0': [120.0], u'Cluster:1': [3.6666666666666665]},
 u'cluster_size': {u'Cluster:0': 1, u'Cluster:1': 9}}
>>> predicted_frame = model.predict(frame, ["data"])
[===Job Progress===]
>>> predicted_frame.inspect()
[#]  data   name     distance_from_cluster_0  distance_from_cluster_1  predicted_cluster
========================================================================================
[0]    2.0  ab                       13924.0            2.77777777778        1
[1]    1.0  cd                       14161.0            7.11111111111        1
[2]    7.0  ef                       12769.0            11.1111111111        1
[3]    1.0  gh                       14161.0            7.11111111111        1
[4]    9.0  ij                       12321.0            28.4444444444        1
[5]    2.0  kl                       13924.0            2.77777777778        1
[6]    0.0  mn                       14400.0            13.4444444444        1
[7]    6.0  op                       12996.0            5.44444444444        1
[8]    5.0  qr                       13225.0            1.77777777778        1
[9]  120.0  outlier                      0.0            13533.4444444        0
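The distance columns above are consistent with squared Euclidean distances to each centroid. For row 0, for example, (2.0 − 120.0)² = 13924.0 and (2.0 − 3.667)² ≈ 2.778. The plain-Python cross-check below is not part of the ta API, and its helper name is hypothetical. It recomputes the train_output centroids and cluster sizes from the reported predicted_cluster labels, then recomputes the distance columns and the nearest-centroid assignment.

```python
# Hypothetical cross-check in plain Python; not part of the ta / DAAL API.
data = [2.0, 1.0, 7.0, 1.0, 9.0, 2.0, 0.0, 6.0, 5.0, 120.0]
# Cluster labels as reported in the predicted_cluster column.
labels = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Centroid of each cluster = mean of its members; cluster size = member count.
centroids = {}
cluster_size = {}
for c in sorted(set(labels)):
    members = [x for x, lab in zip(data, labels) if lab == c]
    centroids["Cluster:%d" % c] = [sum(members) / len(members)]
    cluster_size["Cluster:%d" % c] = len(members)

# Distance columns: squared distance from each point to each centroid.
# The predicted cluster is the index of the smallest distance.
rows = []
for x in data:
    dists = [squared_distance([x], centroids["Cluster:%d" % c]) for c in range(2)]
    predicted = min(range(len(dists)), key=dists.__getitem__)
    rows.append((x, dists[0], dists[1], predicted))

print(centroids)
print(cluster_size)
for row in rows:
    print(row)
```

Because recomputing the centroids from the predicted labels and then reassigning each point reproduces the same labels, the reported result is a fixed point of Lloyd's iteration.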