DaalKMeansModel __init__


__init__(self, name=None)

[BETA] Create a new instance of a DAAL k-means model.

Parameters:

name : unicode (default=None)

User-supplied name.

Returns:

: Model

A new instance of DaalKMeansModel

k-means [R4] is an unsupervised algorithm that partitions the data into ‘k’ clusters. Each observation belongs to exactly one cluster: the one with the nearest mean. The k-means model is initialized, trained on columns of a frame, and then used to predict cluster assignments for a frame.

This model runs the DAAL implementation of k-means [R5]. The k-means clustering algorithm computes centroids using Lloyd's method [R6].
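To make the Lloyd iteration mentioned above concrete, here is a minimal pure-Python sketch. It is not the DAAL implementation: the function name `lloyd_kmeans`, the hand-picked starting centroids, and the stop-when-assignments-repeat rule are all illustrative assumptions. It only shows the two alternating steps (assign each point to the nearest centroid, then move each centroid to the mean of its points).

```python
def lloyd_kmeans(points, initial_centroids, max_iterations=20):
    """Illustrative Lloyd iteration (hypothetical helper, not the DAAL API).

    points and initial_centroids are lists of equal-length lists of floats.
    """
    centroids = [list(c) for c in initial_centroids]
    assignments = []
    for _ in range(max_iterations):
        # Assignment step: pick the centroid with the smallest
        # squared Euclidean distance to each point.
        new_assignments = [
            min(range(len(centroids)),
                key=lambda j: sum((x - c) ** 2
                                  for x, c in zip(p, centroids[j])))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the centroids are stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids, assignments


# The 'data' column from the example on this page, one value per row.
points = [[2.0], [1.0], [7.0], [1.0], [9.0],
          [2.0], [0.0], [6.0], [5.0], [120.0]]
# Starting centroids are an assumption made for this sketch.
final_centroids, final_assignments = lloyd_kmeans(points, [[120.0], [0.0]], 20)
print(final_centroids)    # roughly [[120.0], [3.667]]
print(final_assignments)  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
```

With these starting centroids the sketch separates the value 120.0 into its own cluster and places the other nine values in a cluster centered near 3.67.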

Footnotes

[R4] https://en.wikipedia.org/wiki/K-means_clustering
[R5] https://software.intel.com/en-us/daal
[R6] https://en.wikipedia.org/wiki/Lloyd%27s_algorithm

Examples

Consider the following frame, ‘frame’, which contains two columns. A model is trained on the ‘data’ column and then used to predict cluster assignments for the same frame.

>>> frame.inspect()
[#]  data   name
===================
[0]    2.0  ab
[1]    1.0  cd
[2]    7.0  ef
[3]    1.0  gh
[4]    9.0  ij
[5]    2.0  kl
[6]    0.0  mn
[7]    6.0  op
[8]    5.0  qr
[9]  120.0  outlier
>>> model = ta.DaalKMeansModel()
[===Job Progress===]
>>> train_output = model.train(frame, ["data"], k=2, max_iterations=20)
[===Job Progress===]
>>> train_output
{u'centroids': {u'Cluster:0': [120.0], u'Cluster:1': [3.6666666666666665]},
 u'cluster_size': {u'Cluster:0': 1, u'Cluster:1': 9}}
>>> predicted_frame = model.predict(frame, ["data"])
[===Job Progress===]
>>> predicted_frame.inspect()
[#]  data   name     distance_from_cluster_0  distance_from_cluster_1  predicted_cluster
========================================================================================
[0]    2.0  ab                       13924.0            2.77777777778        1
[1]    1.0  cd                       14161.0            7.11111111111        1
[2]    7.0  ef                       12769.0            11.1111111111        1
[3]    1.0  gh                       14161.0            7.11111111111        1
[4]    9.0  ij                       12321.0            28.4444444444        1
[5]    2.0  kl                       13924.0            2.77777777778        1
[6]    0.0  mn                       14400.0            13.4444444444        1
[7]    6.0  op                       12996.0            5.44444444444        1
[8]    5.0  qr                       13225.0            1.77777777778        1
[9]  120.0  outlier                      0.0            13533.4444444        0
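Judging from the numbers above, the `distance_from_cluster_*` columns appear to hold squared Euclidean distances to each centroid. For row 0, (2.0 - 120.0)² = 13924.0. This is an observation drawn from the printed output, not a statement about DAAL internals. The following sketch recomputes the columns in plain Python from the centroids in `train_output`; the helper names are hypothetical.

```python
# Centroids copied from train_output above, keyed by cluster index.
centroids = {0: 120.0, 1: 3.6666666666666665}
# The 'data' column of the example frame.
data = [2.0, 1.0, 7.0, 1.0, 9.0, 2.0, 0.0, 6.0, 5.0, 120.0]


def squared_distances(value, centroids):
    """Squared distance from a 1-D value to every centroid (hypothetical helper)."""
    return {cid: (value - c) ** 2 for cid, c in centroids.items()}


def nearest_cluster(value, centroids):
    """Index of the centroid with the smallest squared distance."""
    distances = squared_distances(value, centroids)
    return min(distances, key=distances.get)


for value in data:
    d = squared_distances(value, centroids)
    # Columns: data, distance to cluster 0, distance to cluster 1, prediction.
    print(value, d[0], d[1], nearest_cluster(value, centroids))
```

Each printed row should agree with the corresponding row of `predicted_frame.inspect()` up to display rounding.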