Models DaalKMeansModel

class DaalKMeansModel

Entity DaalKMeansModel

Attributes

last_read_date	Read-only property - Last time this model’s data was accessed.
name	Set or get the name of the model object.
status	Read-only property - Current model life cycle status.

Methods

__init__(self[, name, _info])	[BETA] Create a ‘new’ instance of a DAAL k-means model.
predict(self, frame[, observation_columns, label_column])	[BETA] Predict the cluster assignments for the data points.
publish(self)	[BETA] Create a tar file to be used as input to the scoring engine.
train(self, frame, observation_columns[, column_scalings, k, max_iterations, ...])	[ALPHA] Create a DAAL k-means model from the training frame.

__init__(self, name=None)

[BETA] Create a ‘new’ instance of a DAAL k-means model.

Parameters:

name : unicode (default=None)

User supplied name.

Returns:

: Model

A new instance of DaalKMeansModel

k-means [R1] is an unsupervised algorithm used to partition the data into ‘k’ clusters. Each observation can belong to only one cluster, the cluster with the nearest mean. The k-means model is initialized, trained on columns of a frame, and used to predict cluster assignments for a frame.

This model runs the DAAL implementation of k-means [R2]. The k-means clustering algorithm computes centroids using the Lloyd method [R3].

Footnotes

[R1]	https://en.wikipedia.org/wiki/K-means_clustering

[R2]	https://software.intel.com/en-us/daal

[R3]	https://en.wikipedia.org/wiki/Lloyd%27s_algorithm
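
The two steps of the Lloyd method (assign each observation to its nearest centroid, then recompute each centroid as the mean of its members) can be sketched in plain Python. This is an illustrative sketch only, not the DAAL implementation: the function names `squared_distance` and `lloyd_kmeans` are hypothetical, and the initial centroids are passed in explicitly because this page does not describe how DAAL chooses them.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as lists."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, initial_centroids, max_iterations=20):
    """Illustrative Lloyd iteration; returns (centroids, assignments)."""
    centroids = [list(c) for c in initial_centroids]
    assignments = []
    for _ in range(max_iterations):
        # Assignment step: each point goes to the nearest centroid.
        new_assignments = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # No point changed cluster, so the centroids are stable.
        assignments = new_assignments
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # Leave an empty cluster's centroid unchanged.
                centroids[j] = [sum(col) / float(len(members))
                                for col in zip(*members)]
    return centroids, assignments


# One-dimensional sample values; the starting centroids are an assumption.
points = [[2.0], [1.0], [7.0], [1.0], [9.0],
          [2.0], [0.0], [6.0], [5.0], [120.0]]
centroids, assignments = lloyd_kmeans(points, [[120.0], [0.0]])
print(centroids)
print(assignments)
```

With these hand-picked starting centroids the sketch settles on one cluster containing only the value 120.0 and a second cluster containing the other nine values. A different starting point may converge to a different partition.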

Examples

Consider the following model, trained and tested on the sample data set in frame ‘frame’, which contains two columns.

>>> frame.inspect()
[#]  data   name
===================
[0]    2.0  ab
[1]    1.0  cd
[2]    7.0  ef
[3]    1.0  gh
[4]    9.0  ij
[5]    2.0  kl
[6]    0.0  mn
[7]    6.0  op
[8]    5.0  qr
[9]  120.0  outlier

>>> model = ta.DaalKMeansModel()
[===Job Progress===]
>>> train_output = model.train(frame, ["data"], k=2, max_iterations=20)
[===Job Progress===]
>>> train_output
{u'centroids': {u'Cluster:0': [120.0], u'Cluster:1': [3.6666666666666665]},
 u'cluster_size': {u'Cluster:0': 1, u'Cluster:1': 9}}
>>> predicted_frame = model.predict(frame, ["data"])
[===Job Progress===]
>>> predicted_frame.inspect()
[#]  data   name     distance_from_cluster_0  distance_from_cluster_1  predicted_cluster
========================================================================================
[0]    2.0  ab                       13924.0            2.77777777778        1
[1]    1.0  cd                       14161.0            7.11111111111        1
[2]    7.0  ef                       12769.0            11.1111111111        1
[3]    1.0  gh                       14161.0            7.11111111111        1
[4]    9.0  ij                       12321.0            28.4444444444        1
[5]    2.0  kl                       13924.0            2.77777777778        1
[6]    0.0  mn                       14400.0            13.4444444444        1
[7]    6.0  op                       12996.0            5.44444444444        1
[8]    5.0  qr                       13225.0            1.77777777778        1
[9]  120.0  outlier                      0.0            13533.4444444        0
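
The distance columns above are consistent with squared Euclidean distances to each centroid, for example (2.0 - 120.0)^2 = 13924.0 for row 0. The following sketch assumes that interpretation and uses no library calls. It recomputes the three output columns from the centroids reported in train_output. The helper name `score` is hypothetical.

```python
# Centroids copied from train_output above (Cluster:0, Cluster:1).
centroids = [120.0, 3.6666666666666665]
data = [2.0, 1.0, 7.0, 1.0, 9.0, 2.0, 0.0, 6.0, 5.0, 120.0]


def score(value, centroids):
    """Return squared distances to each centroid and the nearest index."""
    distances = [(value - c) ** 2 for c in centroids]
    nearest = distances.index(min(distances))
    return distances, nearest


# Assumption: predicted_cluster is the index of the smallest distance.
rows = [score(v, centroids) for v in data]
for value, (distances, nearest) in zip(data, rows):
    print(value, distances, nearest)
```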
