class GmmModel

Entity GmmModel

Attributes

last_read_date Read-only property - Last time this model’s data was accessed.
name Set or get the name of the model object.
status Read-only property - Current model life cycle status.

Methods

__init__(self[, name, _info]) Create a ‘new’ instance of a gmm model.
predict(self, frame[, observation_columns]) Predict the cluster assignments for the data points.
train(self, frame, observation_columns, column_scalings[, k, max_iterations, ...]) Create a GMM model from the training frame.

__init__(self, name=None)

Create a ‘new’ instance of a gmm model.

Parameters:

name : unicode (default=None)

User supplied name.

Returns:

: Model

A Gaussian Mixture Model [R7] represents a distribution in which each observation is drawn from one of k Gaussian sub-distributions, each with its own probability. Each observation is assigned to exactly one cluster: the one whose sub-distribution has the highest probability for that observation.

The GMM model is initialized, trained on columns of a frame, and then used to predict cluster assignments for a frame. It runs the MLlib implementation of GMM [R8], with the added feature of computing the number of elements in each cluster during training. During predict, it computes the cluster assignment of each observation in the given frame.
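The assignment rule described above can be sketched in plain Python. This is only an illustration, not the MLlib or GmmModel implementation: the helper names gaussian_pdf and assign_cluster are hypothetical, the means, variances, and weights are copied from the training output in the Examples section, and reading each sigma value as a 1x1 covariance (that is, a variance) is an assumption.

```python
import math

# Component parameters copied from the training output in the Examples section.
# Assumption: each 'sigma' entry is a 1x1 covariance matrix, i.e. a variance.
MEANS = [6.79969916638852, 1.1984454608177824, 6.6173304476544335]
VARIANCES = [2.2623755196701305, 0.5599200477022921, 2.1848346923369246]
WEIGHTS = [0.2929610525524124, 0.554374326098111, 0.15266462134947675]


def gaussian_pdf(x, mean, variance):
    """Density of a one-dimensional Gaussian at x."""
    norm = 1.0 / math.sqrt(2.0 * math.pi * variance)
    return norm * math.exp(-((x - mean) ** 2) / (2.0 * variance))


def assign_cluster(x):
    """Return the index of the component with the highest weighted density."""
    scores = [w * gaussian_pdf(x, m, v)
              for w, m, v in zip(WEIGHTS, MEANS, VARIANCES)]
    return scores.index(max(scores))


# The 'data' column of the example frame, in its original row order.
data = [2.0, 1.0, 7.0, 1.0, 9.0, 2.0, 0.0, 6.0, 5.0]
assignments = [assign_cluster(x) for x in data]
print(assignments)
```

Under these assumptions the sketch places 5.0, 6.0, 7.0, and 9.0 in component 0 and the remaining values in component 1, which agrees with the predicted_cluster column and the cluster_size counts in the example.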

Footnotes

[R7] https://en.wikipedia.org/wiki/Mixture_model#Multivariate_Gaussian_mixture_model
[R8] https://spark.apache.org/docs/1.5.0/mllib-clustering.html#gaussian-mixture

Examples

Consider the following model, trained and tested on the sample data set in the frame ‘frame’, which contains two columns.

>>> frame.inspect()
[#]  data  name
===============
[0]   2.0  ab
[1]   1.0  cd
[2]   7.0  ef
[3]   1.0  gh
[4]   9.0  ij
[5]   2.0  kl
[6]   0.0  mn
[7]   6.0  op
[8]   5.0  qr
>>> model = ta.GmmModel()
[===Job Progress===]
>>> train_output = model.train(frame, ["data"], [1.0], 4)
[===Job Progress===]
>>> train_output
{u'cluster_size': {u'Cluster:0': 4, u'Cluster:1': 5},
 u'gaussians': [[u'mu:[6.79969916638852]',
   u'sigma:List(List(2.2623755196701305))'],
  [u'mu:[1.1984454608177824]', u'sigma:List(List(0.5599200477022921))'],
  [u'mu:[6.6173304476544335]', u'sigma:List(List(2.1848346923369246))']],
 u'weights': [0.2929610525524124, 0.554374326098111, 0.15266462134947675]}
>>> predicted_frame = model.predict(frame, ["data"])
[===Job Progress===]
>>> predicted_frame.inspect()
[#]  data  name  predicted_cluster
==================================
[0]   9.0  ij                    0
[1]   2.0  ab                    1
[2]   1.0  cd                    1
[3]   0.0  mn                    1
[4]   1.0  gh                    1
[5]   6.0  op                    0
[6]   5.0  qr                    0
[7]   2.0  kl                    1
[8]   7.0  ef                    0
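
The gaussians entries in the training output are strings rather than numbers. One possible way to turn them into numeric values is sketched below. The helper parse_gaussian is hypothetical and not part of GmmModel, it assumes the string format shown in the output above, and the dictionary literal is copied from that output.

```python
import re

# Training output copied from the example above.
train_output = {
    u'cluster_size': {u'Cluster:0': 4, u'Cluster:1': 5},
    u'gaussians': [
        [u'mu:[6.79969916638852]', u'sigma:List(List(2.2623755196701305))'],
        [u'mu:[1.1984454608177824]', u'sigma:List(List(0.5599200477022921))'],
        [u'mu:[6.6173304476544335]', u'sigma:List(List(2.1848346923369246))'],
    ],
    u'weights': [0.2929610525524124, 0.554374326098111, 0.15266462134947675],
}

# Matches plain and scientific-notation floating point literals.
NUMBER = re.compile(r'[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?')


def parse_gaussian(entry):
    """Split one [mu, sigma] string pair into two flat lists of floats."""
    mu_text, sigma_text = entry
    mu = [float(token) for token in NUMBER.findall(mu_text)]
    sigma = [float(token) for token in NUMBER.findall(sigma_text)]
    return mu, sigma


gaussians = [parse_gaussian(entry) for entry in train_output[u'gaussians']]
weights = train_output[u'weights']

# Print each component's weight next to its parsed mean and sigma values.
for weight, (mu, sigma) in zip(weights, gaussians):
    print(weight, mu, sigma)
```

For frames with more than one observation column the sigma values would come back as a flat list and would need reshaping into a square matrix; that case is not covered by this sketch.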