EdgeFrame correlation_matrix


correlation_matrix(self, data_column_names, matrix_name=None)

Calculate the correlation matrix for two or more columns.

Parameters:

data_column_names : list

The names of the columns from which to compute the matrix.

matrix_name : unicode (default=None)

The name for the returned matrix Frame.

Returns:

: Frame

A Frame with the matrix of the correlation values for the columns.

This method applies only to columns containing numerical data.

Examples

Consider Frame my_frame, which contains the following data:

 >>> my_frame.inspect()

  idnum:int32   x1:float32   x2:float32   x3:float32   x4:float32
/-------------------------------------------------------------------/
            0          1.0          4.0          0.0         -1.0
            1          2.0          3.0          0.0         -1.0
            2          3.0          2.0          1.0         -1.0
            3          4.0          1.0          2.0         -1.0
            4          5.0          0.0          2.0         -1.0
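
As a hand check on the values above, the Pearson coefficient for a single pair of columns can be computed directly. The following is a minimal sketch in plain Python, independent of the trustedanalytics API. The helper name pearson is hypothetical, and the lists simply copy columns x1, x2 and x3 from the frame shown above.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations of each value from its column mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    # A zero-variance (constant) column makes this division fail in this sketch.
    return cov / math.sqrt(var_x * var_y)

# Values copied from columns x1, x2 and x3 of my_frame.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [4.0, 3.0, 2.0, 1.0, 0.0]
x3 = [0.0, 0.0, 1.0, 2.0, 2.0]

r_13 = pearson(x1, x3)  # 6 / sqrt(10 * 4), which simplifies to 3 / sqrt(10)
r_12 = pearson(x1, x2)  # x2 decreases exactly as x1 increases
```

Here r_13 is the non-trivial coefficient between x1 and x3, while r_12 is one of the trivial cases.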

my_frame.correlation_matrix computes the common correlation coefficient (Pearson’s) for each pair of columns in the user-provided list. In this example, idnum and most of the other columns have trivial correlations: -1, 0, or +1. Column x3 provides a contrasting coefficient of 3 / sqrt(10) = 0.948683298051 against idnum and x1 (with the opposite sign against x2). The resulting table (specifying all columns) is

>>> corr_matrix = my_frame.correlation_matrix(my_frame.column_names)
>>> corr_matrix.inspect()