EdgeFrame correlation_matrix
correlation_matrix(self, data_column_names, matrix_name=None)
Calculate the correlation matrix for two or more columns.
Parameters: data_column_names : list
The names of the columns from which to compute the matrix.
matrix_name : unicode (default=None)
The name for the returned matrix Frame.
Returns: : Frame
A Frame with the matrix of the correlation values for the columns.
This method applies only to columns containing numerical data.
Examples
Consider Frame my_frame, which contains the data:
>>> my_frame.inspect()
  idnum:int32   x1:float32   x2:float32   x3:float32   x4:float32
/-------------------------------------------------------------------/
            0          1.0          4.0          0.0         -1.0
            1          2.0          3.0          0.0         -1.0
            2          3.0          2.0          1.0         -1.0
            3          4.0          1.0          2.0         -1.0
            4          5.0          0.0          2.0         -1.0
my_frame.correlation_matrix computes the common correlation coefficient (Pearson's) for each pair of columns in the user-provided list. In this example, idnum and most of the other columns have trivial correlations: -1, 0, or +1. Column x3 provides a contrasting coefficient of 3 / sqrt(10) = 0.948683298051. The resulting table (specifying all columns) is
>>> corr_matrix = my_frame.correlation_matrix(my_frame.column_names)
>>> corr_matrix.inspect()
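The coefficients described above can be cross-checked without a running server. The following is a minimal sketch in plain Python that applies the textbook Pearson formula to the example data; it is not how correlation_matrix is implemented. The helper name pearson and the columns dictionary are illustrative assumptions, not part of the trustedanalytics API. Column x4 is left out because it is constant: with zero variance the textbook formula divides by zero, and how correlation_matrix reports that case is not shown here.

```python
import math


def pearson(a, b):
    """Pearson's r from the definition: covariance over the product of spreads."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Example data copied from my_frame (x4 omitted: constant column, zero variance).
columns = {
    "idnum": [0, 1, 2, 3, 4],
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [4.0, 3.0, 2.0, 1.0, 0.0],
    "x3": [0.0, 0.0, 1.0, 2.0, 2.0],
}
names = list(columns)

# Full pairwise matrix, in the same column order as `names`.
matrix = [[pearson(columns[r], columns[c]) for c in names] for r in names]

# idnum vs x3: deviation products sum to 6, squared deviations sum to 10 and 4,
# so r = 6 / sqrt(10 * 4) = 3 / sqrt(10).
for name, row in zip(names, matrix):
    print(name, [round(v, 6) for v in row])
```

In this sketch the idnum/x1 and idnum/x2 entries come out as +1 and -1, while every pairing that involves x3 has magnitude of about 0.948683, matching the contrasting coefficient mentioned in the text.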