sparktk.frame.ops.binary_classification_metrics module

# vim: set encoding=utf-8

#  Copyright (c) 2016 Intel Corporation 
#
#  Licensed under the Apache License, Version 2.0 (the "License");
#  you may not use this file except in compliance with the License.
#  You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.
#

from classification_metrics_value import ClassificationMetricsValue

def binary_classification_metrics(self, label_column, pred_column, pos_label, beta=1.0, frequency_column=None):
    """
    Statistics of accuracy, precision, and others for a binary classification model.

    Parameters
    ----------

    :param label_column: (str) The name of the column containing the correct label for each instance.
    :param pred_column: (str) The name of the column containing the predicted label for each instance.
    :param pos_label: (Any) The value to be interpreted as a positive instance for binary classification.
    :param beta: (Optional[float]) The beta value to use for the :math:`F_{\beta}` measure (by default the F1 measure is computed);
    must be greater than zero. Default is 1.0.
    :param frequency_column: (Optional[str]) The name of an optional column containing the frequency of observations.
    :return: (ClassificationMetricsValue) The data returned is composed of multiple components:
<object>.accuracy : double
<object>.confusion_matrix : table
<object>.f_measure : double
<object>.precision : double
<object>.recall : double
    Calculate the accuracy, precision, confusion_matrix, recall and :math:`F_{\beta}` measure for a
    classification model.

    *   The **f_measure** result is the :math:`F_{\beta}` measure for a classification model.
        The :math:`F_{\beta}` measure of a binary classification model is the harmonic mean of
        precision and recall.
        If we let:

        * beta :math:`\equiv \beta`,
        * :math:`T_{P}` denote the number of true positives,
        * :math:`F_{P}` denote the number of false positives, and
        * :math:`F_{N}` denote the number of false negatives,

        then:

        .. math::

            F_{\beta} = (1 + \beta ^ 2) * \frac{ \frac{T_{P}}{T_{P} + F_{P}} * \
            \frac{T_{P}}{T_{P} + F_{N}}}{ \beta ^ 2 * \frac{T_{P}}{T_{P} + \
            F_{P}} + \frac{T_{P}}{T_{P} + F_{N}}}

        The :math:`F_{\beta}` measure for a multi-class classification model is computed as the
        weighted average of the :math:`F_{\beta}` measure for each label, where the weight is
        the number of instances of each label.
        The determination of binary vs. multi-class is automatically inferred from the data.

    *   The **recall** result of a binary classification model is the proportion of positive
        instances that are correctly identified.
        If we let :math:`T_{P}` denote the number of true positives and :math:`F_{N}` denote
        the number of false negatives, then the model recall is given by
        :math:`\frac{T_{P}}{T_{P} + F_{N}}`.

    *   The **precision** of a binary classification model is the proportion of predicted
        positive instances that are correctly identified.
        If we let :math:`T_{P}` denote the number of true positives and :math:`F_{P}` denote
        the number of false positives, then the model precision is given by
        :math:`\frac{T_{P}}{T_{P} + F_{P}}`.

    *   The **accuracy** of a classification model is the proportion of predictions that are
        correctly identified.
        If we let :math:`T_{P}` denote the number of true positives, :math:`T_{N}` denote the
        number of true negatives, and :math:`K` denote the total number of classified
        instances, then the model accuracy is given by :math:`\frac{T_{P} + T_{N}}{K}`.

    *   The **confusion_matrix** result is a confusion matrix for a binary classifier model,
        formatted for human readability.

    Examples
    --------
    Consider Frame *my_frame*, which contains the data

        >>> my_frame.inspect()
        [#]  a      b  labels  predictions
        ==================================
        [0]  red    1       0            0
        [1]  blue   3       1            0
        [2]  green  1       0            0
        [3]  green  0       1            1

        >>> cm = my_frame.binary_classification_metrics('labels', 'predictions', 1, 1)
        [===Job Progress===]

        >>> cm.f_measure
        0.6666666666666666

        >>> cm.recall
        0.5

        >>> cm.accuracy
        0.75

        >>> cm.precision
        1.0

        >>> cm.confusion_matrix
                    Predicted_Pos  Predicted_Neg
        Actual_Pos              1              1
        Actual_Neg              0              2

    """
    return ClassificationMetricsValue(self._tc,
                                      self._scala.binaryClassificationMetrics(label_column,
                                                                              pred_column,
                                                                              pos_label,
                                                                              float(beta),
                                                                              self._tc.jutils.convert.to_scala_option(frequency_column)))

Functions

def binary_classification_metrics(self, label_column, pred_column, pos_label, beta=1.0, frequency_column=None)

Statistics of accuracy, precision, and others for a binary classification model.

Parameters:

label_column (str): The name of the column containing the correct label for each instance.
pred_column (str): The name of the column containing the predicted label for each instance.
pos_label (Any): The value to be interpreted as a positive instance for binary classification.
beta (Optional[float]): The beta value to use for the :math:`F_{\beta}` measure (by default the F1 measure is computed); must be greater than zero. Default is 1.0.
frequency_column (Optional[str]): The name of an optional column containing the frequency of observations; see the sketch after this list.
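
A hypothetical usage sketch for the frequency column. The column name 'count' is assumed, not from the original docs, and the assumption (consistent with the parameter description) is that each row contributes its frequency to the underlying confusion-matrix counts, as if the row were repeated that many times:

    cm = my_frame.binary_classification_metrics('labels', 'predictions',
                                                pos_label=1, beta=1.0,
                                                frequency_column='count')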

Returns (ClassificationMetricsValue): The data returned is composed of multiple components:
<object>.accuracy : double
<object>.confusion_matrix : table
<object>.f_measure : double
<object>.precision : double
<object>.recall : double

Calculate the accuracy, precision, confusion_matrix, recall and :math:`F_{\beta}` measure for a classification model.

  • The f_measure result is the :math:`F_{\beta}` measure for a classification model. The :math:`F_{\beta}` measure of a binary classification model is the harmonic mean of precision and recall. If we let:

    • beta :math:`\equiv \beta`,
    • :math:`T_{P}` denote the number of true positives,
    • :math:`F_{P}` denote the number of false positives, and
    • :math:`F_{N}` denote the number of false negatives,

    then:

    .. math::

        F_{\beta} = (1 + \beta ^ 2) * \frac{ \frac{T_{P}}{T_{P} + F_{P}} * \frac{T_{P}}{T_{P} + F_{N}}}{ \beta ^ 2 * \frac{T_{P}}{T_{P} + F_{P}} + \frac{T_{P}}{T_{P} + F_{N}}}

    The :math:`F_{\beta}` measure for a multi-class classification model is computed as the weighted average of the :math:`F_{\beta}` measure for each label, where the weight is the number of instances of each label. The determination of binary vs. multi-class is automatically inferred from the data.

  • The recall result of a binary classification model is the proportion of positive instances that are correctly identified. If we let :math:`T_{P}` denote the number of true positives and :math:`F_{N}` denote the number of false negatives, then the model recall is given by :math:`\frac{T_{P}}{T_{P} + F_{N}}`.

  • The precision of a binary classification model is the proportion of predicted positive instances that are correctly identified. If we let :math:`T_{P}` denote the number of true positives and :math:`F_{P}` denote the number of false positives, then the model precision is given by :math:`\frac{T_{P}}{T_{P} + F_{P}}`.

  • The accuracy of a classification model is the proportion of predictions that are correctly identified. If we let :math:`T_{P}` denote the number of true positives, :math:`T_{N}` denote the number of true negatives, and :math:`K` denote the total number of classified instances, then the model accuracy is given by :math:`\frac{T_{P} + T_{N}}{K}`. These formulas are illustrated in the sketch after this list.

  • The confusion_matrix result is a confusion matrix for a binary classifier model, formatted for human readability.
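
To make these formulas concrete, here is a minimal plain-Python sketch (not sparktk's implementation) applying the same arithmetic to raw confusion-matrix counts; the counts used match the example below:

    def binary_metrics(tp, tn, fp, fn, beta=1.0):
        """Accuracy, precision, recall, and F-beta from confusion-matrix counts."""
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        accuracy = (tp + tn) / (tp + tn + fp + fn)
        # F-beta is the beta-weighted harmonic mean of precision and recall.
        b2 = beta ** 2
        denom = b2 * precision + recall
        f_measure = (1 + b2) * precision * recall / denom if denom else 0.0
        return accuracy, precision, recall, f_measure

    # Counts from the example below: TP=1, TN=2, FP=0, FN=1.
    print(binary_metrics(tp=1, tn=2, fp=0, fn=1))
    # (0.75, 1.0, 0.5, 0.6666666666666666)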

Examples:

Consider Frame my_frame, which contains the data

>>> my_frame.inspect()
[#]  a      b  labels  predictions
==================================
[0]  red    1       0            0
[1]  blue   3       1            0
[2]  green  1       0            0
[3]  green  0       1            1

>>> cm = my_frame.binary_classification_metrics('labels', 'predictions', 1, 1)
[===Job Progress===]

>>> cm.f_measure
0.6666666666666666

>>> cm.recall
0.5

>>> cm.accuracy
0.75

>>> cm.precision
1.0

>>> cm.confusion_matrix
            Predicted_Pos  Predicted_Neg
Actual_Pos              1              1
Actual_Neg              0              2
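
The beta parameter shifts the balance between precision and recall: values above 1 weight recall more heavily, values below 1 weight precision more heavily. A quick plain-Python check using the precision (1.0) and recall (0.5) from the example above (not a sparktk call):

    precision, recall = 1.0, 0.5
    for beta in (0.5, 1.0, 2.0):
        b2 = beta ** 2
        f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
        print(beta, round(f_beta, 4))
    # 0.5 0.8333
    # 1.0 0.6667
    # 2.0 0.5556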
    """
    Statistics of accuracy, precision, and others for a binary classification model.

    Parameters
    ----------

    :param label_column: (str) The name of the column containing the correct label for each instance.
    :param pred_column: (str) The name of the column containing the predicted label for each instance.
    :param pos_label: (Any) The value to be interpreted as a positive instance for binary classification.
    :param beta: (Optional[float]) This is the beta value to use for :math:`F_{ \beta}` measure (default F1 measure is computed);
    must be greater than zero. Defaults is 1.
    :param frequency_column: (Optional[str]) The name of an optional column containing the frequency of observations.
    :return: (ClassificationMetricsValue) The data returned is composed of multiple components:
<object>.accuracy : double
<object>.confusion_matrix : table
<object>.f_measure : double
<object>.precision : double
<object>.recall : double
Calculate the accuracy, precision, confusion_matrix, recall and :math:`F_{ \beta}` measure for a classification model. * The **f_measure** result is the :math:`F_{ \beta}` measure for a classification model. The :math:`F_{ \beta}` measure of a binary classification model is the harmonic mean of precision and recall. If we let: * beta :math:`\equiv \beta`, * :math:`T_{P}` denotes the number of true positives, * :math:`F_{P}` denotes the number of false positives, and * :math:`F_{N}` denotes the number of false negatives then: .. math:: F_{ \beta} = (1 + \beta ^ 2) * \frac{ \frac{T_{P}}{T_{P} + F_{P}} * \ \frac{T_{P}}{T_{P} + F_{N}}}{ \beta ^ 2 * \frac{T_{P}}{T_{P} + \ F_{P}} + \frac{T_{P}}{T_{P} + F_{N}}} The :math:`F_{ \beta}` measure for a multi-class classification model is computed as the weighted average of the :math:`F_{ \beta}` measure for each label, where the weight is the number of instances of each label. The determination of binary vs. multi-class is automatically inferred from the data. * The **recall** result of a binary classification model is the proportion of positive instances that are correctly identified. If we let :math:`T_{P}` denote the number of true positives and :math:`F_{N}` denote the number of false negatives, then the model recall is given by :math:`\frac {T_{P}} {T_{P} + F_{N}}`. * The **precision** of a binary classification model is the proportion of predicted positive instances that are correctly identified. If we let :math:`T_{P}` denote the number of true positives and :math:`F_{P}` denote the number of false positives, then the model precision is given by: :math:`\frac {T_{P}} {T_{P} + F_{P}}`. * The **accuracy** of a classification model is the proportion of predictions that are correctly identified. If we let :math:`T_{P}` denote the number of true positives, :math:`T_{N}` denote the number of true negatives, and :math:`K` denote the total number of classified instances, then the model accuracy is given by: :math:`\frac{T_{P} + T_{N}}{K}`. * The **confusion_matrix** result is a confusion matrix for a binary classifier model, formatted for human readability. Examples -------- Consider Frame *my_frame*, which contains the data >>> my_frame.inspect() [#] a b labels predictions ================================== [0] red 1 0 0 [1] blue 3 1 0 [2] green 1 0 0 [3] green 0 1 1 >>> cm = my_frame.binary_classification_metrics('labels', 'predictions', 1, 1) [===Job Progress===] >>> cm.f_measure 0.6666666666666666 >>> cm.recall 0.5 >>> cm.accuracy 0.75 >>> cm.precision 1.0 >>> cm.confusion_matrix Predicted_Pos Predicted_Neg Actual_Pos 1 1 Actual_Neg 0 2 """ return ClassificationMetricsValue(self._tc, self._scala.binaryClassificationMetrics(label_column, pred_column, pos_label, float(beta), self._tc.jutils.convert.to_scala_option(frequency_column)))