sparktk.frame.ops.binary_classification_metrics module
# vim: set encoding=utf-8
# Copyright (c) 2016 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from classification_metrics_value import ClassificationMetricsValue
def binary_classification_metrics(self, label_column, pred_column, pos_label, beta=1.0, frequency_column=None):
"""
Statistics of accuracy, precision, and other measures for a binary classification model.
Parameters
----------
:param label_column: (str) The name of the column containing the correct label for each instance.
:param pred_column: (str) The name of the column containing the predicted label for each instance.
:param pos_label: (Any) The value to be interpreted as a positive instance for binary classification.
:param beta: (Optional[float]) The beta value to use for the :math:`F_{ \beta}` measure (by default the F1 measure is computed);
must be greater than zero. Default is 1.0.
:param frequency_column: (Optional[str]) The name of an optional column containing the frequency of observations.
:return: (ClassificationMetricsValue) The data returned is composed of multiple components:
<object>.accuracy : double
<object>.confusion_matrix : table
<object>.f_measure : double
<object>.precision : double
<object>.recall : double
Calculate the accuracy, precision, confusion_matrix, recall and :math:`F_{ \beta}` measure for a
classification model.
* The **f_measure** result is the :math:`F_{ \beta}` measure for a
classification model.
The :math:`F_{ \beta}` measure of a binary classification model is the
harmonic mean of precision and recall.
If we let:
* beta :math:`\equiv \beta`,
* :math:`T_{P}` denote the number of true positives,
* :math:`F_{P}` denote the number of false positives, and
* :math:`F_{N}` denote the number of false negatives,
then:
.. math::
F_{ \beta} = (1 + \beta ^ 2) * \frac{ \frac{T_{P}}{T_{P} + F_{P}} * \
\frac{T_{P}}{T_{P} + F_{N}}}{ \beta ^ 2 * \frac{T_{P}}{T_{P} + \
F_{P}} + \frac{T_{P}}{T_{P} + F_{N}}}
The :math:`F_{ \beta}` measure for a multi-class classification model is
computed as the weighted average of the :math:`F_{ \beta}` measure for
each label, where the weight is the number of instances of each label.
The determination of binary vs. multi-class is automatically inferred
from the data.
* The **recall** result of a binary classification model is the proportion
of positive instances that are correctly identified.
If we let :math:`T_{P}` denote the number of true positives and
:math:`F_{N}` denote the number of false negatives, then the model
recall is given by :math:`\frac {T_{P}} {T_{P} + F_{N}}`.
* The **precision** of a binary classification model is the proportion of
predicted positive instances that are correctly identified.
If we let :math:`T_{P}` denote the number of true positives and
:math:`F_{P}` denote the number of false positives, then the model
precision is given by: :math:`\frac {T_{P}} {T_{P} + F_{P}}`.
* The **accuracy** of a classification model is the proportion of
predictions that are correctly identified.
If we let :math:`T_{P}` denote the number of true positives,
:math:`T_{N}` denote the number of true negatives, and :math:`K` denote
the total number of classified instances, then the model accuracy is
given by: :math:`\frac{T_{P} + T_{N}}{K}`.
* The **confusion_matrix** result is a confusion matrix for a
binary classifier model, formatted for human readability.
Examples
--------
Consider Frame *my_frame*, which contains the data:
>>> my_frame.inspect()
[#] a b labels predictions
==================================
[0] red 1 0 0
[1] blue 3 1 0
[2] green 1 0 0
[3] green 0 1 1
>>> cm = my_frame.binary_classification_metrics('labels', 'predictions', 1, 1)
[===Job Progress===]
>>> cm.f_measure
0.6666666666666666
>>> cm.recall
0.5
>>> cm.accuracy
0.75
>>> cm.precision
1.0
>>> cm.confusion_matrix
Predicted_Pos Predicted_Neg
Actual_Pos 1 1
Actual_Neg 0 2
"""
return ClassificationMetricsValue(self._tc, self._scala.binaryClassificationMetrics(label_column,
pred_column,
pos_label,
float(beta),
self._tc.jutils.convert.to_scala_option(frequency_column)))
Functions
def binary_classification_metrics(self, label_column, pred_column, pos_label, beta=1.0, frequency_column=None)

Statistics of accuracy, precision, and other measures for a binary classification model.
label_column | (str) | The name of the column containing the correct label for each instance.
pred_column | (str) | The name of the column containing the predicted label for each instance.
pos_label | (Any) | The value to be interpreted as a positive instance for binary classification.
beta | (Optional[float]) | The beta value to use for the :math:`F_{ \beta}` measure (by default the F1 measure is computed); must be greater than zero. Default is 1.0.
frequency_column | (Optional[str]) | The name of an optional column containing the frequency of observations.
Returns | (ClassificationMetricsValue) | The data returned is composed of multiple components: <object>.accuracy (double), <object>.confusion_matrix (table), <object>.f_measure (double), <object>.precision (double), <object>.recall (double)
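Both optional parameters can be passed as keywords. The sketch below is hypothetical: it assumes a frame shaped like the one in the example further down, plus a ``counts`` column (invented here) holding per-row observation frequencies.

>>> cm = my_frame.binary_classification_metrics('labels', 'predictions', pos_label=1,
...                                             beta=2.0, frequency_column='counts')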
Calculate the accuracy, precision, confusion_matrix, recall, and :math:`F_{ \beta}` measure for a classification model.

* The **f_measure** result is the :math:`F_{ \beta}` measure for a classification model.
  The :math:`F_{ \beta}` measure of a binary classification model is the harmonic mean of precision and recall.
  If we let:

  * beta :math:`\equiv \beta`,
  * :math:`T_{P}` denote the number of true positives,
  * :math:`F_{P}` denote the number of false positives, and
  * :math:`F_{N}` denote the number of false negatives,

  then:

  .. math::

      F_{ \beta} = (1 + \beta ^ 2) * \frac{ \frac{T_{P}}{T_{P} + F_{P}} * \frac{T_{P}}{T_{P} + F_{N}}}{ \beta ^ 2 * \frac{T_{P}}{T_{P} + F_{P}} + \frac{T_{P}}{T_{P} + F_{N}}}

  The :math:`F_{ \beta}` measure for a multi-class classification model is computed as the
  weighted average of the :math:`F_{ \beta}` measure for each label, where the weight is the
  number of instances of each label. The determination of binary vs. multi-class is
  automatically inferred from the data.

* The **recall** result of a binary classification model is the proportion of positive
  instances that are correctly identified. If we let :math:`T_{P}` denote the number of true
  positives and :math:`F_{N}` denote the number of false negatives, then the model recall is
  given by :math:`\frac {T_{P}} {T_{P} + F_{N}}`.

* The **precision** of a binary classification model is the proportion of predicted positive
  instances that are correctly identified. If we let :math:`T_{P}` denote the number of true
  positives and :math:`F_{P}` denote the number of false positives, then the model precision
  is given by :math:`\frac {T_{P}} {T_{P} + F_{P}}`.

* The **accuracy** of a classification model is the proportion of predictions that are
  correctly identified. If we let :math:`T_{P}` denote the number of true positives,
  :math:`T_{N}` denote the number of true negatives, and :math:`K` denote the total number of
  classified instances, then the model accuracy is given by :math:`\frac{T_{P} + T_{N}}{K}`.

* The **confusion_matrix** result is a confusion matrix for a binary classifier model,
  formatted for human readability.

These definitions are made concrete in the short sketch below.
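A minimal plain-Python sketch of the four scalar metrics, computed directly from confusion-matrix counts. This illustrates the formulas above; it is not sparktk's implementation (the actual computation is delegated to the Scala side via ``binaryClassificationMetrics``), and the helper name ``metrics_from_counts`` is invented for this example::

    def metrics_from_counts(tp, fp, tn, fn, beta=1.0):
        """Precision, recall, F-beta, and accuracy from raw counts (illustrative only)."""
        precision = tp / float(tp + fp) if tp + fp else 0.0
        recall = tp / float(tp + fn) if tp + fn else 0.0
        # F-beta = (1 + beta^2) * P * R / (beta^2 * P + R), the weighted harmonic mean
        f_beta = ((1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)
                  if precision + recall else 0.0)
        accuracy = (tp + tn) / float(tp + fp + tn + fn)
        return precision, recall, f_beta, accuracy

    # Counts taken from the confusion matrix in the example below: TP=1, FP=0, TN=2, FN=1
    print(metrics_from_counts(1, 0, 2, 1))   # (1.0, 0.5, 0.666..., 0.75)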
Examples
--------
Consider Frame *my_frame*, which contains the data:
>>> my_frame.inspect()
[#] a b labels predictions
==================================
[0] red 1 0 0
[1] blue 3 1 0
[2] green 1 0 0
[3] green 0 1 1
>>> cm = my_frame.binary_classification_metrics('labels', 'predictions', 1, 1)
[===Job Progress===]
>>> cm.f_measure
0.6666666666666666
>>> cm.recall
0.5
>>> cm.accuracy
0.75
>>> cm.precision
1.0
>>> cm.confusion_matrix
Predicted_Pos Predicted_Neg
Actual_Pos 1 1
Actual_Neg 0 2
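As a worked check, reading the counts off this confusion matrix gives :math:`T_{P} = 1`, :math:`F_{N} = 1`, :math:`F_{P} = 0`, and :math:`T_{N} = 2`. Substituting into the definitions above reproduces the scalar results: precision :math:`= \frac{1}{1 + 0} = 1.0`, recall :math:`= \frac{1}{1 + 1} = 0.5`, accuracy :math:`= \frac{1 + 2}{4} = 0.75`, and :math:`F_{1} = 2 \cdot \frac{1.0 \cdot 0.5}{1.0 + 0.5} \approx 0.667`.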