
VertexFrame entropy


entropy(self, data_column, weights_column=None)

Calculate the Shannon entropy of a column.

Parameters:

data_column : unicode

The column whose entropy is to be calculated.

weights_column : unicode (default=None)

The column that provides weights (frequencies) for the entropy calculation. Must contain numerical data. By default, every item is given a uniform weight of 1.

Returns:

: dict

Entropy.

The data column is weighted via the weights column. All data elements of weight <= 0 are excluded from the calculation, as are all data elements whose weight is NaN or infinite. If there are no data elements with a finite weight greater than 0, the entropy is zero.
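The weighting and exclusion rules above can be sketched in plain Python. This is only an illustration, not the library's implementation: `weighted_entropy` is a hypothetical helper, and it assumes that weights are summed per distinct data value and that the natural logarithm is used (consistent with the ln(2) result in the Examples).

```python
import math
from collections import defaultdict


def weighted_entropy(values, weights=None):
    """Shannon entropy (natural log) of `values`, optionally weighted.

    Hypothetical helper for illustration; not part of the frame API.
    """
    if weights is None:
        weights = [1.0] * len(values)  # uniform weight of 1 per item

    # Accumulate the total weight of each distinct data value.
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        # Exclude weights that are NaN, infinite, or not strictly positive.
        if math.isnan(weight) or math.isinf(weight) or weight <= 0:
            continue
        totals[value] += weight

    grand_total = sum(totals.values())
    if grand_total == 0:
        return 0.0  # no usable data elements: entropy is zero

    # H = -sum(p * ln(p)), where p is each value's share of the total weight.
    return sum(-(w / grand_total) * math.log(w / grand_total)
               for w in totals.values())
```

Under these assumptions, the sketch gives approximately 0.693147 for ten alternating coin flips and approximately 1.136917 for the values 0, 1, 2, 4 with weights 1, 2, 4, 8.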

Examples

Given a frame of coin flips, half heads and half tails, the entropy is simply ln(2):

>>> print frame.inspect()

          data:unicode
        /----------------/
          H
          T
          H
          T
          H
          T
          H
          T
          H
          T

>>> print "Computed entropy:", frame.entropy("data")

        Computed entropy: 0.69314718056

With more distinct values and non-uniform weights, the computation is less simple. An online search for “Shannon entropy” will provide more detail.

>>> print frame.inspect()
           data:int32   weight:int32
         -----------------------------
                    0              1
                    1              2
                    2              4
                    4              8

>>> print "Computed entropy:", frame.entropy("data", "weight")

         Computed entropy: 1.13691659183
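
The weighted result can be checked by hand. This derivation assumes that each value's probability is its weight divided by the total weight (1 + 2 + 4 + 8 = 15) and that the natural logarithm is used, as in the coin-flip example:

```latex
\begin{aligned}
p &= \left(\tfrac{1}{15},\ \tfrac{2}{15},\ \tfrac{4}{15},\ \tfrac{8}{15}\right) \\
H &= -\sum_i p_i \ln p_i
   = \ln 15 - \frac{2\ln 2 + 4\ln 4 + 8\ln 8}{15}
   = \ln 15 - \frac{34}{15}\ln 2
   \approx 1.13692
\end{aligned}
```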