EdgeFrame entropy
entropy(self, data_column, weights_column=None)
Calculate the Shannon entropy of a column.
Parameters: data_column : unicode
The column whose entropy is to be calculated.
weights_column : unicode (default=None)
The column that provides weights (frequencies) for the entropy calculation. Must contain numerical data. By default, every item has a uniform weight of 1.
Returns: dict
Entropy.
The data column is weighted via the weights column. All data elements of weight <= 0 are excluded from the calculation, as are all data elements whose weight is NaN or infinite. If there are no data elements with a finite weight greater than 0, the entropy is zero.
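The weighting and exclusion rules above can be sketched in plain Python. This is an illustrative sketch only, not the library's implementation: the function name `weighted_entropy` and its list-based arguments are hypothetical, and the natural logarithm is assumed because the coin-flip example reports ln(2).

```python
import math
from collections import defaultdict


def weighted_entropy(values, weights=None):
    """Shannon entropy (natural log) of `values`, optionally weighted.

    Hypothetical helper for illustration; not part of the EdgeFrame API.
    """
    if weights is None:
        # Uniform weight of 1 for every item, as the default is described.
        weights = [1.0] * len(values)

    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        w = float(weight)
        # Exclude weights that are NaN, infinite, zero, or negative.
        if math.isnan(w) or math.isinf(w) or w <= 0:
            continue
        totals[value] += w

    grand_total = sum(totals.values())
    if grand_total == 0:
        # No element has a finite weight greater than 0.
        return 0.0

    # H = sum over distinct values of p * ln(1 / p), with p = weight share.
    return sum(
        (w / grand_total) * math.log(grand_total / w)
        for w in totals.values()
    )
```

Under these assumptions, ten alternating coin flips give a value close to ln(2), and a column whose weights are all zero or NaN gives 0.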
Examples
Given a frame of coin flips, half heads and half tails, the entropy is simply ln(2):
>>> print frame.inspect()
  data:unicode
/----------------/
  H
  T
  H
  T
  H
  T
  H
  T
  H
  T
>>> print "Computed entropy:", frame.entropy("data")
Computed entropy: 0.69314718056
With more distinct values and non-uniform weights, the computation is less simple. An online search for “Shannon Entropy” will provide more detail.
>>> print frame.inspect()
  data:int32   weight:int32
-----------------------------
  0            1
  1            2
  2            4
  4            8
>>> print "Computed entropy:", frame.entropy("data", "weight")
Computed entropy: 1.13691659183
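The weighted result can be checked by hand. The derivation below assumes the standard Shannon formula with the natural logarithm (consistent with the ln(2) coin-flip result), where each value's probability is its weight divided by the total weight. The symbols p_i and w_i are introduced here for illustration.

```latex
\begin{aligned}
p_i &= \frac{w_i}{\sum_j w_j}, \qquad \sum_j w_j = 1 + 2 + 4 + 8 = 15 \\
H &= -\sum_i p_i \ln p_i \\
  &= -\left( \tfrac{1}{15}\ln\tfrac{1}{15}
           + \tfrac{2}{15}\ln\tfrac{2}{15}
           + \tfrac{4}{15}\ln\tfrac{4}{15}
           + \tfrac{8}{15}\ln\tfrac{8}{15} \right) \\
  &\approx 0.18054 + 0.26865 + 0.35247 + 0.33526 \\
  &\approx 1.1369
\end{aligned}
```

This agrees with the computed entropy of 1.13691659183 shown in the example.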