EdgeFrame top_k
top_k(self, column_name, k, weights_column=None)
Most or least frequent column values.
Parameters: column_name : unicode
The column whose top (or bottom) K distinct values are to be calculated.
k : int32
Number of entries to return. If k is negative, the bottom |k| entries are returned instead.
weights_column : unicode (default=None)
The column that provides weights (frequencies) for the top-K calculation. Must contain numerical data. If omitted, every item has a weight of 1.
Returns: : Frame
An object with access to the frame of data.
Calculates the top (or bottom) K distinct values of a column, ranked by count. The counts can optionally be weighted by a second column. Data elements with a weight <= 0 are excluded from the calculation, as are data elements whose weight is NaN or infinite. If no data element has a finite weight > 0, the result is empty.
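The counting and exclusion rules described above can be illustrated with a small plain-Python sketch. It is not the library's implementation: the function name `top_k_sketch` and the list-of-dicts input are assumptions made for illustration only, and the order of tied values is not specified by this page.

```python
import math
from collections import defaultdict


def top_k_sketch(rows, column_name, k, weights_column=None):
    """Hypothetical illustration of the top_k semantics on a list of dicts."""
    totals = defaultdict(float)
    for row in rows:
        # Every item weighs 1 unless a weights column is given.
        weight = 1.0 if weights_column is None else float(row[weights_column])
        # Skip NaN, infinite, zero, and negative weights.
        if not math.isfinite(weight) or weight <= 0:
            continue
        totals[row[column_name]] += weight
    # Non-negative k: largest totals first. Negative k: smallest totals first.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=(k >= 0))
    return ranked[:abs(k)]


rows = [
    {"genre": "Drama", "rating": 2.0},
    {"genre": "Drama", "rating": 3.0},
    {"genre": "Comedy", "rating": 4.0},
    {"genre": "Short", "rating": float("nan")},
    {"genre": "War", "rating": -1.0},
]
print(top_k_sketch(rows, "genre", 1))
print(top_k_sketch(rows, "genre", 2, weights_column="rating"))
print(top_k_sketch(rows, "genre", -1, weights_column="rating"))
```

In the weighted calls, the 'Short' and 'War' rows contribute nothing because their weights are NaN and negative, so only 'Drama' and 'Comedy' can appear in the result.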
Examples
For this example, we calculate the top 5 movie genres in a data frame:
>>> top5 = frame.top_k('genre', 5)
>>> top5.inspect()

  genre:str      count:float64
/---------------------------/
  Drama          738278
  Comedy         671398
  Short          455728
  Documentary    323150
  Talk-Show      265180
This example calculates the top 3 movies weighted by rating:
>>> top3 = frame.top_k('movie', 3, weights_column='rating')
>>> top3.inspect()

  movie:str               count:float64
/------------------------------/
  The Godfather           7689.0
  Shawshank Redemption    6358.0
  The Dark Knight         5426.0
This example calculates the bottom 3 movie genres in a data frame:
>>> bottom3 = frame.top_k('genre', -3)
>>> bottom3.inspect()

  genre:str    count:float64
/---------------------------/
  Musical      26
  War          47
  Film-Noir    595