Frame group_by

group_by(self, group_by_columns, aggregation_arguments=None)

[BETA] Create a summarized frame.

Parameters:


group_by_columns : list

Column name or list of column names

aggregation_arguments : dict (default=None)

Aggregation function based on entire row, and/or dictionaries (one or more) of { column name str : aggregation function(s) }.

Returns:

: Frame

A new frame with the results of the group_by

Creates a new frame and returns a Frame object to access it. Takes a column or group of columns, finds each unique combination of their values, and creates one row per combination. The other columns are combined according to the aggregation argument(s).
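The grouping step described above can be sketched in plain Python. This is only an illustration of the semantics, not the library's implementation; the helper name group_rows and the list-of-dicts row representation are assumptions made for this sketch. The data matches the three-column example further down the page.

```python
from collections import OrderedDict


def group_rows(rows, key_columns):
    """Hypothetical helper: bucket rows by the unique combination of key values."""
    groups = OrderedDict()
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        groups.setdefault(key, []).append(row)
    return groups


rows = [
    {"a": 1, "b": "alpha", "c": 3.0},
    {"a": 1, "b": "bravo", "c": 5.0},
    {"a": 1, "b": "alpha", "c": 5.0},
    {"a": 2, "b": "bravo", "c": 8.0},
    {"a": 2, "b": "bravo", "c": 12.0},
]

groups = group_rows(rows, ["a", "b"])

# One output row per unique (a, b) pair; column c is averaged within each group.
averages = {
    key: sum(member["c"] for member in members) / len(members)
    for key, members in groups.items()
}
print(averages)
```

Each key in averages corresponds to one row of the summarized frame.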

Notes

Column order is not guaranteed when columns are added.
The column names created by aggregation functions in the new frame are the original column name appended with the ‘_’ character and the aggregation function. For example, if the original field is a and the function is avg, the resultant column is named a_avg.
An aggregation argument of count results in a column named count.
The aggregation function agg.count is the only full-row aggregation function supported at this time.
Aggregation currently supports using the following functions:
- avg
- count
- count_distinct
- max
- min
- stdev
- sum
- var (see glossary Bias vs Variance)
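The naming rule in the notes above can be written as a small function. This is a sketch of the rule as stated on this page; result_column_name and FULL_ROW_COUNT_NAME are hypothetical names introduced for illustration and are not part of the library.

```python
# Name given to the column produced by the full-row count aggregation.
FULL_ROW_COUNT_NAME = "count"


def result_column_name(column, function_name):
    """Hypothetical helper: original column name, an underscore, then the function name."""
    return column + "_" + function_name


print(result_column_name("a", "avg"))

# Several functions applied to the same column yield one output column each.
names = [result_column_name("d", name) for name in ("avg", "sum", "min")]
print(names)
```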

Examples

For setup, we will use a Frame my_frame accessing a frame with a column a:

>>> my_frame.inspect()

  a:str
/-------/
  cat
  apple
  bat
  cat
  bat
  cat

Create a new frame, combining identical values of column a, and count how many times each value occurs in the original frame:

>>> new_frame = my_frame.group_by('a', agg.count)
>>> new_frame.inspect()

  a:str       count:int
/-----------------------/
  cat             3
  apple           1
  bat             2

In this example, ‘my_frame’ is accessing a frame with three columns, a, b, and c:

>>> my_frame.inspect()

  a:int   b:str   c:float
/-------------------------/
  1       alpha     3.0
  1       bravo     5.0
  1       alpha     5.0
  2       bravo     8.0
  2       bravo    12.0

Create a new frame from this data, grouping the rows by unique combinations of columns a and b. Average the value in c for each group:

>>> new_frame = my_frame.group_by(['a', 'b'], {'c' : agg.avg})
>>> new_frame.inspect()

  a:int   b:str   c_avg:float
/-----------------------------/
  1       alpha     4.0
  1       bravo     5.0
  2       bravo    10.0

For this example, we use my_frame with columns a, c, d, and e:

>>> my_frame.inspect()

  a:str   c:int   d:float e:int
/-------------------------------/
  ape     1       4.0     9
  ape     1       8.0     8
  big     1       5.0     7
  big     1       6.0     6
  big     1       8.0     5

Create a new frame from this data, grouping the rows by unique combinations of columns a and c. Count the rows in each group; for column d, calculate the average, sum, and minimum value; for column e, keep the maximum value:

>>> new_frame = my_frame.group_by(['a', 'c'], agg.count, {'d': [agg.avg, agg.sum, agg.min], 'e': agg.max})
>>> new_frame.inspect()

  a:str   c:int   count:int  d_avg:float  d_sum:float   d_min:float   e_max:int
/-------------------------------------------------------------------------------/
  ape     1       2          6.0          12.0          4.0           9
  big     1       3          6.333333     19.0          5.0           7
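The figures in the table above can be checked with a plain-Python emulation. This is a sketch under stated assumptions, not the library's code: emulate_group_by is a hypothetical helper, rows are plain dicts, and each aggregation is passed as a (name, function) pair instead of an agg constant. It shows how a column can map either to a single aggregation or to a list of them.

```python
from collections import OrderedDict


def avg(values):
    return sum(values) / len(values)


def emulate_group_by(rows, key_columns, count_rows, column_aggregations):
    """Hypothetical helper mimicking the documented behavior on a list of dicts."""
    groups = OrderedDict()
    for row in rows:
        groups.setdefault(tuple(row[c] for c in key_columns), []).append(row)

    result = []
    for key, members in groups.items():
        out = OrderedDict(zip(key_columns, key))
        if count_rows:
            out["count"] = len(members)  # full-row count
        for column, functions in column_aggregations.items():
            if not isinstance(functions, list):
                functions = [functions]  # accept a single (name, function) pair
            values = [member[column] for member in members]
            for name, function in functions:
                out[column + "_" + name] = function(values)
        result.append(out)
    return result


rows = [
    {"a": "ape", "c": 1, "d": 4.0, "e": 9},
    {"a": "ape", "c": 1, "d": 8.0, "e": 8},
    {"a": "big", "c": 1, "d": 5.0, "e": 7},
    {"a": "big", "c": 1, "d": 6.0, "e": 6},
    {"a": "big", "c": 1, "d": 8.0, "e": 5},
]

result = emulate_group_by(
    rows,
    ["a", "c"],
    True,
    {"d": [("avg", avg), ("sum", sum), ("min", min)], "e": ("max", max)},
)
for out in result:
    print(dict(out))
```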

For further examples, see Group by (and aggregate).
