Frame flatten_column


flatten_column(self, column, delimiter=None)

Split the delimited values in a column's cells across multiple rows.

Parameters:

column : unicode

The column to be flattened.

delimiter : unicode (default=None)

The delimiter string. Default is comma (,).

Returns:

: _Unit

Splits each cell in the specified column into multiple rows according to a string delimiter. Each new row is a full copy of the original row, except that the specified column contains only one of the split values. The original row is deleted.
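The behavior described above can be illustrated without a frame at all. The following is a minimal sketch in plain Python, not the library's implementation: the helper name flatten_rows and its column_index parameter are hypothetical, rows are modeled as tuples, and the sketch returns a new list, whereas flatten_column changes the frame itself.

```python
def flatten_rows(rows, column_index, delimiter=","):
    """Return a new list with one row per delimited value (hypothetical helper)."""
    flattened = []
    for row in rows:
        # Split the target cell into its individual values.
        for value in row[column_index].split(delimiter):
            new_row = list(row)            # full copy of the original row
            new_row[column_index] = value  # target column keeps a single value
            flattened.append(tuple(new_row))
    # The original, unsplit rows are not carried over into the result.
    return flattened


rows = [(1, "solo,mono,single"), (2, "duo,double")]
result = flatten_rows(rows, column_index=1)
for row in result:
    print(row)
```

Every other column is copied unchanged into each new row, which is why the value of the first column repeats once per split value.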

Examples

Given a data file:

1-"solo,mono,single"
2-"duo,double"

Bring the data into a frame, where it can be worked on:

>>> my_csv = CsvFile("original_data.csv", schema=[('a', int32), ('b', str)], delimiter='-')
>>> my_frame = Frame(source=my_csv)

Looking at it:

>>> my_frame.inspect()

  a:int32   b:str
/-------------------------------/
    1       solo,mono,single
    2       duo,double

Now, spread out those sub-strings in column b:

>>> my_frame.flatten_column('b')

Check again:

>>> my_frame.inspect()

  a:int32   b:str
/------------------/
    1       solo
    1       mono
    1       single
    2       duo
    2       double