EdgeFrame flatten_column
flatten_column(self, column, delimiter=None)
Spread data to multiple rows based on cell data.
Parameters:
column : unicode
The column to be flattened.
delimiter : unicode (default=None)
The delimiter string. If None, a comma (,) is used.
Returns: _Unit
Splits each cell in the specified column into multiple rows according to a string delimiter. Each new row is a full copy of the original row, except that the specified column contains only one of the split values. The original row is removed from the frame.
Examples
Given a data file:
1-"solo,mono,single"
2-"duo,double"
The commands to bring the data into a frame, where it can be worked on:
>>> my_csv = CsvFile("original_data.csv", schema=[('a', int32), ('b', str)], delimiter='-')
>>> my_frame = Frame(source=my_csv)
Looking at it:
>>> my_frame.inspect()
  a:int32   b:str
/-------------------------------/
    1       solo, mono, single
    2       duo, double
Now, spread out those sub-strings in column b:
>>> my_frame.flatten_column('b')
Check again:
>>> my_frame.inspect()
  a:int32   b:str
/------------------/
    1       solo
    1       mono
    1       single
    2       duo
    2       double
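The row expansion shown above can be sketched in plain Python to make the semantics concrete. This is only an illustration, not the library's implementation: the helper name flatten_rows is hypothetical, rows are modeled as tuples, and the helper returns a new list, whereas flatten_column modifies the frame itself.

```python
def flatten_rows(rows, column_index, delimiter=","):
    """Return one row per delimited value found in the given column.

    Hypothetical helper for illustration only; it mirrors the documented
    behavior of flatten_column on a plain list of tuples.
    """
    flattened = []
    for row in rows:
        # Split the cell; every piece becomes its own row.
        for value in row[column_index].split(delimiter):
            new_row = list(row)               # full copy of the original row
            new_row[column_index] = value     # keep only one value in the column
            flattened.append(tuple(new_row))
    # The original, unsplit rows are not carried over.
    return flattened


rows = [(1, "solo,mono,single"), (2, "duo,double")]
result = flatten_rows(rows, 1)
for row in result:
    print(row)
```

Passing a different delimiter, for example flatten_rows(rows, 1, delimiter="|"), splits on that string instead, just as the delimiter parameter does for flatten_column.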