class Frame

Large table of data.

Class with information about a large row-and-column data store in a frame. It holds the information needed to modify the data and the table structure.

Attributes

column_names Column identifications in the current frame.
name Set or get the name of the frame object.
row_count Number of rows in the current frame.
schema Current frame column names and types.
status Current frame life cycle status.

Methods

__init__(self[, source, name, _info]) Create a Frame/frame.
add_columns(self, func, schema[, columns_accessed]) Add columns to current frame.
append(self, data) Adds more data to the current frame.
assign_sample(self, sample_percentages[, sample_labels, ...]) Randomly group rows into user-defined classes.
bin_column(self, column_name, cutoffs[, include_lowest, strict_binning, ...]) Classify data into user-defined groups.
bin_column_equal_depth(self, column_name[, num_bins, ...]) Classify column into groups with the same frequency.
bin_column_equal_width(self, column_name[, num_bins, ...]) Classify column into same-width groups.
categorical_summary(self, *column_inputs) [ALPHA] Compute a summary of the data in one or more columns for categorical or numerical data types.
classification_metrics(self, label_column, pred_column[, ...]) Model statistics of accuracy, precision, and others.
column_median(self, data_column[, weights_column]) Calculate the (weighted) median of a column.
column_mode(self, data_column[, weights_column, max_modes_returned]) Calculate the (weighted) modes of a column.
column_summary_statistics(self, data_column[, ...]) Calculate multiple statistics for a column.
compute_misplaced_score(self, gravity)  
copy(self[, columns, where, name]) Create new frame from current frame.
correlation(self, data_column_names) Calculate correlation for two columns of current frame.
correlation_matrix(self, data_column_names[, matrix_name]) Calculate correlation matrix for two or more columns.
count(self, where) Counts the number of rows which meet given criteria.
covariance(self, data_column_names) Calculate covariance for exactly two columns.
covariance_matrix(self, data_column_names[, matrix_name]) Calculate covariance matrix for two or more columns.
cumulative_percent(self, sample_col) [BETA] Add column to frame with cumulative percent sum.
cumulative_sum(self, sample_col) [BETA] Add column to frame with cumulative sum.
dot_product(self, left_column_names, right_column_names, ...[, ...]) [ALPHA] Calculate dot product for each row in current frame.
download(self[, n, offset, columns]) Download a frame from the server into client workspace.
drop_columns(self, columns) Remove columns from the frame.
drop_duplicates(self[, unique_columns]) Modify the current frame, removing duplicate rows.
drop_rows(self, predicate) Erase any row in the current frame which qualifies.
ecdf(self, column[, result_frame_name]) Builds new frame with columns for data and distribution.
entropy(self, data_column[, weights_column]) Calculate the Shannon entropy of a column.
export_to_csv(self, folder_name[, separator, count, offset]) Write current frame to HDFS in csv format.
export_to_hbase(self, table_name[, key_column_name, family_name]) Write current frame to HBase table.
export_to_hive(self, table_name) Write current frame to Hive table.
export_to_jdbc(self, table_name[, connector_type, url, driver_name, ...]) Write current frame to JDBC table.
export_to_json(self, folder_name[, count, offset]) Write current frame to HDFS in JSON format.
filter(self, predicate) Select all rows which satisfy a predicate.
flatten_column(self, column[, delimiter]) Spread data to multiple rows based on cell data.
get_error_frame(self) Get a frame with error recordings.
group_by(self, group_by_columns, *aggregation_arguments) [BETA] Create summarized frame.
histogram(self, column_name[, num_bins, weight_column_name, bin_type]) [BETA] Compute the histogram for a column in a frame.
inspect(self[, n, offset, columns, wrap, truncate, round, width, margin]) Prints the frame data in readable format.
join(self, right, left_on[, right_on, how, name]) [BETA] Join operation on one or two frames, creating a new frame.
label_propagation(self, src_col_name, dest_col_name, ...[, ...]) Label Propagation on Gaussian Random Fields.
loadhbase(self, table_name, schema[, start_tag, end_tag]) Append data from an HBase table into an existing (possibly empty) frame.
loadhive(self, query) Append data from a Hive table into an existing (possibly empty) frame.
loadjdbc(self, table_name[, connector_type, url, driver_name, query]) Append data from a JDBC table into an existing (possibly empty) frame.
loopy_belief_propagation(self, src_col_name, ...[, ...]) Message passing to infer state probabilities.
quantiles(self, column_name, quantiles) New frame with Quantiles and their values.
rename_columns(self, names) Rename columns.
sort(self, columns[, ascending]) [BETA] Sort the data in a frame.
sorted_k(self, k, column_names_and_ascending[, reduce_tree_depth]) [ALPHA] Get a sorted subset of the data.
take(self, n[, offset, columns]) Get data subset.
tally(self, sample_col, count_val) [BETA] Count number of times a value is seen.
tally_percent(self, sample_col, count_val) [BETA] Compute a cumulative percent count.
top_k(self, column_name, k[, weights_column]) Most or least frequent column values.
unflatten_column(self, composite_key_column_names[, delimiter]) Compacts data from multiple rows based on cell data.

__init__(self, source=None, name=None)

Create a Frame/frame.

Parameters:

source : CsvFile | Frame (default=None)

A source of initial data.

name : str (default=None)

The name of the newly created frame. Default is None.

Notes

A frame with no name is subject to garbage collection.

If a string in the CSV file starts and ends with a double-quote (") character, both characters are stripped from the data before it is put into the field. Anything between the double-quote characters, including delimiters, is considered part of the str. If the first character after the delimiter is anything other than a double-quote character, the string is composed of all the characters between the delimiters, including any double-quotes. If the first field type is str, leading spaces on each row are considered part of the str. If the last field type is str, trailing spaces on each row are considered part of the str.
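
The quoting rules above can be tried out locally with Python's standard csv module, whose default settings behave similarly for these three cases. This is only an illustrative sketch: the helper split_csv_line and the sample rows are hypothetical, and nothing here implies that the frame loader uses the csv module internally.

```python
import csv


def split_csv_line(line, delimiter=","):
    """Split one CSV line using Python's default quoting rules.

    Hypothetical helper for illustration only; it is not part of the
    Frame API and may differ from the server-side parser in edge cases.
    """
    # csv.reader expects an iterable of lines, so wrap the single line in a list.
    return next(csv.reader([line], delimiter=delimiter))


# A field that starts and ends with a double quote: the quotes are stripped
# and the embedded delimiter stays part of the string.
quoted = split_csv_line('"Portland, OR",42')
print(quoted)    # ['Portland, OR', '42']

# A field that does not start with a double quote: the inner quotes are data.
unquoted = split_csv_line('say "hi",42')
print(unquoted)  # ['say "hi"', '42']

# Leading spaces on the first field and trailing spaces on the last are kept.
padded = split_csv_line('  first,middle,last  ')
print(padded)    # ['  first', 'middle', 'last  ']
```

Running a few representative rows through a check like this before loading a file can help spot fields whose quotes or padding would end up in the data unintentionally.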

Examples

Create a new frame from the data described by the CsvFile object my_csv_schema and name it “myframe”. The Frame object my_frame is used to access the data:

>>> my_frame = ta.Frame(my_csv_schema, "myframe")

A Frame object has been created and my_frame is its proxy. It brought in the data described by my_csv_schema. It is named myframe.

Create an empty frame; name it “yourframe”:

>>> your_frame = ta.Frame(name='yourframe')

A frame has been created and Frame your_frame is its proxy. It has no data yet, but it does have the name yourframe.