

class Frame

Large table of data.

Acts as a proxy object to a frame of data on the server, with properties and functions to operate on that frame.

Attributes

column_names Column identifications in the current frame.
last_read_date Last time this frame’s data was accessed.
name Set or get the name of the frame object.
row_count Number of rows in the current frame.
schema Current frame column names and types.
status Current frame life cycle status.

Methods

__init__(self[, source, name, _info]) Create a Frame/frame.
add_columns(self, func, schema[, columns_accessed]) Add columns to current frame.
append(self, data) Adds more data to the current frame.
assign_sample(self, sample_percentages[, sample_labels, ...]) Randomly group rows into user-defined classes.
bin_column(self, column_name, cutoffs[, include_lowest, strict_binning, ...]) Classify data into user-defined groups.
bin_column_equal_depth(self, column_name[, num_bins, ...]) Classify column into groups with the same frequency.
bin_column_equal_width(self, column_name[, num_bins, ...]) Classify column into same-width groups.
box_cox(self, column_name[, lambda_value, box_cox_column_name]) Calculate the box-cox transformation for each row in current frame.
categorical_summary(self, *column_inputs) Compute a summary of the data in one or more columns for categorical or numerical data types.
classification_metrics(self, label_column, pred_column[, ...]) Model statistics of accuracy, precision, and others.
column_median(self, data_column[, weights_column]) Calculate the (weighted) median of a column.
column_mode(self, data_column[, weights_column, max_modes_returned]) Calculate the (weighted) modes of a column.
column_summary_statistics(self, data_column[, ...]) Calculate multiple statistics for a column.
copy(self[, columns, where, name]) Create new frame from current frame.
correlation(self, data_column_names) Calculate correlation for two columns of current frame.
correlation_matrix(self, data_column_names[, matrix_name]) Calculate correlation matrix for two or more columns.
count(self, where) Counts the number of rows which meet given criteria.
covariance(self, data_column_names) Calculate covariance for exactly two columns.
covariance_matrix(self, data_column_names[, matrix_name]) Calculate covariance matrix for two or more columns.
cumulative_percent(self, sample_col) Add column to frame with cumulative percent sum.
cumulative_sum(self, sample_col) Add column to frame with cumulative sum.
daal_covariance_matrix(self, data_column_names[, matrix_name]) [BETA] Calculate covariance matrix for two or more columns.
dot_product(self, left_column_names, right_column_names, ...[, ...]) Calculate dot product for each row in current frame.
download(self[, n, offset, columns]) Download frame data from the server into client workspace as a pandas dataframe.
drop_columns(self, columns) Remove columns from the frame.
drop_duplicates(self[, unique_columns]) Modify the current frame, removing duplicate rows.
drop_rows(self, predicate) Erase any row in the current frame which qualifies.
ecdf(self, column[, result_frame_name]) Builds new frame with columns for data and distribution.
entropy(self, data_column[, weights_column]) Calculate the Shannon entropy of a column.
export_to_csv(self, folder_name[, separator, count, offset]) Write current frame to HDFS in csv format.
export_to_hbase(self, table_name[, key_column_name, family_name]) Write current frame to HBase table.
export_to_hive(self, table_name) Write current frame to Hive table.
export_to_jdbc(self, table_name[, connector_type]) Write current frame to JDBC table.
export_to_json(self, folder_name[, count, offset]) Write current frame to HDFS in JSON format.
filter(self, predicate) Select all rows which satisfy a predicate.
flatten_columns(self, columns[, delimiters]) Spread data to multiple rows based on cell data.
get_error_frame(self) Get a frame with error recordings.
group_by(self, group_by_columns, *aggregation_arguments) Create summarized frame.
histogram(self, column_name[, num_bins, weight_column_name, bin_type]) Compute the histogram for a column in a frame.
inspect(self[, n, offset, columns, wrap, truncate, round, width, margin, ...]) Pretty-print the frame data.
join(self, right, left_on[, right_on, how, name]) Join operation on one or two frames, creating a new frame.
quantiles(self, column_name, quantiles) New frame with Quantiles and their values.
rename_columns(self, names) Rename columns.
reverse_box_cox(self, column_name[, lambda_value, box_cox_column_name]) Calculate the reverse box-cox transformation for each row in current frame.
sort(self, columns[, ascending]) Sort the data in a frame.
sorted_k(self, k, column_names_and_ascending[, reduce_tree_depth]) Get a sorted subset of the data.
take(self, n[, offset, columns]) Get data subset.
tally(self, sample_col, count_val) Count number of times a value is seen.
tally_percent(self, sample_col, count_val) Compute a cumulative percent count.
timeseries_augmented_dickey_fuller_test(...) Augmented Dickey-Fuller statistics test.
timeseries_breusch_godfrey_test(self, residuals, ...) Breusch-Godfrey statistics test.
timeseries_breusch_pagan_test(self, residuals, factors) Breusch-Pagan statistics test.
timeseries_durbin_watson_test(self, residuals) Durbin-Watson statistics test.
timeseries_from_observations(self, date_time_index, ...) Returns a frame that has the observations formatted as a time series.
timeseries_slice(self, date_time_index, start, end) Returns a frame that is a sub-slice of the given series.
top_k(self, column_name, k[, weights_column]) Most or least frequent column values.
unflatten_columns(self, columns[, delimiter]) Compacts data from multiple rows based on cell data.
__init__(self, source=None, name=None)

Create a Frame/frame.

Parameters:

source : CsvFile | Frame (default=None)

A source of initial data.

name : str (default=None)

The name of the newly created frame. Default is None.

Notes

A frame with no name is subject to garbage collection.

If a string in the CSV file starts and ends with a double-quote (") character, the quotes are stripped from the data before it is put into the field. Anything between the double-quote characters, including delimiters, is considered part of the string. If the first character after the delimiter is anything other than a double-quote character, the string is composed of all the characters between the delimiters, including any double-quotes.

If the first field type is str, leading spaces on each row are considered part of the string. If the last field type is str, trailing spaces on each row are considered part of the string.
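
The quoting rules above can be illustrated with a small stand-alone parser. This is only a sketch of the described behavior, not the server's actual CSV reader. The helper names split_csv_line and _closing_quote are hypothetical, and treating a double-quote as the closing quote only when it is followed by a delimiter or the end of the line is an assumption made for this illustration:

```python
def _closing_quote(line, start, delimiter):
    """Return the index of the quote that closes the field opened at `start`.

    Assumption: a closing quote is one that is immediately followed by the
    delimiter or by the end of the line. Returns -1 if there is none.
    """
    j = line.find('"', start + 1)
    while j != -1:
        if j + 1 == len(line) or line.startswith(delimiter, j + 1):
            return j
        j = line.find('"', j + 1)
    return -1


def split_csv_line(line, delimiter=","):
    """Split one line into string fields using the quoting rules in the notes."""
    fields = []
    pos = 0
    while True:
        end = -1
        if line.startswith('"', pos):
            end = _closing_quote(line, pos, delimiter)
        if end != -1:
            # Field starts and ends with a double-quote: strip the quotes and
            # keep everything between them, delimiters included.
            fields.append(line[pos + 1:end])
            pos = end + 1
        else:
            # Field does not start with a double-quote: take every character
            # up to the next delimiter, including spaces and double-quotes.
            nxt = line.find(delimiter, pos)
            if nxt == -1:
                nxt = len(line)
            fields.append(line[pos:nxt])
            pos = nxt
        if pos >= len(line):
            return fields
        pos += len(delimiter)  # step over the delimiter to the next field


print(split_csv_line('"a,b",c'))
print(split_csv_line('x,say "hi",y'))
print(split_csv_line('  lead,trail  '))
```

In this sketch a space before an opening quote makes the field unquoted, so the quotes are kept as data, which matches the rule about the first character after the delimiter.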

Examples

Create a new frame based upon the data described in the CsvFile object my_csv_schema. Name the frame “myframe”. Create a Frame my_frame to access the data:

>>> my_frame = ta.Frame(my_csv_schema, "myframe")

A Frame object has been created and my_frame is its proxy. It brought in the data described by my_csv_schema. It is named myframe.

Create an empty frame; name it “yourframe”:

>>> your_frame = ta.Frame(name='yourframe')

A frame has been created and Frame your_frame is its proxy. It has no data yet, but it does have the name yourframe.