sparktk arx
ARX (autoregressive exogenous) Model
Functions
def load(
path, tc=<class 'sparktk.arguments.implicit'>)
load ArxModel from given path
def train(
frame, ts_column, x_columns, y_max_lag, x_max_lag, no_intercept=False)
Creates a ARX model by training on the given frame. Fit an autoregressive model with additional exogenous variables.
frame | (Frame): | Frame used for training |
ts_column | (str): | Name of the column that contains the time series values. |
x_columns | (List(str)): | Names of the column(s) that contain the values of exogenous regressors. |
y_max_lag | (int): | The maximum lag order for the dependent (time series) variable. |
x_max_lag | (int): | The maximum lag order for exogenous variables. |
no_intercept | (bool): | A boolean flag indicating if the intercept should be dropped. Default is false. |
Returns | (ArxModel): | Trained ARX model |
- Dataset being trained must be small enough to be worked with on a single node.
- If the specified set of exogenous variables is not invertible, an exception is thrown stating that the "matrix is singular". This happens when there are certain patterns in the dataset or columns of all zeros. In order to work around the singular matrix issue, try selecting a different set of columns for exogenous variables, or use a different time window for training.
Classes
class ArxModel
A trained ARX model.
Consider the following model trained and tested on the sample data set in frame 'frame'. The frame has a snippet of air quality data from:
https://archive.ics.uci.edu/ml/datasets/Air+Quality.
Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
>>> frame.inspect()
[#] Date Time CO_GT PT08_S1_CO NMHC_GT C6H6_GT PT08_S2_NMHC
============================================================================
[0] 10/03/2004 18.00.00 2.6 1360 150 11.9 1046
[1] 10/03/2004 19.00.00 2.0 1292 112 9.4 955
[2] 10/03/2004 20.00.00 2.2 1402 88 9.0 939
[3] 10/03/2004 21.00.00 2.2 1376 80 9.2 948
[4] 10/03/2004 22.00.00 1.6 1272 51 6.5 836
[5] 10/03/2004 23.00.00 1.2 1197 38 4.7 750
[6] 11/03/2004 00.00.00 1.2 1185 31 3.6 690
[7] 11/03/2004 01.00.00 1.0 1136 31 3.3 672
[8] 11/03/2004 02.00.00 0.9 1094 24 2.3 609
[9] 11/03/2004 03.00.00 0.6 1010 19 1.7 561
<BLANKLINE>
[#] NOx_GT PT08_S3_NOx NO2_GT PT08_S4_NO2 PT08_S5_O3_ T RH AH
==============================================================================
[0] 166 1056 113 1692 1268 13.6 48.9 0.7578
[1] 103 1174 92 1559 972 13.3 47.7 0.7255
[2] 131 1140 114 1555 1074 11.9 54.0 0.7502
[3] 172 1092 122 1584 1203 11.0 60.0 0.7867
[4] 131 1205 116 1490 1110 11.2 59.6 0.7888
[5] 89 1337 96 1393 949 11.2 59.2 0.7848
[6] 62 1462 77 1333 733 11.3 56.8 0.7603
[7] 62 1453 76 1333 730 10.7 60.0 0.7702
[8] 45 1579 60 1276 620 10.7 59.7 0.7648
[9] -200 1705 -200 1235 501 10.3 60.2 0.7517
We will be using the column "T" (temperature) as our time series value:
>>> y = "T"
The sensor values will be used as our exogenous variables:
>>> x = ['CO_GT','PT08_S1_CO','NMHC_GT','C6H6_GT','PT08_S2_NMHC','NOx_GT','PT08_S3_NOx','NO2_GT','PT08_S4_NO2','PT08_S5_O3_']
Train the model and then take a look at the model properties and coefficients:
>>> model = tc.models.timeseries.arx.train(frame, y, x, 0, 0, True)
[===Job Progress===]
>>> model
c = 0.0
coefficients = [0.005567992923907625, -0.010969068059453009, 0.012556586798371176, -0.39792503380811506, 0.04289162879826746, -0.012253952164677924, 0.01192148525581035, 0.014100699808650077, -0.021091473795935345, 0.007622676727420039]
no_intercept = True
x_max_lag = 0
y_max_lag = 0
In this example, we will call predict using the same frame that was used for training, again specifying the name of the time series column and the names of the columns that contain exogenous regressors.
>>> predicted_frame = model.predict(frame, y, x)
[===Job Progress===]
The predicted_frame that's return has a new column called predicted_y. This column contains the predicted time series values.
>>> predicted_frame.column_names
[u'Date',
u'Time',
u'CO_GT',
u'PT08_S1_CO',
u'NMHC_GT',
u'C6H6_GT',
u'PT08_S2_NMHC',
u'NOx_GT',
u'PT08_S3_NOx',
u'NO2_GT',
u'PT08_S4_NO2',
u'PT08_S5_O3_',
u'T',
u'RH',
u'AH',
u'predicted_y']
>>> predicted_frame.inspect(n=15, columns=["T","predicted_y"])
[##] T predicted_y
=========================
[0] 13.6 13.236459938
[1] 13.3 13.0250130899
[2] 11.9 11.4147282294
[3] 11.0 11.3157457822
[4] 11.2 11.3982074883
[5] 11.2 11.7079198051
[6] 11.3 10.7879916472
[7] 10.7 10.527428478
[8] 10.7 10.4439615476
[9] 10.3 10.276662138
[10] 10.1 10.0999996581
[11] 11.0 11.2849327784
[12] 10.5 10.5726885589
[13] 10.2 10.1984619512
[14] 10.8 11.0063774234
The trained model can be saved to be used later:
>>> model_path = "sandbox/savedArxModel"
>>> model.save(model_path)
The saved model can be loaded through the tk context and then used for forecasting values the same way that the original model was used.
>>> loaded_model = tc.load(model_path)
>>> predicted_frame = loaded_model.predict(frame, y, x)
>>> predicted_frame.inspect(n=15,columns=["T","predicted_y"])
[##] T predicted_y
=========================
[0] 13.6 13.236459938
[1] 13.3 13.0250130899
[2] 11.9 11.4147282294
[3] 11.0 11.3157457822
[4] 11.2 11.3982074883
[5] 11.2 11.7079198051
[6] 11.3 10.7879916472
[7] 10.7 10.527428478
[8] 10.7 10.4439615476
[9] 10.3 10.276662138
[10] 10.1 10.0999996581
[11] 11.0 11.2849327784
[12] 10.5 10.5726885589
[13] 10.2 10.1984619512
[14] 10.8 11.0063774234
The trained model can also be exported to a .mar file, to be used with the scoring engine:
>>> canonical_path = model.export_to_mar("sandbox/arx.mar")
Ancestors (in MRO)
- ArxModel
- sparktk.propobj.PropertiesObject
- __builtin__.object
Instance variables
var c
An intercept term (zero if none desired), from the trained model.
var coefficients
Coefficient values from the trained model.
var no_intercept
A boolean flag indicating if the intercept should be dropped.
var x_max_lag
The maximum lag order for exogenous variables.
var y_max_lag
The maximum lag order for the dependent (time series) values.
Methods
def __init__(
self, tc, scala_model)
def export_to_mar(
self, path)
Exports the trained model as a model archive (.mar) to the specified path.
path | (str): | Path to save the trained model |
:returns (str) Full path to the saved .mar file
def predict(
self, frame, ts_column, x_columns)
New frame with column of predicted y values
Predict the time series values for a test frame, based on the specified x values. Creates a new frame revision with the existing columns and a new predicted_y column.
frame | (Frame): | Frame used for predicting the ts values |
ts_column | (str): | Name of the time series column |
x_columns | (List[str]): | Names of the column(s) that contain the values of the exogenous inputs. |
Returns | (Frame): | A new frame containing the original frame's columns and a column *predictied_y* |
def save(
self, path)
Save the trained model to the specified path.
path | (str): | Path to save |
def to_dict(
self)
def to_json(
self)