sparktk arima
ARIMA (Autoregressive Integrated Moving Average) Model
Functions
def load(
path, tc=<class 'sparktk.arguments.implicit'>)
Loads an ArimaModel from the given path.
def train(
ts, p, d, q, include_intercept=True, method='css-cgd', init_params=None, tc=<class 'sparktk.arguments.implicit'>)
Creates an Autoregressive Integrated Moving Average (ARIMA) model from the specified time series values.
Given a time series, fits a non-seasonal ARIMA model of order (p, d, q), where p is the number of autoregressive terms, d is the order of differencing, and q is the number of moving average error terms. If include_intercept is True, the model is fitted with an intercept.
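For reference, here is one common textbook way to write a non-seasonal ARIMA(p, d, q) model. The series is differenced d times, and the differenced series y'_t is modeled with p autoregressive and q moving average terms. Here c is the intercept (dropped when include_intercept is False), φ_i are the AR coefficients, θ_j are the MA coefficients, and ε_t are white-noise errors. These symbols are our own notation, and implementations differ in the sign convention used for the MA terms, so treat this as a sketch rather than sparktk's exact parameterization.

```latex
\begin{aligned}
\Delta y_t &= y_t - y_{t-1}, \qquad y'_t = \Delta^{d} y_t \\
y'_t &= c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t
\end{aligned}
```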
ts | (List[float]): | Time series to which to fit an ARIMA(p, d, q) model. |
p | (int): | Autoregressive order |
d | (int): | Differencing order |
q | (int): | Moving average order |
include_intercept | (Optional(boolean)): | If True, the model is fit with an intercept. Default is True. |
method | (Optional(string)): | Objective function and optimization method. Current options are: 'css-bobyqa' and 'css-cgd'. Both optimize the log likelihood in terms of the conditional sum of squares. The first uses BOBYQA for optimization, while the second uses conjugate gradient descent. Default is 'css-cgd'. |
init_params | (Optional(List[float])): | A set of user-provided initial parameters for optimization. If not provided (the default) or empty, the parameters are initialized using the Hannan-Rissanen algorithm. If provided, the parameters must be ordered as: intercept term, AR parameters (in increasing order of lag), MA parameters (in increasing order of lag). |
Returns | (ArimaModel): | Trained ARIMA model |
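Both `method` options minimize the same objective, the conditional sum of squares (CSS), and differ only in the optimizer. The pure-Python sketch below shows that objective for a series that has already been differenced. The function names are hypothetical, and the conditioning choices (residuals before the first p observations set to 0, variance concentrated out as CSS / n) follow the textbook CSS method; sparktk's internal implementation may differ in detail.

```python
import math
from typing import List, Sequence


def conditional_sum_of_squares(ts: Sequence[float], intercept: float,
                               ar: Sequence[float], ma: Sequence[float]) -> float:
    """CSS of an ARMA(p, q) model on an already-differenced series.

    ar[i] is the lag-(i + 1) AR coefficient and ma[j] the lag-(j + 1) MA
    coefficient. A model fit without an intercept corresponds to intercept=0.0.
    """
    p, q = len(ar), len(ma)
    errors: List[float] = [0.0] * len(ts)  # residuals before t = p stay 0.0
    total = 0.0
    for t in range(p, len(ts)):
        ar_part = sum(ar[i] * ts[t - 1 - i] for i in range(p))
        # Only residuals inside the series contribute; earlier ones are 0.0.
        ma_part = sum(ma[j] * errors[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        errors[t] = ts[t] - (intercept + ar_part + ma_part)
        total += errors[t] ** 2
    return total


def css_log_likelihood(ts: Sequence[float], intercept: float,
                       ar: Sequence[float], ma: Sequence[float]) -> float:
    """Gaussian log likelihood with the variance estimated as CSS / n."""
    n = len(ts) - len(ar)  # number of conditional residuals
    css = conditional_sum_of_squares(ts, intercept, ar, ma)
    return -0.5 * n * (math.log(2 * math.pi * css / n) + 1.0)
```

Minimizing the CSS and maximizing this log likelihood pick the same parameters, since the likelihood decreases monotonically as the CSS grows; an optimizer such as BOBYQA or conjugate gradient descent would search over (intercept, ar, ma) for that minimum.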
Classes
class ArimaModel
A trained Autoregressive Integrated Moving Average (ARIMA) model.
Consider the following frame that has three columns: timestamp, name, and value.
>>> frame.inspect()
[#] timestamp name value
=================================================
[0] 2015-01-01T00:00:00.000Z Sarah 12.88969427
[1] 2015-01-02T00:00:00.000Z Sarah 13.54964408
[2] 2015-01-03T00:00:00.000Z Sarah 13.8432745
[3] 2015-01-04T00:00:00.000Z Sarah 12.13843611
[4] 2015-01-05T00:00:00.000Z Sarah 12.81156092
[5] 2015-01-06T00:00:00.000Z Sarah 14.2499628
[6] 2015-01-07T00:00:00.000Z Sarah 15.12102595
Define the date time index:
>>> datetimeindex = ['2015-01-01T00:00:00.000Z','2015-01-02T00:00:00.000Z',
... '2015-01-03T00:00:00.000Z','2015-01-04T00:00:00.000Z','2015-01-05T00:00:00.000Z',
... '2015-01-06T00:00:00.000Z','2015-01-07T00:00:00.000Z']
Then, create a time series frame from the frame of observations, since the ARIMA model expects data to be in a time series format (where the time series values are in a vector column).
>>> ts = frame.timeseries_from_observations(datetimeindex, "timestamp","name","value")
[===Job Progress===]
>>> ts.inspect()
[#] name
==========
[0] Sarah
<BLANKLINE>
[#] value
================================================================================
[0] [12.88969427, 13.54964408, 13.8432745, 12.13843611, 12.81156092, 14.2499628, 15.12102595]
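To make the reshaping concrete, the pure-Python sketch below groups (timestamp, name, value) observations into one value vector per name, ordered by the datetime index. The function name `series_from_observations` is hypothetical. Filling timestamps that have no observation with `nan`, and dropping observations whose timestamp is not in the index, are our assumptions rather than documented sparktk behavior.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def series_from_observations(
        index: Sequence[str],
        observations: Sequence[Tuple[str, str, float]],
) -> Dict[str, List[float]]:
    """Group (timestamp, key, value) rows into one vector per key, ordered by index."""
    position = {timestamp: i for i, timestamp in enumerate(index)}
    # Each new key starts as a vector of nan, one slot per index entry (assumption).
    series: Dict[str, List[float]] = defaultdict(lambda: [math.nan] * len(index))
    for timestamp, key, value in observations:
        if timestamp in position:  # observations outside the index are dropped (assumption)
            series[key][position[timestamp]] = value
    return dict(series)
```

The input order of the observations does not matter; each value lands in the slot of its timestamp.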
Use the frame's take function to get one row of data containing just the "value" column:
>>> ts_frame_data = ts.take(n=1,offset=0,columns=["value"])
From ts_frame_data, take the first row and first column to extract just the time series values:
>>> ts_values = ts_frame_data[0][0].tolist()
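The result of take is indexed as rows, then columns, so `[0][0]` selects the "value" cell of the first row. The `.tolist()` call suggests that this cell is a NumPy array; converting it yields the plain `List[float]` that train expects. A minimal illustration, assuming that NumPy representation (the stand-in data mirrors the example frame):

```python
import numpy as np

# Stand-in for ts.take(n=1, offset=0, columns=["value"]): one row with one
# column, whose cell is assumed to be a NumPy array of the series values.
ts_frame_data = [[np.array([12.88969427, 13.54964408, 13.8432745, 12.13843611,
                            12.81156092, 14.2499628, 15.12102595])]]

ts_values = ts_frame_data[0][0].tolist()  # NumPy array -> list of Python floats
```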
Train the ARIMA model by specifying the list of time series values, p, d, q (and optionally include_intercept, method, and init_params):
>>> model = tc.models.timeseries.arima.train(ts_values, 1, 0, 1)
Forecast future values by calling predict(). By default, the number of forecasted values equals the number of values passed during training. In this example, we trained with 7 values in ts_values, so predict() returns 7 values.
>>> model.predict()
[12.674342627141744,
13.638048984791693,
13.682219498657313,
13.883970022400577,
12.49564914570843,
13.66340392811346,
14.201275185574925]
To forecast values beyond the length of the time series, specify the number of future_periods to add. Here we specify future_periods=3, so 10 predicted values are returned in total.
>>> model.predict(future_periods=3)
[12.674342627141744,
13.638048984791693,
13.682219498657313,
13.883970022400577,
12.49564914570843,
13.66340392811346,
14.201275185574925,
14.345159879072785,
13.950679344897772,
13.838311126610202]
Save the trained model to use later:
>>> save_path = "sandbox/savedArimaModel"
>>> model.save(save_path)
The model can then be loaded from the tk context:
>>> loaded_model = tc.load(save_path)
>>> loaded_model.predict()
[12.674342627141744,
13.638048984791693,
13.682219498657313,
13.883970022400577,
12.49564914570843,
13.66340392811346,
14.201275185574925]
The trained model can also be exported to a .mar file, to be used with the scoring engine:
>>> canonical_path = model.export_to_mar("sandbox/arima.mar")
Ancestors (in MRO)
- ArimaModel
- sparktk.propobj.PropertiesObject
- __builtin__.object
Instance variables
var coefficients
Coefficient values from the trained model (intercept, AR, MA, each in increasing order of lag).
var d
Differencing order
var include_intercept
True if the model was fit with an intercept.
var init_params
A set of user-provided initial parameters for optimization.
var method
Objective function and optimization method: either 'css-bobyqa' or 'css-cgd'.
var p
Autoregressive order
var q
Moving average order
var ts_values
List of time series values that were used to fit the model.
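Because coefficients is a flat list, splitting it back into named parts depends on p, q, and include_intercept. The sketch below is a hypothetical helper (`split_coefficients` and `ArimaCoefficients` are our names, not part of sparktk), assuming the ordering stated for coefficients: intercept first when present, then AR and MA terms in increasing order of lag, the same order init_params uses.

```python
from typing import List, NamedTuple, Optional, Sequence


class ArimaCoefficients(NamedTuple):
    intercept: Optional[float]  # None when the model was fit without an intercept
    ar: List[float]             # lag 1, lag 2, ..., lag p
    ma: List[float]             # lag 1, lag 2, ..., lag q


def split_coefficients(coefficients: Sequence[float], p: int, q: int,
                       include_intercept: bool) -> ArimaCoefficients:
    """Split a flat [intercept?, AR..., MA...] list into its named parts."""
    offset = 1 if include_intercept else 0
    if len(coefficients) != offset + p + q:
        raise ValueError("coefficient count does not match p, q and include_intercept")
    intercept = coefficients[0] if include_intercept else None
    ar = list(coefficients[offset:offset + p])
    ma = list(coefficients[offset + p:offset + p + q])
    return ArimaCoefficients(intercept, ar, ma)
```

With a trained model, this would be called as `split_coefficients(model.coefficients, model.p, model.q, model.include_intercept)`.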
Methods
def __init__(
self, tc, scala_model)
def export_to_mar(
self, path)
Exports the trained model as a model archive (.mar) to the specified path.
path | (str): | Path to save the trained model |
Returns | (str): | Full path to the saved .mar file |
def predict(
self, future_periods=0, ts=None)
Forecasts future periods using ARIMA.
Provides fitted values of the time series as 1-step-ahead forecasts based on the current model parameters, followed by the requested number of future periods of forecast. AR terms prior to the start of the series are assumed to equal the model's intercept term (or 0.0 if the model was fit without an intercept), and MA terms prior to the start are assumed to be 0.0. If there is differencing, the first d terms come from the original series.
future_periods | (int): | Number of periods in the future to forecast (beyond the length of the time series that the model was trained with). |
ts | (Optional(List[float])): | Optional list of time series values to use as golden values. If no time series values are provided, the values used during training will be used during forecasting. |
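The forecasting recursion described for predict can be sketched in pure Python for an ARMA(p, q) model, that is, with d = 0 so no differencing is involved, using the stated pre-sample assumptions. The function `arma_predict` and its explicit coefficient arguments are hypothetical; sparktk's predict takes the coefficients from the trained model, and this sketch follows the prose description rather than the library's code.

```python
from typing import List, Sequence


def arma_predict(ts: Sequence[float], intercept: float, ar: Sequence[float],
                 ma: Sequence[float], future_periods: int = 0) -> List[float]:
    """1-step-ahead fitted values for ts, then future_periods forecasts (d = 0)."""
    p, q = len(ar), len(ma)
    history: List[float] = list(ts)  # observed values, later extended with forecasts
    errors: List[float] = []         # in-sample residuals
    predictions: List[float] = []

    def value_at(t: int) -> float:
        # AR terms before the start of the series equal the intercept.
        return history[t] if t >= 0 else intercept

    def error_at(t: int) -> float:
        # MA terms before the start (and beyond the observed data) are 0.0.
        return errors[t] if 0 <= t < len(errors) else 0.0

    for t in range(len(ts) + future_periods):
        forecast = (intercept
                    + sum(ar[i] * value_at(t - 1 - i) for i in range(p))
                    + sum(ma[j] * error_at(t - 1 - j) for j in range(q)))
        predictions.append(forecast)
        if t < len(ts):
            errors.append(ts[t] - forecast)  # residual against the observed value
        else:
            history.append(forecast)  # forecasts feed later AR terms
    return predictions
```

The result always has len(ts) + future_periods entries, which matches the 7 + 3 = 10 values returned in the future_periods example.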
def save(
self, path)
Saves the trained model to the specified path.
path | (str): | Path to save the trained model |
def to_dict(
self)
def to_json(
self)