sparktk arima

ARIMA (Autoregressive Integrated Moving Average) Model

Functions

def load(path, tc=<class 'sparktk.arguments.implicit'>)

Loads an ArimaModel from the given path.

def train(ts, p, d, q, include_intercept=True, method='css-cgd', init_params=None, tc=<class 'sparktk.arguments.implicit'>)

Creates an Autoregressive Integrated Moving Average (ARIMA) model from the specified time series values.

Given a time series, fits a non-seasonal Autoregressive Integrated Moving Average (ARIMA) model of order (p, d, q), where p is the number of autoregressive terms, d is the order of differencing, and q is the number of moving average error terms. If include_intercept is True, the model is fitted with an intercept.
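To make the role of d concrete, the following is a minimal sketch of differencing in plain Python. The helper name difference is hypothetical and is not part of sparktk; it only illustrates what "order of differencing" means, not how the library performs it internally.

```python
def difference(series, d):
    """Apply first-order differencing d times to a list of floats.

    Hypothetical illustration only; not a sparktk function.
    """
    values = list(series)
    for _ in range(d):
        # Each pass replaces the series with consecutive differences,
        # so the result is one element shorter than its input.
        values = [b - a for a, b in zip(values, values[1:])]
    return values


series = [1.0, 4.0, 9.0, 16.0, 25.0]
once = difference(series, 1)    # d = 1 removes a linear trend
twice = difference(series, 2)   # d = 2 removes a quadratic trend
```

Each differencing pass shortens the series by one value, so a series of length n differenced d times has n - d values.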

Parameters:

ts

(List[float]):

Time series to which to fit an ARIMA(p, d, q) model.

p

(int):

Autoregressive order

d

(int):

Differencing order

q

(int):

Moving average order

include_intercept

(Optional(boolean)):

If True, the model is fit with an intercept. Default is True.

method

(Optional(string)):

Objective function and optimization method. Current options are: 'css-bobyqa' and 'css-cgd'. Both optimize the log likelihood in terms of the conditional sum of squares. The first uses BOBYQA for optimization, while the second uses conjugate gradient descent. Default is 'css-cgd'.

init_params

(Optional(List[float])):

A set of user-provided initial parameters for optimization. If the list is empty or not provided (default), the parameters are initialized using the Hannan-Rissanen algorithm. If provided, the parameters should be ordered as follows: intercept term, AR parameters (in increasing order of lag), MA parameters (in increasing order of lag).
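The ordering required for init_params can be sketched with a small helper. The name build_init_params and the numeric values are hypothetical and chosen only for illustration; the helper simply assembles the list in the documented order and checks that the counts match p and q.

```python
def build_init_params(intercept, ar_params, ma_params, p, q):
    """Assemble initial parameters as: intercept, AR terms, MA terms.

    Hypothetical helper, not part of sparktk. AR and MA parameters are
    expected in increasing order of lag.
    """
    if len(ar_params) != p:
        raise ValueError("expected %d AR parameters, got %d" % (p, len(ar_params)))
    if len(ma_params) != q:
        raise ValueError("expected %d MA parameters, got %d" % (q, len(ma_params)))
    params = [float(intercept)]
    params.extend(float(v) for v in ar_params)
    params.extend(float(v) for v in ma_params)
    return params


# Illustrative starting values for an ARIMA(2, 0, 1) model; not fitted values.
init_params = build_init_params(0.5, [0.3, 0.2], [0.1], p=2, q=1)

# The list could then be passed to train() through the init_params argument:
# model = tc.models.timeseries.arima.train(ts_values, 2, 0, 1, init_params=init_params)
```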

Returns

(ArimaModel):

Trained ARIMA model
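Both values of the method parameter described above optimize a likelihood expressed through the conditional sum of squares (CSS). The sketch below shows a textbook form of that objective for an ARMA(p, q) series with no differencing. It is an assumption-laden illustration: the function names are hypothetical, and the library's exact conditioning and scaling may differ.

```python
import math


def conditional_sum_of_squares(y, intercept, ar, ma):
    """Return (css, n) for an ARMA model, conditioning on the first max(p, q) values.

    Hypothetical illustration; errors before the conditioning point are taken as 0.0.
    """
    p, q = len(ar), len(ma)
    start = max(p, q)
    errors = [0.0] * len(y)
    css = 0.0
    for t in range(start, len(y)):
        pred = intercept
        pred += sum(ar[i] * y[t - 1 - i] for i in range(p))
        pred += sum(ma[j] * errors[t - 1 - j] for j in range(q))
        errors[t] = y[t] - pred
        css += errors[t] ** 2
    return css, len(y) - start


def css_log_likelihood(y, intercept, ar, ma):
    """Gaussian log likelihood written in terms of the CSS.

    Uses sigma^2 = css / n; undefined when css or n is zero.
    """
    css, n = conditional_sum_of_squares(y, intercept, ar, ma)
    sigma2 = css / n
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - css / (2.0 * sigma2)
```

An optimizer such as BOBYQA or conjugate gradient descent would search for the intercept, AR, and MA values that maximize this quantity.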

Classes

class ArimaModel

A trained Autoregressive Integrated Moving Average (ARIMA) model.

Example:

Consider the following frame that has three columns: timestamp, name, and value.

>>> frame.inspect()
[#]  timestamp                 name   value
=================================================
[0]  2015-01-01T00:00:00.000Z  Sarah  12.88969427
[1]  2015-01-02T00:00:00.000Z  Sarah  13.54964408
[2]  2015-01-03T00:00:00.000Z  Sarah   13.8432745
[3]  2015-01-04T00:00:00.000Z  Sarah  12.13843611
[4]  2015-01-05T00:00:00.000Z  Sarah  12.81156092
[5]  2015-01-06T00:00:00.000Z  Sarah   14.2499628
[6]  2015-01-07T00:00:00.000Z  Sarah  15.12102595

Define the date time index:

>>> datetimeindex = ['2015-01-01T00:00:00.000Z','2015-01-02T00:00:00.000Z',
... '2015-01-03T00:00:00.000Z','2015-01-04T00:00:00.000Z','2015-01-05T00:00:00.000Z',
... '2015-01-06T00:00:00.000Z','2015-01-07T00:00:00.000Z']

Then, create a time series frame from the frame of observations, since the ARIMA model expects data to be in a time series format (where the time series values are in a vector column).

>>> ts = frame.timeseries_from_observations(datetimeindex, "timestamp","name","value")
[===Job Progress===]

>>> ts.inspect()
[#]  name
==========
[0]  Sarah
<BLANKLINE>
[#]  value
================================================================================
[0]  [12.88969427, 13.54964408, 13.8432745, 12.13843611, 12.81156092, 14.2499628, 15.12102595]

Use the frame's take function to get one row of data with just the "value" column:

>>> ts_frame_data = ts.take(n=1,offset=0,columns=["value"])

From ts_frame_data, take the first row and first column to extract just the time series values:

>>> ts_values = ts_frame_data[0][0].tolist()

Train the ARIMA model by specifying the list of time series values, p, d, q (and optionally include_intercept, method, and init_params):

>>> model = tc.models.timeseries.arima.train(ts_values, 1, 0, 1)

Forecast future values by calling predict(). By default, the number of forecasted values is equal to the number of values that were passed in during training. In this example, we trained with 7 values in ts_values, so 7 values are returned from predict().

>>> model.predict()
[12.674342627141744,
 13.638048984791693,
 13.682219498657313,
 13.883970022400577,
 12.49564914570843,
 13.66340392811346,
 14.201275185574925]

To forecast more values beyond the length of the time series, specify the number of future_periods to add on. Here we will specify future_periods = 3, so that we get a total of 10 predicted values.

>>> model.predict(future_periods=3)
[12.674342627141744,
 13.638048984791693,
 13.682219498657313,
 13.883970022400577,
 12.49564914570843,
 13.66340392811346,
 14.201275185574925,
 14.345159879072785,
 13.950679344897772,
 13.838311126610202]
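
Since the returned list contains the values for the training periods followed by the future periods, it can be split by position. The following sketch uses the numbers copied from the output above; the variable names are illustrative only.

```python
# Values copied from the predict(future_periods=3) output shown above.
predictions = [12.674342627141744,
               13.638048984791693,
               13.682219498657313,
               13.883970022400577,
               12.49564914570843,
               13.66340392811346,
               14.201275185574925,
               14.345159879072785,
               13.950679344897772,
               13.838311126610202]

n_observed = 7  # number of values in ts_values used for training

# The first n_observed entries line up with the training periods;
# the remaining entries are the forecasts beyond the end of the series.
fitted = predictions[:n_observed]
future = predictions[n_observed:]
```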

Save the trained model to use later:

>>> save_path = "sandbox/savedArimaModel"
>>> model.save(save_path)

The saved model can then be loaded through the tk context:

>>> loaded_model = tc.load(save_path)

>>> loaded_model.predict()
[12.674342627141744,
 13.638048984791693,
 13.682219498657313,
 13.883970022400577,
 12.49564914570843,
 13.66340392811346,
 14.201275185574925]

The trained model can also be exported to a .mar file, to be used with the scoring engine:

>>> canonical_path = model.export_to_mar("sandbox/arima.mar")

Ancestors (in MRO)

ArimaModel
sparktk.propobj.PropertiesObject
__builtin__.object

Instance variables

var coefficients

Coefficient values from the trained model (intercept, AR, MA, with increasing degrees).

var d

Differencing order

var include_intercept

True if the model was fit with an intercept.

var init_params

A set of user-provided initial parameters for optimization.

var method

Objective function and optimization method. Either: 'css-bobyqa' or 'css-cgd'.

var p

Autoregressive order

var q

Moving average order

var ts_values

List of time series values that were used to fit the model.

Methods

def __init__(self, tc, scala_model)

def export_to_mar(self, path)

Exports the trained model as a model archive (.mar) to the specified path.

Parameters:

path

(str):

Path to save the trained model

Returns

(str):

Full path to the saved .mar file

def predict(self, future_periods=0, ts=None)

Forecasts future periods using ARIMA.

Provides fitted values of the time series as 1-step-ahead forecasts, based on the current model parameters, followed by the requested future periods of forecast. AR terms prior to the start of the series are assumed to equal the model's intercept term (or 0.0, if the model was fit without an intercept term), while MA terms prior to the start are assumed to be 0.0. If there is differencing, the first d terms come from the original series.

Parameters:

future_periods

(int):

Periods in the future to forecast (beyond length of time series that the model was trained with).

ts

(Optional(List[float])):

Optional list of time series values to use as golden values. If no time series values are provided, the values used during training will be used during forecasting.
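
The forecasting behavior described for predict() can be sketched in plain Python for the case without differencing (d = 0). This is a simplified illustration under the stated start-up assumptions, not the library's implementation; the function name arma_forecast and the parameter values in the example are hypothetical.

```python
def arma_forecast(y, intercept, ar, ma, future_periods=0):
    """One-step-ahead fitted values for y, followed by future forecasts.

    Hypothetical sketch for d = 0. Pass intercept=0.0 for a model that
    was fit without an intercept.
    """
    p, q = len(ar), len(ma)
    history = list(y)   # observed values, later extended with forecasts
    errors = []         # one-step-ahead errors; 0.0 for future periods
    output = []
    for t in range(len(y) + future_periods):
        pred = intercept
        for i in range(p):
            lag = t - 1 - i
            # AR terms before the start of the series equal the intercept.
            pred += ar[i] * (history[lag] if lag >= 0 else intercept)
        for j in range(q):
            lag = t - 1 - j
            # MA terms before the start of the series are 0.0.
            pred += ma[j] * (errors[lag] if lag >= 0 else 0.0)
        output.append(pred)
        if t < len(y):
            errors.append(y[t] - pred)
        else:
            # Beyond the observed data, forecasts feed back as history
            # and the expected error is zero.
            history.append(pred)
            errors.append(0.0)
    return output


# Illustrative ARMA(1, 1) parameters, not fitted values.
result = arma_forecast([2.0, 4.0], intercept=1.0, ar=[0.5], ma=[0.5],
                       future_periods=2)
```

The returned list has len(y) + future_periods entries, matching the output length described for predict().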

def save(self, path)

Saves the trained model to the specified path.

Parameters:

path

Path to save

def to_dict(self)

def to_json(self)
