ArimaModel __init__

__init__(self, name=None)

Create a new instance of an Autoregressive Integrated Moving Average (ARIMA) model.

Parameters:

name : unicode (default=None)

User supplied name.

Returns:

: Model

A new instance of ArimaModel

An autoregressive integrated moving average (ARIMA) [R4] model is a generalization of an autoregressive moving average (ARMA) model. These models are fitted to time series data either to better understand the data or to predict future points in the series (forecasting). Non-seasonal ARIMA models are generally denoted ARIMA(p, d, q), where p, d, and q are non-negative integers: p is the order of the autoregressive part, d is the degree of differencing, and q is the order of the moving-average part.

footnotes

[R4]	https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average
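
As an illustration of the d parameter described above, the following is a minimal pure-Python sketch of differencing. The helper name `difference` is hypothetical and is not part of the ArimaModel API; it only shows what "degree of differencing" means for a short series.

```python
def difference(series, d=1):
    """Apply first-order differencing d times to a list of numbers."""
    values = list(series)
    for _ in range(d):
        # Each pass replaces the series with its consecutive changes,
        # so the result is one element shorter per pass.
        values = [later - earlier for earlier, later in zip(values, values[1:])]
    return values


series = [12.88969427, 13.54964408, 13.8432745, 12.13843611]
once = difference(series, d=1)       # three day-over-day changes
unchanged = difference(series, d=0)  # d = 0 leaves the series as it is
```

With d = 0, as in the example further down, no differencing is applied and the model reduces to an ARMA(p, q) model on the original values.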

Examples

Consider the following frame of observations collected over seven days.

The frame has three columns: timestamp, name, and value.

>>> frame.inspect()
[#]  timestamp                 name   value
=================================================
[0]  2015-01-01T00:00:00.000Z  Sarah  12.88969427
[1]  2015-01-02T00:00:00.000Z  Sarah  13.54964408
[2]  2015-01-03T00:00:00.000Z  Sarah   13.8432745
[3]  2015-01-04T00:00:00.000Z  Sarah  12.13843611
[4]  2015-01-05T00:00:00.000Z  Sarah  12.81156092
[5]  2015-01-06T00:00:00.000Z  Sarah   14.2499628
[6]  2015-01-07T00:00:00.000Z  Sarah  15.12102595

Define the date time index:

>>> datetimeindex = ['2015-01-01T00:00:00.000Z','2015-01-02T00:00:00.000Z',
... '2015-01-03T00:00:00.000Z','2015-01-04T00:00:00.000Z','2015-01-05T00:00:00.000Z',
... '2015-01-06T00:00:00.000Z','2015-01-07T00:00:00.000Z']

Then, create a time series frame from the frame of observations, since the ARIMA model expects data to be in a time series format (where the time series values are in a vector column).

>>> ts = frame.timeseries_from_observations(datetimeindex, "timestamp","name","value")
[===Job Progress===]

>>> ts.inspect()
[#]  name
==========
[0]  Sarah

[#]  value
================================================================================
[0]  [12.88969427, 13.54964408, 13.8432745, 12.13843611, 12.81156092, 14.2499628, 15.12102595]
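
Conceptually, the conversion above groups the observations by key and lays the values out in date time index order. The sketch below imitates that idea in plain Python. The function `pivot_observations` is hypothetical, it is not how the library implements the conversion, and using None for a missing timestamp is an assumption made only for this illustration.

```python
from collections import defaultdict


def pivot_observations(rows, date_time_index):
    """Group (timestamp, key, value) rows by key into index-ordered vectors."""
    position = {stamp: i for i, stamp in enumerate(date_time_index)}
    # Assumption: slots with no matching observation stay as None.
    vectors = defaultdict(lambda: [None] * len(date_time_index))
    for timestamp, key, value in rows:
        if timestamp in position:
            vectors[key][position[timestamp]] = value
    return dict(vectors)


# Observations may arrive in any order; the index decides the final order.
rows = [
    ("2015-01-02T00:00:00.000Z", "Sarah", 13.54964408),
    ("2015-01-01T00:00:00.000Z", "Sarah", 12.88969427),
    ("2015-01-03T00:00:00.000Z", "Sarah", 13.8432745),
]
index = [
    "2015-01-01T00:00:00.000Z",
    "2015-01-02T00:00:00.000Z",
    "2015-01-03T00:00:00.000Z",
]
series = pivot_observations(rows, index)
```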

Use the frame's take function to get one row of data with just the “value” column:

>>> ts_frame_data = ts.take(n=1,offset=0,columns=["value"])

From ts_frame_data, take the first row and first column to extract just the time series values:

>>> ts_values = ts_frame_data[0][0]

>>> ts_values
[12.88969427,
 13.54964408,
 13.8432745,
 12.13843611,
 12.81156092,
 14.2499628,
 15.12102595]

Create an ARIMA model:

>>> model = ta.ArimaModel()
[===Job Progress===]

Train the model on the time series values (here with p=1, d=0, and q=1):

>>> model.train(ts_values, 1, 0, 1)
[===Job Progress===]
{u'coefficients': [9.864444620964322, 0.2848511106449633, 0.47346114378593795]}

Call predict to forecast values by passing the number of future periods to predict beyond the length of the time series. Since the parameter in this example is 0, predict will forecast 7 values (the same number of values that were in the original time series vector).

>>> model.predict(0)
[===Job Progress===]
{u'forecasted': [12.674342627141744,
  13.638048984791693,
  13.682219498657313,
  13.883970022400577,
  12.49564914570843,
  13.66340392811346,
  14.201275185574925]}
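
To make the relationship between the trained coefficients and the forecasted values more concrete, here is a rough sketch of a one-step-ahead ARMA(1, 1) recursion in plain Python. It assumes the three coefficients are ordered as intercept, AR(1) term, MA(1) term, which the source does not state, and it takes the first fitted value from the output above rather than deriving it. The function `arma11_fitted` is hypothetical and is not part of the ArimaModel API.

```python
def arma11_fitted(values, intercept, ar, ma, first_fitted):
    """One-step-ahead ARMA(1, 1) values: c + ar * x[t-1] + ma * error[t-1]."""
    fitted = [first_fitted]
    for previous in values[:-1]:
        # The error is the gap between the previous observation and its fit.
        error = previous - fitted[-1]
        fitted.append(intercept + ar * previous + ma * error)
    return fitted


ts_values = [12.88969427, 13.54964408, 13.8432745, 12.13843611,
             12.81156092, 14.2499628, 15.12102595]

# Assumed ordering of the coefficients returned by train: [intercept, AR, MA].
intercept, ar, ma = 9.864444620964322, 0.2848511106449633, 0.47346114378593795

fitted = arma11_fitted(ts_values, intercept, ar, ma,
                       first_fitted=12.674342627141744)
```

Under these assumptions the recursion yields one value per historical observation, which is consistent with predict(0) returning 7 values for a series of length 7.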

Publish the model so that it can be used by the Scoring Engine:

>>> model.publish()
[===Job Progress===]

Take the path to the published model and run it in the Scoring Engine:

>>> import requests
>>> headers = {'Content-type': 'application/json', 'Accept': 'application/json,text/plain'}

Send a GET request to retrieve the metadata about the model:

>>> r = requests.get('http://mymodel.demotrustedanalytics.com/v2/metadata')
>>> r.text
u'{"model_details":{"model_type":"ARIMA Model","model_class":"com.cloudera.sparkts.models.ARIMAModel","model_reader":"org.trustedanalytics.atk.scoring.models.ARIMAModelReaderPlugin","custom_values":{}},"input":[{"name":"timeseries","value":"Array[Double]"},{"name":"future","value":"Int"}],"output":[{"name":"timeseries","value":"Array[Double]"},{"name":"future","value":"Int"},{"name":"predicted_values","value":"Array[Double]"}]}'
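
The metadata comes back as a JSON string, so it can be parsed to list the fields the model expects and returns. The following sketch uses only the standard json module and the response text shown above; the variable names are illustrative.

```python
import json

# The response text from the metadata request above, split for readability.
metadata_text = (
    '{"model_details":{"model_type":"ARIMA Model",'
    '"model_class":"com.cloudera.sparkts.models.ARIMAModel",'
    '"model_reader":"org.trustedanalytics.atk.scoring.models.ARIMAModelReaderPlugin",'
    '"custom_values":{}},'
    '"input":[{"name":"timeseries","value":"Array[Double]"},'
    '{"name":"future","value":"Int"}],'
    '"output":[{"name":"timeseries","value":"Array[Double]"},'
    '{"name":"future","value":"Int"},'
    '{"name":"predicted_values","value":"Array[Double]"}]}'
)

metadata = json.loads(metadata_text)

# Map each field name to its declared type.
input_fields = {field["name"]: field["value"] for field in metadata["input"]}
output_fields = {field["name"]: field["value"] for field in metadata["output"]}
```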

ARIMA model support was added in version 2 of the Scoring Engine REST API. The request sends the number of values to forecast beyond the length of the time series (0 in this example). Since 7 historical time series values were provided, 7 values are forecasted.

>>> r = requests.post('http://mymodel.demotrustedanalytics.com/v2/score',json={"records":[{"future":0}]})

The ‘predicted_values’ array contains the future values, which have been forecasted based on the historical data.

>>> r.text
u'{"data":[{"future":0.0,"predicted_values":[12.674342627141744,13.638048984791693,13.682219498657313,13.883970022400577,12.49564914570843,13.66340392811346,14.201275185574925]}]}'
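
The scoring response is also JSON. A small sketch of pulling the forecast out of it, using the standard json module and the response text shown above (variable names are illustrative):

```python
import json

# The response text from the score request above.
response_text = (
    '{"data":[{"future":0.0,"predicted_values":['
    '12.674342627141744,13.638048984791693,13.682219498657313,'
    '13.883970022400577,12.49564914570843,13.66340392811346,'
    '14.201275185574925]}]}'
)

result = json.loads(response_text)

# One record was posted, so one record comes back in "data".
record = result["data"][0]
predicted = record["predicted_values"]
```

When working with a live response object, `r.json()` from the requests library returns the same parsed structure as `json.loads(r.text)`.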
