Models RandomForestRegressorModel


class RandomForestRegressorModel

Entity RandomForestRegressorModel

Attributes

last_read_date Read-only property - Last time this model’s data was accessed.
name Set or get the name of the model object.
status Read-only property - Current model life cycle status.

Methods

__init__(self[, name, _info]) Create a ‘new’ instance of a Random Forest Regressor model.
predict(self, frame[, observation_columns]) Predict the values for the data points.
publish(self) Create a tar file that will be used as input to the Scoring Engine.
train(self, frame, value_column, observation_columns[, num_trees, impurity, ...]) Build a Random Forest Regressor model.
__init__(self, name=None)

Create a ‘new’ instance of a Random Forest Regressor model.

Parameters:

name : unicode (default=None)

User-supplied name.

Returns:

: Model

A new instance of RandomForestRegressorModel.

Random Forest [R19] is a supervised ensemble learning algorithm that can be used to perform regression. A Random Forest Regressor model is initialized, trained on columns of a frame, and used to predict the value of each observation in the frame. This model runs the MLlib implementation of Random Forest [R20]. During training, the decision trees are trained in parallel. During prediction, the predicted value of the random forest is the average of the values predicted by all of its trees.
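The averaging step can be illustrated with a minimal sketch in plain Python. The three "trees" below are hypothetical hand-written stumps over (Dim_1, Dim_2) with made-up thresholds, not trees produced by training; the sketch only shows how a forest combines per-tree predictions into one value and does not reproduce MLlib's internals.

```python
from typing import Callable, List, Sequence

# A "tree" is modeled as any function from a feature vector to a predicted value.
Tree = Callable[[Sequence[float]], float]


def forest_predict(trees: Sequence[Tree], features: Sequence[float]) -> float:
    """Return the random forest prediction: the mean of all tree predictions."""
    if not trees:
        raise ValueError("the forest must contain at least one tree")
    return sum(tree(features) for tree in trees) / len(trees)


# Hypothetical single-split trees over (Dim_1, Dim_2); thresholds are made up.
trees: List[Tree] = [
    lambda x: 1.0 if x[0] < 27.0 else 0.0,
    lambda x: 1.0 if x[1] < 3.0 else 0.0,
    lambda x: 1.0 if x[0] < 10.0 else 0.0,
]

# Two of the three trees predict 1.0 for this row, so the forest predicts 2/3.
print(forest_predict(trees, [19.8446136104, 2.2985856384]))
```

Because the forest output is a mean, it can fall between the values any single tree produces, which is what distinguishes regression from a majority vote.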

Footnotes

[R19] https://en.wikipedia.org/wiki/Random_forest
[R20] https://spark.apache.org/docs/1.5.0/mllib-ensembles.html#random-forests

Examples

Consider the following model, trained and tested on the sample data set in the frame ‘frame’, which contains three columns:

>>> frame.inspect()
[#]  Class  Dim_1          Dim_2
=======================================
[0]      1  19.8446136104  2.2985856384
[1]      1  16.8973559126  2.6933495054
[2]      1   5.5548729596  2.7777687995
[3]      0  46.1810010826  3.1611961917
[4]      0  44.3117586448  3.3458963222
[5]      0  34.6334526911  3.6429838715
>>> model = ta.RandomForestRegressorModel()
[===Job Progress===]
>>> train_output = model.train(frame, 'Class', ['Dim_1', 'Dim_2'], num_trees=1, impurity="variance", max_depth=4, max_bins=100)
[===Job Progress===]
>>> train_output
{u'impurity': u'variance', u'max_bins': 100, u'observation_columns': [u'Dim_1', u'Dim_2'], u'num_nodes': 3, u'max_depth': 4, u'seed': -1632404927, u'num_trees': 1, u'label_column': u'Class', u'feature_subset_category': u'all'}
>>> train_output['num_nodes']
3
>>> train_output['label_column']
u'Class'
>>> predicted_frame = model.predict(frame, ['Dim_1', 'Dim_2'])
[===Job Progress===]
>>> predicted_frame.inspect()
[#]  Class  Dim_1          Dim_2         predicted_value
========================================================
[0]      1  19.8446136104  2.2985856384                1.0
[1]      1  16.8973559126  2.6933495054                1.0
[2]      1   5.5548729596  2.7777687995                1.0
[3]      0  46.1810010826  3.1611961917                0.0
[4]      0  44.3117586448  3.3458963222                0.0
[5]      0  34.6334526911  3.6429838715                0.0
>>> model.publish()
[===Job Progress===]
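In the training output above, num_nodes is 3, which is consistent with a single tree made of one split and two leaves. The sketch below, in plain Python, searches each feature for the split that most reduces the variance of the label, which is the quantity impurity="variance" refers to. It is a simplification, not MLlib's algorithm: it tries the midpoint between every pair of neighboring values, whereas MLlib evaluates at most max_bins candidate thresholds per feature. The function names are hypothetical.

```python
from typing import Sequence, Tuple

Row = Tuple[int, float, float]  # (Class, Dim_1, Dim_2)

# Sample rows copied from the frame above.
rows = [
    (1, 19.8446136104, 2.2985856384),
    (1, 16.8973559126, 2.6933495054),
    (1, 5.5548729596, 2.7777687995),
    (0, 46.1810010826, 3.1611961917),
    (0, 44.3117586448, 3.3458963222),
    (0, 34.6334526911, 3.6429838715),
]


def variance(values: Sequence[float]) -> float:
    """Variance impurity: mean squared deviation from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def best_split(rows: Sequence[Row], feature: int) -> Tuple[float, float]:
    """Return (threshold, gain) for the split on `feature` that most reduces
    the weighted variance of the label in column 0."""
    parent = variance([r[0] for r in rows])
    xs = sorted({r[feature] for r in rows})
    best = (float("nan"), 0.0)
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2  # candidate: midpoint between neighbors
        left = [r[0] for r in rows if r[feature] <= threshold]
        right = [r[0] for r in rows if r[feature] > threshold]
        weighted = (len(left) * variance(left) + len(right) * variance(right)) / len(rows)
        gain = parent - weighted
        if gain > best[1]:
            best = (threshold, gain)
    return best


# On this sample each feature reaches the maximum possible gain of 0.25.
for feature, name in [(1, "Dim_1"), (2, "Dim_2")]:
    threshold, gain = best_split(rows, feature)
    print(name, round(threshold, 4), round(gain, 4))
```

Both features separate the two classes perfectly, so each best split removes all of the parent variance. A leaf predicts the mean label of its rows, which here is 1.0 on one side and 0.0 on the other, matching the predicted_value column above.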

Take the path to the published model (the tar file created by publish) and run it in the Scoring Engine.

>>> import requests
>>> headers = {'Content-type': 'application/json', 'Accept': 'application/json,text/plain'}

Sending a GET request to retrieve the model's metadata:

>>> r = requests.get('http://mymodel.demotrustedanalytics.com/v2/metadata')
>>> r.text
u'{"model_details":{"model_type":"Random Forest Regressor Model","model_class":"org.trustedanalytics.atk.scoring.models.RandomForestRegressorScoreModel","model_reader":"org.trustedanalytics.atk.scoring.models.RandomForestRegressorModelReaderPlugin","custom_values":{}},"input":[{"name":"Dim_1","value":"Double"},{"name":"Dim_2","value":"Double"}],"output":[{"name":"Dim_1","value":"Double"},{"name":"Dim_2","value":"Double"},{"name":"Prediction","value":"Double"}]}'
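The metadata response is a JSON document, so the expected input and output column names can be read from it programmatically. A minimal sketch, assuming the response text shown above; with a live response, r.json() from the requests library returns the same dictionary without the explicit json.loads call.

```python
import json

# Response text copied from the metadata call above.
metadata_text = '{"model_details":{"model_type":"Random Forest Regressor Model","model_class":"org.trustedanalytics.atk.scoring.models.RandomForestRegressorScoreModel","model_reader":"org.trustedanalytics.atk.scoring.models.RandomForestRegressorModelReaderPlugin","custom_values":{}},"input":[{"name":"Dim_1","value":"Double"},{"name":"Dim_2","value":"Double"}],"output":[{"name":"Dim_1","value":"Double"},{"name":"Dim_2","value":"Double"},{"name":"Prediction","value":"Double"}]}'

metadata = json.loads(metadata_text)

# Column names (and their declared types) the model expects and returns.
input_columns = [c["name"] for c in metadata["input"]]
input_types = {c["name"]: c["value"] for c in metadata["input"]}
output_columns = [c["name"] for c in metadata["output"]]

print(input_columns)   # ['Dim_1', 'Dim_2']
print(output_columns)  # ['Dim_1', 'Dim_2', 'Prediction']
```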

Posting a request to version 1 of the Scoring Engine, which uses strings for requests and responses:

>>> r = requests.post('http://mymodel.demotrustedanalytics.com/v1/score?data=19.8446136, 2.2985856384', headers=headers)
>>> r.text
u'1.0'

Posting a request to version 1 with multiple records to score:

>>> r = requests.post('http://mymodel.demotrustedanalytics.com/v1/score?data=19.8446136, 2.2985856384&data=46.1810010826, 3.1611961917', headers=headers)
>>> r.text
u'1.0,0.0'
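Version 1 takes one data query parameter per record and, judging from the responses above, returns the predictions as a single comma-separated string in the same order as the records. The helpers below are hypothetical and untested against a live Scoring Engine; they only build the repeated parameters and split the response.

```python
from typing import List, Sequence, Tuple


def v1_params(records: Sequence[Sequence[float]]) -> List[Tuple[str, str]]:
    """Hypothetical helper: one ("data", "v1,v2,...") query parameter per record."""
    return [("data", ",".join(str(v) for v in record)) for record in records]


def parse_v1_response(text: str) -> List[float]:
    """Hypothetical helper: split a comma-separated v1 response into floats."""
    return [float(v) for v in text.split(",")]


records = [[19.8446136, 2.2985856384], [46.1810010826, 3.1611961917]]
print(v1_params(records))

# Pair each record with its prediction, using the response text shown above.
for record, prediction in zip(records, parse_v1_response("1.0,0.0")):
    print(record, "->", prediction)
```

With the requests library, the parameter list can be passed as params=v1_params(records), since requests accepts a list of (key, value) tuples and repeats the key for each one. This assumes the Scoring Engine decodes the percent-encoded commas that requests produces; the network call is left out of the sketch because it needs a running Scoring Engine.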

Posting a request to version 2 of the Scoring Engine, which uses JSON for requests and responses:

>>> r = requests.post("http://mymodel.demotrustedanalytics.com/v2/score", json={"records": [{"Dim_1": 19.8446136, "Dim_2": 2.2985856384}]})
>>> r.text
u'{"data":[{"Dim_1":19.8446136,"Dim_2":2.2985856384,"Prediction":1.0}]}'

Posting a request to version 2 with multiple records to score:

>>> r = requests.post("http://mymodel.demotrustedanalytics.com/v2/score", json={"records": [{"Dim_1": 19.8446136, "Dim_2": 2.2985856384}, {"Dim_1": 46.1810010826, "Dim_2": 3.1611961917}]})
>>> r.text
u'{"data":[{"Dim_1":19.8446136,"Dim_2":2.2985856384,"Prediction":1.0},{"Dim_1":46.1810010826,"Dim_2":3.1611961917,"Prediction":0.0}]}'
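For version 2, the request body and the response are plain JSON, so they can be built from, and mapped back to, rows of feature values. A minimal sketch with hypothetical helper names; the column names must match the input names reported by the metadata endpoint, and the response text is the one shown above.

```python
import json
from typing import Dict, List, Sequence


def v2_payload(columns: Sequence[str], rows: Sequence[Sequence[float]]) -> Dict[str, list]:
    """Hypothetical helper: build the v2 body, one JSON object per record."""
    return {"records": [dict(zip(columns, row)) for row in rows]}


def v2_predictions(text: str) -> List[float]:
    """Hypothetical helper: extract the Prediction field of each returned record."""
    return [record["Prediction"] for record in json.loads(text)["data"]]


payload = v2_payload(
    ["Dim_1", "Dim_2"],
    [[19.8446136, 2.2985856384], [46.1810010826, 3.1611961917]],
)

# Response text copied from the multi-record v2 call above.
response_text = '{"data":[{"Dim_1":19.8446136,"Dim_2":2.2985856384,"Prediction":1.0},{"Dim_1":46.1810010826,"Dim_2":3.1611961917,"Prediction":0.0}]}'
print(v2_predictions(response_text))  # [1.0, 0.0]
```

The payload can be passed as json=payload to requests.post, as in the calls above.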