Models RandomForestRegressorModel

class RandomForestRegressorModel

Entity RandomForestRegressorModel
Attributes

last_read_date    Read-only property - Last time this model’s data was accessed.
name              Set or get the name of the model object.
status            Read-only property - Current model life cycle status.

Methods

__init__(self[, name, _info])    Create a ‘new’ instance of a Random Forest Regressor model.
predict(self, frame[, observation_columns])    Predict the values for the data points.
publish(self)    Creates a tar file that will be used as input to the scoring engine.
train(self, frame, value_column, observation_columns[, num_trees, impurity, ...])    Build Random Forests Regressor model.
__init__(self, name=None)

Create a ‘new’ instance of a Random Forest Regressor model.

Parameters:
    name : unicode (default=None)
        User supplied name.
Returns:
    : Model
        A new instance of RandomForestRegressor Model
Random Forest [R19] is a supervised ensemble learning algorithm used to perform regression. A Random Forest Regressor model is initialized, trained on columns of a frame, and used to predict the value of each observation in the frame. This model runs the MLlib implementation of Random Forest [R20]. During training, the decision trees are trained in parallel. During prediction, the random forest's predicted value is the average of the values predicted by all of its trees.
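The averaging step can be illustrated with a minimal sketch. It does not use the toolkit or MLlib: the three "trees" are hypothetical hand-written threshold functions standing in for trained decision trees, and their thresholds are made up for illustration.

```python
# Illustrative only: hypothetical stand-ins for trained regression trees.
# Each "tree" maps an observation (dim_1, dim_2) to a predicted value.

def tree_a(dim_1, dim_2):
    return 1.0 if dim_1 < 26.0 else 0.0

def tree_b(dim_1, dim_2):
    return 1.0 if dim_2 < 3.0 else 0.0

def tree_c(dim_1, dim_2):
    return 1.0 if dim_1 < 40.0 else 0.0

def forest_predict(trees, dim_1, dim_2):
    """Average the per-tree predictions to get the forest's prediction."""
    predictions = [tree(dim_1, dim_2) for tree in trees]
    return sum(predictions) / len(predictions)

trees = [tree_a, tree_b, tree_c]

# All three hypothetical trees agree on this observation.
all_agree = forest_predict(trees, 19.8446136104, 2.2985856384)

# Here only tree_c predicts 1.0, so the average falls between 0 and 1.
split_vote = forest_predict(trees, 34.6334526911, 3.6429838715)
```

With a single tree (num_trees=1), the average is simply that tree's prediction.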
Footnotes

[R19] https://en.wikipedia.org/wiki/Random_forest
[R20] https://spark.apache.org/docs/1.5.0/mllib-ensembles.html#random-forests

Examples
Consider the following frame, ‘frame’, which contains three columns. The model below is trained and tested on this sample data set.
>>> frame.inspect()
[#]  Class  Dim_1          Dim_2
=======================================
[0]      1  19.8446136104  2.2985856384
[1]      1  16.8973559126  2.6933495054
[2]      1   5.5548729596  2.7777687995
[3]      0  46.1810010826  3.1611961917
[4]      0  44.3117586448  3.3458963222
[5]      0  34.6334526911  3.6429838715

>>> model = ta.RandomForestRegressorModel()
[===Job Progress===]
>>> train_output = model.train(frame, 'Class', ['Dim_1', 'Dim_2'], num_trees=1, impurity="variance", max_depth=4, max_bins=100)
[===Job Progress===]
>>> train_output
{u'impurity': u'variance', u'max_bins': 100, u'observation_columns': [u'Dim_1', u'Dim_2'],
 u'num_nodes': 3, u'max_depth': 4, u'seed': -1632404927, u'num_trees': 1,
 u'label_column': u'Class', u'feature_subset_category': u'all'}
>>> train_output['num_nodes']
3
>>> train_output['label_column']
u'Class'
>>> predicted_frame = model.predict(frame, ['Dim_1', 'Dim_2'])
[===Job Progress===]
>>> predicted_frame.inspect()
[#]  Class  Dim_1          Dim_2         predicted_value
========================================================
[0]      1  19.8446136104  2.2985856384              1.0
[1]      1  16.8973559126  2.6933495054              1.0
[2]      1   5.5548729596  2.7777687995              1.0
[3]      0  46.1810010826  3.1611961917              0.0
[4]      0  44.3117586448  3.3458963222              0.0
[5]      0  34.6334526911  3.6429838715              0.0

>>> model.publish()
[===Job Progress===]
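The reported num_nodes of 3 is consistent with a tree that makes a single split: one root and two leaves. The sketch below, written in plain Python and independent of the toolkit, computes the variance of the Class values before and after one split on Dim_1. The threshold of 26.0 is an assumption chosen for illustration; the threshold actually selected during training does not appear in the output above.

```python
def variance(values):
    """Population variance of a list of numbers (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

# (Class, Dim_1) pairs copied from the sample frame.
rows = [
    (1, 19.8446136104),
    (1, 16.8973559126),
    (1, 5.5548729596),
    (0, 46.1810010826),
    (0, 44.3117586448),
    (0, 34.6334526911),
]

threshold = 26.0  # hypothetical split point, not taken from the trained model

left = [label for label, dim_1 in rows if dim_1 < threshold]
right = [label for label, dim_1 in rows if dim_1 >= threshold]

# Impurity of the unsplit node, and the size-weighted impurity of its children.
parent_impurity = variance([label for label, _ in rows])
child_impurity = (
    len(left) * variance(left) + len(right) * variance(right)
) / len(rows)

impurity_decrease = parent_impurity - child_impurity
```

Under this assumed threshold both children contain a single Class value, so no further split could reduce the variance, which matches predictions of exactly 1.0 and 0.0.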
Take the path to the published model and run it in the Scoring Engine:

>>> import requests
>>> headers = {'Content-type': 'application/json', 'Accept': 'application/json,text/plain'}
Sending a GET request to retrieve the metadata about the model:

>>> r = requests.get('http://mymodel.demotrustedanalytics.com/v2/metadata')
>>> r.text
u'{"model_details":{"model_type":"Random Forest Regressor Model","model_class":"org.trustedanalytics.atk.scoring.models.RandomForestRegressorScoreModel","model_reader":"org.trustedanalytics.atk.scoring.models.RandomForestRegressorModelReaderPlugin","custom_values":{}},"input":[{"name":"Dim_1","value":"Double"},{"name":"Dim_2","value":"Double"}],"output":[{"name":"Dim_1","value":"Double"},{"name":"Dim_2","value":"Double"},{"name":"Prediction","value":"Double"}]}'
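The metadata is a JSON document, so the expected input and output columns can be read from it programmatically. The following is a minimal sketch that parses the response text shown above with the standard json module; the text is embedded as a literal so the example runs without a live Scoring Engine, and the variable names are illustrative.

```python
import json

# Response text copied from the metadata request above.
metadata_text = (
    '{"model_details":{"model_type":"Random Forest Regressor Model",'
    '"model_class":"org.trustedanalytics.atk.scoring.models.RandomForestRegressorScoreModel",'
    '"model_reader":"org.trustedanalytics.atk.scoring.models.RandomForestRegressorModelReaderPlugin",'
    '"custom_values":{}},'
    '"input":[{"name":"Dim_1","value":"Double"},{"name":"Dim_2","value":"Double"}],'
    '"output":[{"name":"Dim_1","value":"Double"},{"name":"Dim_2","value":"Double"},'
    '{"name":"Prediction","value":"Double"}]}'
)

metadata = json.loads(metadata_text)

# Names of the columns the model expects and the columns it returns.
input_columns = [field["name"] for field in metadata["input"]]
output_columns = [field["name"] for field in metadata["output"]]
model_type = metadata["model_details"]["model_type"]
```

Reading the input column names this way avoids hard-coding them when building scoring requests.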
Posting a request to version 1 of the Scoring Engine, which uses strings for requests and responses:

>>> r = requests.post('http://mymodel.demotrustedanalytics.com/v1/score?data=19.8446136, 2.2985856384', headers=headers)
>>> r.text
u'1.0'
Posting a request to version 1 with multiple records to score:

>>> r = requests.post('http://mymodel.demotrustedanalytics.com/v1/score?data=19.8446136, 2.2985856384&data=46.1810010826, 3.1611961917', headers=headers)
>>> r.text
u'1.0,0.0'
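The version 1 format shown above passes each record as one data parameter and returns the scores as a comma-separated string. The sketch below shows one way to build those parameters and parse the response. The helper names format_v1_record and parse_v1_response are hypothetical and not part of the toolkit, and the code assumes Python 3, where str() keeps the full precision of these floats.

```python
def format_v1_record(values):
    """Join one record's values with ', ' as in the v1 requests above."""
    return ", ".join(str(value) for value in values)

def parse_v1_response(text):
    """Split a comma-separated v1 response into a list of floats."""
    return [float(token) for token in text.split(",")]

records = [
    [19.8446136, 2.2985856384],
    [46.1810010826, 3.1611961917],
]

# One ("data", "<values>") pair per record, in request order.
params = [("data", format_v1_record(record)) for record in records]

# Response text copied from the multi-record example above.
scores = parse_v1_response("1.0,0.0")
```

The requests library accepts a list of pairs like this through its params argument and URL-encodes it. Whether the Scoring Engine treats the encoded form the same way as the literal URLs above is not confirmed here.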
Posting a request to version 2 of the Scoring Engine, which uses JSON for requests and responses:

>>> r = requests.post("http://mymodel.demotrustedanalytics.com/v2/score", json={"records": [{"Dim_1": 19.8446136, "Dim_2": 2.2985856384}]})
>>> r.text
u'{"data":[{"Dim_1":19.8446136,"Dim_2":2.2985856384,"Prediction":1.0}]}'
Posting a request to version 2 with multiple records to score:

>>> r = requests.post("http://mymodel.demotrustedanalytics.com/v2/score", json={"records": [{"Dim_1": 19.8446136, "Dim_2": 2.2985856384}, {"Dim_1": 46.1810010826, "Dim_2": 3.1611961917}]})
>>> r.text
u'{"data":[{"Dim_1":19.8446136,"Dim_2":2.2985856384,"Prediction":1.0},{"Dim_1":46.1810010826,"Dim_2":3.1611961917,"Prediction":0.0}]}'
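For version 2, the request body and the response are both JSON, so building the payload from rows and pulling out the predictions takes only a few lines. This is a minimal sketch; build_v2_payload and extract_predictions are hypothetical helper names, and the response text is copied from the example above so the code runs without a live Scoring Engine.

```python
import json

def build_v2_payload(column_names, rows):
    """Build the {"records": [...]} body used by the v2 requests above."""
    return {"records": [dict(zip(column_names, row)) for row in rows]}

def extract_predictions(response_text):
    """Return the "Prediction" value of each scored record, in order."""
    return [record["Prediction"] for record in json.loads(response_text)["data"]]

payload = build_v2_payload(
    ["Dim_1", "Dim_2"],
    [[19.8446136, 2.2985856384], [46.1810010826, 3.1611961917]],
)

# Response text copied from the multi-record v2 example above.
response_text = (
    '{"data":[{"Dim_1":19.8446136,"Dim_2":2.2985856384,"Prediction":1.0},'
    '{"Dim_1":46.1810010826,"Dim_2":3.1611961917,"Prediction":0.0}]}'
)
predictions = extract_predictions(response_text)
```

The resulting payload can be passed as the json argument of requests.post, as in the requests above.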