
LogisticRegressionModel train


train(self, frame, label_column, observation_columns, frequency_column=None, num_classes=2, optimizer='LBFGS', compute_covariance=True, intercept=True, feature_scaling=False, threshold=0.5, reg_type='L2', reg_param=0.0, num_iterations=100, convergence_tolerance=0.0001, num_corrections=10, mini_batch_fraction=1.0, step_size=1)

[ALPHA] Build a logistic regression model.

Parameters:

frame : Frame

A frame to train the model on.

label_column : unicode

Column name containing the label for each observation.

observation_columns : list

Column(s) containing the observations.

frequency_column : unicode (default=None)

Optional column containing the frequency of observations.

num_classes : int32 (default=2)

Number of classes.

optimizer : unicode (default=LBFGS)

Type of optimizer.
LBFGS - Limited-memory BFGS. Supports multinomial logistic regression.
SGD - Stochastic Gradient Descent. Supports only binary logistic regression.
(A sketch of a train call using SGD follows the parameter list.)

compute_covariance : bool (default=True)

Compute covariance matrix for the model.

intercept : bool (default=True)

Add intercept column to training data.

feature_scaling : bool (default=False)

Perform feature scaling before training model.

threshold : float64 (default=0.5)

Threshold for separating positive predictions from negative predictions.

reg_type : unicode (default=L2)

Type of regularization.
L1 - L1 regularization with the sum of absolute values of the coefficients.
L2 - L2 regularization with the sum of squares of the coefficients.

reg_param : float64 (default=0.0)

Regularization parameter.

num_iterations : int32 (default=100)

Maximum number of iterations.

convergence_tolerance : float64 (default=0.0001)

Convergence tolerance of iterations for L-BFGS. Smaller values lead to higher accuracy at the cost of more iterations.

num_corrections : int32 (default=10)

Number of corrections used in the LBFGS update. Values of fewer than 3 are not recommended; large values result in excessive computing time.

mini_batch_fraction : float64 (default=1.0)

Fraction of the data to use in each SGD iteration.

step_size : int32 (default=1)

Initial step size for SGD. In subsequent iterations, the step size decreases as step_size / sqrt(t), where t is the iteration number.
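
For reference, below is a minimal sketch of a train call that exercises the SGD-specific parameters (mini_batch_fraction and step_size). The model name, frame, column names, and parameter values are placeholders chosen for illustration, not values prescribed by the library:

>>> sgd_model = ta.LogisticRegressionModel(name='LogRegSGD')
>>> sgd_metrics = sgd_model.train(train_frame, 'name_of_label_column', ['obs1', 'obs2'], num_classes=2, optimizer='SGD', mini_batch_fraction=0.8, step_size=1, num_iterations=200, reg_type='L2', reg_param=0.01)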

Returns:

: dict

An object with a summary of the trained model. The data returned is composed of multiple components:

int : numFeatures
    Number of features in the training data.
int : numClasses
    Number of classes in the training data.
table : summaryTable
    A summary table composed of:
    Frame : CovarianceMatrix (optional)
        Covariance matrix of the trained model.

The covariance matrix is the inverse of the Hessian matrix for the trained model. The Hessian matrix is the matrix of second-order partial derivatives of the model's log-likelihood function.

Training creates a logistic regression model from the observation columns and label column of the training frame.

Examples

Train logistic regression model using Limited-memory-BFGS.

In the example below, the flag for computing the covariance matrix is enabled. When the covariance matrix is computed, the summary table contains additional statistics about the quality of the trained model.

>>> my_model = ta.LogisticRegressionModel(name='LogReg')
>>> metrics = my_model.train(train_frame, 'name_of_label_column', ['obs1', 'obs2'], 'frequency_column', num_classes=2, optimizer='LBFGS', compute_covariance=True)

>>> metrics.num_features
2

>>> metrics.num_classes
2

>>> metrics.summary_table

            coefficients  degrees_freedom  standard_errors  wald_statistic   p_value
intercept      0.924574                1         0.013052       70.836965    0.000000e+00
obs1           0.405374                1         0.005793       69.973643    1.110223e-16
obs2           0.707372                1         0.006709      105.439358    0.000000e+00

>>> metrics.covariance_matrix.inspect()
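
As a side note, the sketch below shows how the summary-table statistics conventionally relate to the covariance matrix: standard errors are the square roots of its diagonal, and the Wald statistic is the coefficient divided by its standard error. This is a back-of-the-envelope check, not the library's own computation; it assumes NumPy and SciPy are available, the retrieval of the actual matrix from covariance_matrix is elided, and the diagonal values shown are illustrative stand-ins chosen to roughly reproduce the standard errors above.

>>> import numpy as np
>>> from scipy import stats
>>> coefficients = np.array([0.924574, 0.405374, 0.707372])       # intercept, obs1, obs2
>>> covariance = np.diag([1.70e-4, 3.36e-5, 4.50e-5])             # illustrative diagonal only
>>> standard_errors = np.sqrt(np.diag(covariance))                # ~ 'standard_errors' column
>>> wald_statistics = coefficients / standard_errors              # ~ 'wald_statistic' column
>>> p_values = 2.0 * stats.norm.sf(np.abs(wald_statistics))       # two-sided normal test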