LogisticRegressionModel train

train(self, frame, label_column, observation_columns, frequency_column=None, num_classes=2, optimizer='LBFGS', compute_covariance=True, intercept=True, feature_scaling=False, threshold=0.5, reg_type='L2', reg_param=0.0, num_iterations=100, convergence_tolerance=0.0001, num_corrections=10, mini_batch_fraction=1.0, step_size=1)

[ALPHA] Build a logistic regression model.
Parameters: frame : Frame
A frame to train the model on.
label_column : unicode
Column name containing the label for each observation.
observation_columns : list
Column(s) containing the observations.
frequency_column : unicode (default=None)
Optional column containing the frequency of observations.
num_classes : int32 (default=2)
Number of classes.
optimizer : unicode (default=LBFGS)
Set the type of optimizer:
- LBFGS - Limited-memory BFGS. LBFGS supports multinomial logistic regression.
- SGD - Stochastic Gradient Descent. SGD supports only binary logistic regression.
compute_covariance : bool (default=True)
Compute covariance matrix for the model.
intercept : bool (default=True)
Add intercept column to training data.
feature_scaling : bool (default=False)
Perform feature scaling before training model.
threshold : float64 (default=0.5)
Threshold for separating positive predictions from negative predictions.
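To illustrate how the threshold is applied, here is a minimal, self-contained sketch (not the ATK implementation; the function name and weights are hypothetical): a trained binary model maps a linear score to a probability via the logistic function, and predictions at or above the threshold are labeled positive.

```python
import math

def predict_label(weights, intercept, observation, threshold=0.5):
    """Toy illustration of thresholding a binary logistic model's output."""
    # Linear combination of the observation values plus the intercept term.
    score = intercept + sum(w * x for w, x in zip(weights, observation))
    # The logistic (sigmoid) function maps the score to a probability in (0, 1).
    probability = 1.0 / (1.0 + math.exp(-score))
    # Probabilities at or above the threshold become positive predictions.
    return 1 if probability >= threshold else 0
```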
reg_type : unicode (default=L2)
Set the type of regularization:
- L1 - L1 regularization with the sum of the absolute values of the coefficients.
- L2 - L2 regularization with the sum of the squares of the coefficients.
reg_param : float64 (default=0.0)
Regularization parameter.
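The two penalty forms named above can be sketched in plain Python. This is a toy illustration, not the ATK implementation: the exact internal scaling of the penalty is an assumption, and reg_param=0.0 (the default) disables the penalty entirely.

```python
def regularization_penalty(coefficients, reg_type="L2", reg_param=0.0):
    """Toy sketch of the penalty added to the training loss."""
    if reg_type == "L1":
        # L1: sum of absolute values of the coefficients.
        return reg_param * sum(abs(c) for c in coefficients)
    elif reg_type == "L2":
        # L2: sum of squares of the coefficients. (The exact scaling
        # used internally, e.g. a 0.5 factor, is an assumption here.)
        return reg_param * sum(c * c for c in coefficients)
    raise ValueError("reg_type must be 'L1' or 'L2'")
```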
num_iterations : int32 (default=100)
Maximum number of iterations.
convergence_tolerance : float64 (default=0.0001)
Convergence tolerance for L-BFGS iterations. Smaller values lead to higher accuracy at the cost of more iterations.
num_corrections : int32 (default=10)
Number of corrections used in the LBFGS update. Values less than 3 are not recommended; large values will result in excessive computing time.
mini_batch_fraction : float64 (default=1.0)
Fraction of the data to be used in each SGD iteration.
step_size : int32 (default=1)
Initial step size for SGD. In subsequent iterations, the step size decays as stepSize / sqrt(t), where t is the iteration number.
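The decay schedule described above can be sketched directly (a toy helper for illustration, not part of the API):

```python
import math

def sgd_step_size(initial_step_size, iteration):
    """Step size used at SGD iteration t (1-based), decaying as
    initial_step_size / sqrt(t) per the parameter description above."""
    return initial_step_size / math.sqrt(iteration)
```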
Returns: : dict
An object with a summary of the trained model. The data returned is composed of multiple components:

int : numFeatures
    Number of features in the training data.
int : numClasses
    Number of classes in the training data.
table : summaryTable
    A summary table composed of:
    Frame : CovarianceMatrix (optional)
        Covariance matrix of the trained model. The covariance matrix is the inverse of the Hessian matrix for the trained model; the Hessian matrix contains the second-order partial derivatives of the model's log-likelihood function.
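To illustrate the stated relationship between the covariance matrix, the Hessian, and the standard errors reported in the summary table, here is a hedged sketch for a two-parameter model using the closed-form 2x2 matrix inverse (toy code with made-up numbers, not the ATK implementation):

```python
import math

def covariance_2x2(hessian):
    """Invert a 2x2 Hessian [[a, b], [c, d]] to obtain the coefficient
    covariance matrix, per the description above. Toy sketch only."""
    (a, b), (c, d) = hessian
    det = a * d - b * c
    if det == 0:
        raise ValueError("Hessian is singular; covariance is undefined")
    return [[d / det, -b / det], [-c / det, a / det]]

def standard_errors(covariance):
    """Standard errors are the square roots of the covariance diagonal."""
    return [math.sqrt(covariance[i][i]) for i in range(len(covariance))]
```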
Trains a logistic regression model using the observation columns and label column of the training frame.
Examples
Train a logistic regression model using limited-memory BFGS (L-BFGS).
In the example below, computation of the covariance matrix is enabled. When it is enabled, the summary table contains additional statistics about the quality of the trained model.
>>> my_model = ta.LogisticRegressionModel(name='LogReg')
>>> metrics = my_model.train(train_frame, 'name_of_label_column', ['obs1', 'obs2'],
...                          'frequency_column', num_classes=2, optimizer='LBFGS',
...                          compute_covariance=True)
>>> metrics.num_features
2
>>> metrics.num_classes
2
>>> metrics.summary_table
            coefficients  degrees_freedom  standard_errors  wald_statistic       p_value
intercept       0.924574                1         0.013052       70.836965  0.000000e+00
obs1            0.405374                1         0.005793       69.973643  1.110223e-16
obs2            0.707372                1         0.006709      105.439358  0.000000e+00
>>> metrics.covariance_matrix.inspect()