Commands model:lda/train

[BETA] Creates Latent Dirichlet Allocation model

POST /v1/commands/

GET /v1/commands/:id

Request

Route

POST /v1/commands/

Body

name:

model:lda/train

arguments:

model : Model

The LDA model to train.

frame : Frame

Input frame data.

document_column_name : unicode

Column name for documents. Column should contain a str value.

word_column_name : unicode

Column name for words. Column should contain a str value.

word_count_column_name : unicode

Column name for word count. Column should contain an int32 or int64 value.

max_iterations : int32 (default=None)

The maximum number of iterations that the algorithm will execute. Valid values are positive integers. Default is 20.

alpha : float32 (default=None)

The hyperparameter for the document-specific distribution over topics, used mainly as a smoothing parameter in Bayesian inference. A larger value implies that documents are assumed to cover all topics more uniformly; a smaller value implies that documents are concentrated on a small subset of topics. Valid values are positive floats. Default is 0.1.
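The effect of alpha described above can be seen by sampling topic mixtures from a symmetric Dirichlet prior. The following is a minimal sketch using only the Python standard library; it is not the service's implementation, and the function names (sample_topic_mixture, mean_largest_share) are hypothetical helpers introduced for illustration.

```python
import random


def sample_topic_mixture(alpha, num_topics, rng):
    """Draw one document's topic proportions from a symmetric Dirichlet(alpha)."""
    # A Dirichlet draw is a vector of independent Gamma(alpha, 1) variates
    # divided by their sum.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_topics)]
    total = sum(draws)
    return [d / total for d in draws]


def mean_largest_share(alpha, num_topics=10, trials=2000, seed=0):
    """Average weight of the dominant topic over many simulated documents."""
    rng = random.Random(seed)
    largest = (
        max(sample_topic_mixture(alpha, num_topics, rng)) for _ in range(trials)
    )
    return sum(largest) / trials


if __name__ == "__main__":
    # Smaller alpha should give a larger dominant-topic share.
    for alpha in (0.1, 1.0, 10.0):
        print(alpha, round(mean_largest_share(alpha), 3))
```

With a small alpha the dominant topic should take most of each simulated document's weight, while a large alpha should spread the weight across topics more evenly.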

beta : float32 (default=None)

The hyperparameter for the topic-specific distribution over words, used mainly as a smoothing parameter in Bayesian inference. A larger value implies that topics contain all words more uniformly; a smaller value implies that topics are concentrated on a small subset of words. Valid values are positive floats. Default is 0.1.

convergence_threshold : float32 (default=None)

The amount of change in LDA model parameters that will be tolerated at convergence. If the change is less than this threshold, the algorithm exits before it reaches the maximum number of iterations (supersteps). Valid values are non-negative floats. Default is 0.001.

evaluate_cost : bool (default=None)

“True” turns cost evaluation on and “False” turns it off. Evaluating the cost function is relatively expensive for LDA, so time-critical applications can disable it with this option. Default is “False”.

num_topics : int32 (default=None)

The number of topics to identify in the LDA model. Using fewer topics speeds up the computation, but the extracted topics might be more abstract or less specific; using more topics requires more computation but leads to more specific topics. Valid values are positive integers. Default is 10.


Headers

Authorization: test_api_key_1
Content-type: application/json
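The request described above can be assembled as follows. This is a sketch under stated assumptions rather than a client shipped with the service: the base URL, the helper name build_train_request, and the placeholder values for model and frame are hypothetical, and treating the five arguments listed without a default as required is an inference from the argument list.

```python
import json
import urllib.request

# Assumption: arguments listed without a default are required.
REQUIRED = ("model", "frame", "document_column_name",
            "word_column_name", "word_count_column_name")
OPTIONAL = ("max_iterations", "alpha", "beta", "convergence_threshold",
            "evaluate_cost", "num_topics")


def build_train_request(arguments, api_key, base_url="http://localhost:9099"):
    """Build (but do not send) the POST /v1/commands/ request for model:lda/train.

    base_url is a hypothetical server address.
    """
    missing = [k for k in REQUIRED if k not in arguments]
    unknown = [k for k in arguments if k not in REQUIRED + OPTIONAL]
    if missing or unknown:
        raise ValueError("missing: %s, unknown: %s" % (missing, unknown))
    body = {"name": "model:lda/train", "arguments": arguments}
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/commands/",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": api_key, "Content-type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_train_request(
        {
            "model": "<model reference>",   # placeholder, format not documented here
            "frame": "<frame reference>",   # placeholder, format not documented here
            "document_column_name": "doc_id",
            "word_column_name": "word",
            "word_count_column_name": "word_count",
            "num_topics": 5,
        },
        api_key="test_api_key_1",
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would send it. Optional arguments left out of the dictionary are simply omitted from the body, leaving the documented defaults to the server.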

Description

See the discussion about Latent Dirichlet Allocation at Wikipedia.


Response

Status

200 OK

Body

Returns information about the command. The response body is the same as the one described for GET /v1/commands/:id below.

GET /v1/commands/:id

Request

Route

GET /v1/commands/18

Body

(None)

Headers

Authorization: test_api_key_1
Content-type: application/json
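A client might fetch the command by id repeatedly until it has finished. The sketch below is illustrative only: it assumes the command may still be running when first queried, the base URL is hypothetical, and because this page does not document which response field signals completion, the caller supplies that check as the is_done predicate. The helper names build_status_request and wait_for_command are likewise hypothetical.

```python
import time
import urllib.request


def build_status_request(command_id, api_key, base_url="http://localhost:9099"):
    """Build (but do not send) the GET /v1/commands/:id request."""
    url = "%s/v1/commands/%s" % (base_url.rstrip("/"), command_id)
    return urllib.request.Request(
        url,
        headers={"Authorization": api_key, "Content-type": "application/json"},
        method="GET",
    )


def wait_for_command(command_id, fetch, is_done,
                     interval=1.0, max_polls=60, sleep=time.sleep):
    """Poll until is_done(info) is true and return the last response.

    fetch(command_id) must return the decoded response body as a dict.
    is_done(info) is caller-supplied because the completion field is not
    documented on this page.
    """
    for _ in range(max_polls):
        info = fetch(command_id)
        if is_done(info):
            return info
        sleep(interval)
    raise TimeoutError("command %s did not finish in time" % command_id)
```

Keeping fetch and is_done as parameters lets the loop be exercised without a running server and avoids hard-coding response fields that are not specified here.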

Response

Status

200 OK

Body

dict

The data returned is composed of multiple components:

Frame : topics_given_doc
Conditional probabilities of topic given document.
Frame : word_given_topics
Conditional probabilities of word given topic.
Frame : topics_given_word
Conditional probabilities of topic given word.
str : report
The configuration and learning curve report for Latent Dirichlet Allocation as a multi-line str.