UpTrain Config
The UpTrain config is a crucial component of the UpTrain Framework and contains all the necessary information for monitoring, training, and evaluating the performance of your machine learning models.
To illustrate the definition of the config, we use the human orientation classification example from the UpTrain repository.
The config is defined by the user and can include various settings such as:
Checks: This config section specifies the monitors and checks that UpTrain should perform on the input data, including checks for data drift, edge cases, and data integrity. Users can also specify custom signals specific to their use case. Further, users can add statistics computed on their training data, visualize high-dimensional data through dimensionality reduction, or observe inherent clusters. Such monitors are especially useful for unstructured data such as high-dimensional embeddings (common in domains like NLP and recommender systems) and image data. This is how we can define checks for data drift and concept drift in our config (to learn how to define the different checks, we recommend checking out the UpTrain Monitors section):
checks = [
    {
        'type': uptrain.Anomaly.DATA_DRIFT,
        'reference_dataset': orig_training_file,
        'measurable_args': {
            'type': uptrain.MeasurableType.INPUT_FEATURE,
            'feature_name': 'feat_0'
        },
    },
    {
        'type': uptrain.Anomaly.CONCEPT_DRIFT,
        'algorithm': uptrain.DataDriftAlgo.DDM
    }
]
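The paragraph above also mentions edge-case monitoring via custom signals. Purely as a hedged sketch (the EDGE_CASE check type, the signal_formulae key, the uptrain.Signal wrapper, and the is_large_feat signal below are assumptions patterned on the human orientation classification example and may differ in your UpTrain version), such a check could look like:

# Hypothetical custom signal: flags rows whose 'feat_0' value is unusually large
def is_large_feat(inputs, outputs, gts=None, extra_args={}):
    return [abs(x) > 3.0 for x in inputs['feat_0']]

# Assumed edge-case check definition; see the UpTrain Monitors section for the exact schema
edge_case_check = {
    'type': uptrain.Anomaly.EDGE_CASE,
    'signal_formulae': uptrain.Signal('Large feat_0', is_large_feat),
}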
Training pipeline: In this config section, attach your training arguments, such as annotation parameters, training functions, conditions to retrain the model, data warehouse location, etc., to enable automated model retraining. This is how training_args looks in the human orientation classification example:

# Define the training pipeline to annotate collected edge cases and retrain the model automatically
training_args = {
    "annotation_method": {
        "method": uptrain.AnnotationMethod.MASTER_FILE,
        "args": annotation_args
    },
    "training_func": train_model_torch,
    "orig_training_file": orig_training_file,
}
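The snippet references annotation_args, train_model_torch, and orig_training_file, which are defined elsewhere in the example. Purely for illustration (the file paths, dictionary keys, and function signature below are hypothetical placeholders, not UpTrain's documented schema), they could be wired up roughly as follows:

# Hypothetical annotation arguments: point the master-file annotation method at a local file
annotation_args = {'master_file': 'data/master_annotation_data.json'}

# Placeholder path to the dataset the production model was originally trained on
orig_training_file = 'data/training_data.json'

# Hypothetical training function: reads a training file, trains a PyTorch model, and saves it
def train_model_torch(training_file, model_save_path):
    # ... load data from training_file, fit the model, save weights to model_save_path ...
    pass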
Evaluation pipeline: In this config section, attach your evaluation arguments, such as the inference function, golden testing dataset, and the measures and data slices to report, to generate a comprehensive report comparing the production and the retrained models. This report gives deep insights into how the model's performance changed due to retraining and can help you decide whether to deploy the new model or continue with the existing one. Following is the evaluation_args definition, borrowed from the human orientation classification example:

# Define the evaluation pipeline to compare the retrained and the original model
evaluation_args = {
    "inference_func": get_accuracy_torch,
    "golden_testing_dataset": golden_testing_file,
}
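As with the training pipeline, get_accuracy_torch and golden_testing_file come from the example repository. A rough, hypothetical sketch (the path and the function signature are assumptions, not an interface UpTrain requires) might look like:

# Placeholder path to a held-out golden testing dataset
golden_testing_file = 'data/golden_testing_data.json'

# Hypothetical inference function: evaluates a saved model on the testing file and returns its accuracy
def get_accuracy_torch(testing_file, model_save_path):
    # ... load the model from model_save_path, run inference on testing_file, compare against labels ...
    return 0.0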
Logging configuration: This section allows users to configure the logging settings for the UpTrain Framework. UpTrain supports visualizations via a Streamlit dashboard; users can enable it through the st_logging variable and then monitor their models from the dashboard. The UpTrain community is working on integrating other popular dashboards, such as Grafana, into the framework. The config also allows users to customize the UpTrain dashboard: they can specify the dashboard layout, the metrics to be displayed, the URL and port on which the dashboard app runs, the time range for displaying the data, etc. Following is an example definition of the logging args:

logging_args = {
    'st_logging': True,
    'log_folder': 'uptrain_logs',
    'dashboard_port': 50001,
}
Retraining parameters: The retrain_after parameter determines when the model is retrained, i.e., once a sufficient number of data points have been collected in the retraining dataset.
With all the individual pipelines defined, we are now ready to define the dictionary config for the UpTrain framework:
config = {
    "checks": checks,
    "training_args": training_args,
    "evaluation_args": evaluation_args,
    # Retrain when 200 data points are collected in the retraining dataset
    "retrain_after": 200,
    # A local folder to store the retraining dataset (such as the edge cases)
    "retraining_folder": "uptrain_smart_data",
    "logging_args": logging_args,
}
An important point to note is that all the above arguments except checks are optional. So, as long as we know what we want to monitor, we can quickly get started with UpTrain. For example, the config definition from the fraud detection example requires very little information to get started:
config = {
    # Check to identify concept drift using the DDM algorithm
    "checks": [{
        'type': uptrain.Anomaly.CONCEPT_DRIFT,
        'algorithm': uptrain.DataDriftAlgo.DDM,
    }],
}
Next, let's see how we can utilize the UpTrain config to initialize the UpTrain framework and seamlessly observe and improve our ML models.
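As a quick preview, here is a minimal sketch of that initialization, assuming the uptrain package exposes the Framework class and its log method as in the examples above (the feature batch and predictions below are hypothetical placeholders):

import uptrain

# Initialize the UpTrain framework with the config defined above
framework = uptrain.Framework(config)

# Hypothetical production step: log a batch of inputs and model predictions so that
# the configured checks (drift, integrity, edge cases, ...) run on live data
inputs = {'feat_0': [0.1, 0.4, 0.7]}   # placeholder feature batch
preds = [0, 1, 1]                      # placeholder model predictions
framework.log(inputs=inputs, outputs=preds)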