Fraud Detection

Monitoring Performance of a Fraud Detection Model

Dataset and ML model: In this example, we train a binary classifier for cyber-attack classification on the popular NSL-KDD network traffic dataset using the XGBoost classifier.

Problem: Once trained, the cyber-attack classification model performs well initially, but over time the attackers catch up and change their manner of attacks, causing the model's predictions to degrade.

Solution: Use the UpTrain framework to identify the drift in model predictions (aka concept drift).

Divide the data into training and test sets

We use the first 10% of the data to train the model and the remaining 90% to evaluate it in production

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.1, test_size=0.9, shuffle=False)

Step 1: Train our XGBoost Classifier

# Train the XGBoost classifier with the training data
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

classifier = XGBClassifier()
classifier.fit(X_train, y_train)

# Evaluate accuracy on the training data itself
y_pred = classifier.predict(X_train)
print("Training accuracy: " + str(100 * accuracy_score(y_train, y_pred)))

The above code prints the following output:

Training accuracy: 100.0

Woah! 😲🔥 The training accuracy is 100%. Let's see how long the model lasts in production.

Identifying Concept Drift

In this example, we implement two methods to identify concept drift:

  1. Use the popular concept drift detection algorithm for binary tasks called the Drift Detection Method (DDM). DDM is implemented as a part of the UpTrain package.

  2. A custom drift metric defined by the user below. Specifically, the user wants to monitor the difference between the model's accuracy on the first 200 predictions and on the most recent 200 predictions. This way, they can quickly identify a sudden degradation in model performance.

Step 2: Define a custom monitor on the initial and most recent performance of the model
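The snippet below is a minimal sketch of such a monitor. It assumes UpTrain's custom monitor is driven by two user-defined functions: an initialize_func that sets up state and a check_func that is called on every logged batch. The function signatures and the 0.2 alert threshold are assumptions for illustration, not a definitive reference for the UpTrain API.

import numpy as np

# Sketch of a custom monitor: compare accuracy on the first 200 predictions
# with accuracy on the most recent 200 predictions. Signatures are assumed.
def custom_initialize_func(self):
    self.window_size = 200   # number of predictions in each comparison window
    self.all_gts = []        # ground truths seen so far
    self.all_preds = []      # model predictions seen so far

def custom_check_func(self, inputs, outputs, gts=None, extra_args=None):
    self.all_preds.extend(list(outputs))
    self.all_gts.extend(list(gts))
    if len(self.all_gts) < 2 * self.window_size:
        return  # not enough predictions yet to compare the two windows
    w = self.window_size
    initial_acc = np.mean(np.array(self.all_preds[:w]) == np.array(self.all_gts[:w]))
    recent_acc = np.mean(np.array(self.all_preds[-w:]) == np.array(self.all_gts[-w:]))
    # A large gap between initial and recent accuracy signals degradation
    if initial_acc - recent_acc > 0.2:
        print(f"Custom monitor alert: accuracy dropped from {initial_acc:.2f} to {recent_acc:.2f}")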

Step 3: Define the list of checks to perform on the model

Here, we have two checks: the concept drift check with the DDM algorithm and the customized check from above
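A sketch of the checks list is shown below. It assumes the v0-style UpTrain configuration, where each check is a dictionary and monitor/algorithm types are selected via enums (uptrain.Monitor.CONCEPT_DRIFT, uptrain.DataDriftAlgo.DDM, uptrain.Monitor.CUSTOM_MONITOR); treat these names and keys as assumptions and verify them against the UpTrain version you have installed.

import uptrain

# NOTE: keys and enum names are assumptions based on the v0-style UpTrain config
checks = [
    {
        # Concept drift check using the Drift Detection Method (DDM)
        "type": uptrain.Monitor.CONCEPT_DRIFT,
        "algorithm": uptrain.DataDriftAlgo.DDM,
    },
    {
        # Custom monitor comparing initial vs most recent accuracy (from Step 2)
        "type": uptrain.Monitor.CUSTOM_MONITOR,
        "initialize_func": custom_initialize_func,
        "check_func": custom_check_func,
        "need_gt": True,
    },
]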

Step 4: Define config and initialize the UpTrain framework
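A minimal configuration sketch follows, again assuming the v0-style API where uptrain.Framework is constructed from a config dictionary and st_logging under logging_args turns on the Streamlit dashboard used later; the key names are assumptions to verify against your installed version.

cfg = {
    "checks": checks,                      # drift checks defined in Step 3
    "logging_args": {"st_logging": True},  # assumed flag: enables the Streamlit dashboard
}

framework = uptrain.Framework(cfg)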

Step 5: Deploy the model in production and wait for alerts!
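To simulate production, we stream the held-out 90% of the data through the model in small batches and log predictions and ground truths to the framework. The framework.log calls below follow the v0-style pattern (log inputs and outputs first, then attach ground truths using the returned identifiers); the exact signature is an assumption, and the batch size of 1000 is arbitrary.

import numpy as np

X_prod = np.asarray(X_test)   # assume array-like features; convert for easy slicing
y_prod = np.asarray(y_test)
batch_size = 1000

for start in range(0, len(X_prod), batch_size):
    X_batch = X_prod[start : start + batch_size]
    y_batch = y_prod[start : start + batch_size]

    # Model inference on the incoming batch
    preds = classifier.predict(X_batch)

    # Assumed v0-style logging: inputs/outputs first, then ground truths by id
    ids = framework.log(inputs={"data": X_batch}, outputs=preds)
    framework.log(identifiers=ids, gts=y_batch)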

The console will print a message whenever drift is detected

As can be noted above, both drift monitors detect a drift around the timestamp of 111k

Verification of drifts with the UpTrain dashboard

The UpTrain framework automatically logs important metrics, such as accuracy, so that users can observe the performance of their models. The dashboard is integrated with Streamlit and is launched automatically if st_logging is enabled.

Accuracy versus num_predictions

The following is a screenshot of average accuracy versus time from the dashboard. We can observe a drift around the timestamp of 111k, which is also flagged by our drift monitors.

Custom Monitor

Finally, users can also plot the custom metrics defined earlier, which in this case are the model's initial accuracy and its most recent accuracy.

Observe how the most recent accuracy of the model is far lower than the initial accuracy, implying that the attackers have learned to fool the model.
