Ride Time Estimation
Monitoring a Trip Time prediction model
Overview: In this example, we consider a regression task where we want to predict the trip time given the trip distance, number of passengers, booking date, vendor ID, etc.
Dataset: We use the New York City Taxi Trip Duration dataset from Kaggle, where the data contains features such as vendor_id, pickup_datetime, passenger_count, pickup_location, drop_location, etc., along with the trip durations (in seconds). We want to train an ML model that takes the input features and predicts the trip duration.
Monitoring: In this notebook, we will see how we can use the UpTrain package to monitor model accuracy, run data integrity checks, and add SHAP to the UpTrain dashboard to explain model predictions.
Let's see what the input looks like:
[Table: three sample input rows. Each row is fully numeric and contains features such as the vendor ID, passenger count, trip distance ("dist"), and the pickup datetime broken out into components (year, month, day, hour, minute).]
Check the training MAPE (Mean Absolute Percentage Error) of our model
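A minimal sketch of this step, assuming a scikit-learn GradientBoostingRegressor as a stand-in for the notebook's model and a hypothetical train_features.csv holding the input features plus a trip_duration column:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_percentage_error

# Hypothetical file name; the notebook builds its features from the Kaggle
# trip data (vendor id, passenger count, distance, pickup-time components, ...).
train_df = pd.read_csv("train_features.csv")
y_train = train_df.pop("trip_duration")   # target: trip duration in seconds
X_train = train_df

# Stand-in regressor; the actual notebook may train a different model.
model = GradientBoostingRegressor()
model.fit(X_train, y_train)

train_preds = model.predict(X_train)
print(f"Training MAPE: {mean_absolute_percentage_error(y_train, train_preds):.3f}")
```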
Next, we define monitors over our ML model in test/production with UpTrain.
Accuracy monitors
We define Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) accuracy monitors for our regression task, as sketched below.
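A minimal sketch of what such a configuration could look like; the enum and key names used here (Monitor.ACCURACY, MeasurableType.MAE / MAPE, cfg_dict, st_logging) are assumptions and may differ across UpTrain versions:

```python
import uptrain

# Illustrative sketch only: check and enum names below are assumptions
# about the UpTrain config schema.
cfg = {
    "checks": [
        {
            "type": uptrain.Monitor.ACCURACY,
            "measurable_args": {"type": uptrain.MeasurableType.MAE},
        },
        {
            "type": uptrain.Monitor.ACCURACY,
            "measurable_args": {"type": uptrain.MeasurableType.MAPE},
        },
    ],
    "st_logging": True,  # assumed flag to enable the UpTrain dashboard
}
framework = uptrain.Framework(cfg_dict=cfg)
```

At test/production time, model inputs, predictions, and ground truths are logged to this framework so that the dashboard can track MAE and MAPE over time.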
SHAP explainability
SHAP (SHapley Additive exPlanations) is a game-theoretic approach to explaining ML model predictions, and it is available as an open-source Python package. Through the UpTrain dashboard, we use SHAP to explain our model's predictions.
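A hedged sketch of how a SHAP visualization could be registered as an UpTrain check, assuming the model trained earlier; the check type and its arguments shown here are assumptions:

```python
import uptrain

# Illustrative sketch only: the check type (Visual.SHAP) and its arguments
# are assumptions about the UpTrain config schema.
shap_check = {
    "type": uptrain.Visual.SHAP,   # assumed enum for the SHAP dashboard plot
    "model": model,                # the trained regressor from the training step
    "shap_num_points": 10000,      # assumed argument: number of points to explain
}

# This dict would be appended to the same cfg["checks"] list as the accuracy monitors.
```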
Data integrity monitor
We can also define data integrity checks over our test data. One obvious check is that the number of passengers should be >= 1.
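A sketch of how this check might be expressed in the UpTrain config; the key names here are assumptions:

```python
import uptrain

# Sketch of a data integrity check asserting passenger_count >= 1. The key
# names (Monitor.DATA_INTEGRITY, MeasurableType.INPUT_FEATURE, "integrity_type",
# "threshold") are assumptions about the UpTrain config schema.
integrity_check = {
    "type": uptrain.Monitor.DATA_INTEGRITY,
    "measurable_args": {
        "type": uptrain.MeasurableType.INPUT_FEATURE,
        "feature_name": "passenger_count",
    },
    "integrity_type": "greater_than",
    "threshold": 0,   # passenger_count is an integer, so > 0 is equivalent to >= 1
}
```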
As we can see from the dashboard, we get two plots for SHAP explainability. The plot on the left shows the average weight of each feature. As expected, the feature "dist" (which represents the trip distance) has the biggest impact on the trip duration prediction. Moreover, the pickup hour and the taxi vendor ID also somewhat affect the trip duration.
On the right, we can see how the model arrives at the prediction for any given data point (in this case, ID id1787379). Due to the short distance of this trip, the feature "dist" contributed -256.06 seconds to the predicted trip duration.
We have added a small illustration of how it looks on the UpTrain dashboard below.
The following shows how the MAE and MAPE accuracy metrics evolve over time on the UpTrain dashboard.
We added a data integrity check for the feature passenger_count >= 1. We observe that data integrity is maintained throughout the test dataset (the data integrity score stays close to 1).