UpTrain Statistics 👨‍🔬
Calculate standard (and custom) metrics over your data
Distances
UpTrain provides several measures for analyzing the distance between embeddings. These measures can help determine whether the model is overfitting or underfitting, and how much the embeddings change over time. UpTrain currently supports the following distance measures:
Norm ratio: This measure calculates the ratio of the current embeddings' norm to the initial embeddings' norm. If the norm ratio is close to 1, it indicates that the embeddings have not changed much from the initial embeddings. A large change in the norm ratio can indicate that the model is overfitting or underfitting.
L2-norm distance: This measure calculates the L2-norm distance between the current and initial embeddings.
Cosine distance: This measure calculates the cosine distance between the current and initial embeddings. Unlike the norm ratio, which depends only on magnitude, cosine distance depends only on the direction of the embeddings and ignores their magnitude; the L2-norm distance is sensitive to both.
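To make the three measures concrete, here is a minimal sketch in plain Python. It is not UpTrain's implementation; the function names and the toy two-dimensional vectors are assumptions chosen for illustration.

```python
import math

def norm(v):
    """Euclidean (L2) norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def norm_ratio(current, initial):
    """Ratio of the current embedding's norm to the initial embedding's norm."""
    return norm(current) / norm(initial)

def l2_distance(current, initial):
    """Euclidean distance between the current and initial embeddings."""
    return math.dist(current, initial)

def cosine_distance(current, initial):
    """One minus the cosine similarity; depends on direction only."""
    dot = sum(c * i for c, i in zip(current, initial))
    return 1.0 - dot / (norm(current) * norm(initial))

# Hypothetical embeddings for illustration.
initial = [3.0, 4.0]
scaled = [6.0, 8.0]     # same direction, twice the length
rotated = [-4.0, 3.0]   # same length, rotated by 90 degrees

for name, emb in [("scaled", scaled), ("rotated", rotated)]:
    print(name, norm_ratio(emb, initial), l2_distance(emb, initial),
          cosine_distance(emb, initial))
```

In this toy case the scaled vector changes the norm ratio but leaves the cosine distance at zero, while the rotated vector keeps the norm ratio at 1 but has a cosine distance of 1. The L2-norm distance is non-zero in both cases, which is why the three measures are worth reading together.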
The following is how we can define the check for the distance between embeddings:
Here, model_args defines the models that we want to compare, and the reference is the initial embedding (another option is the running difference). Further, we calculate all three distance_types defined above.
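As a rough sketch of the pieces just described, such a check might be written as a plain Python dictionary. Only the key names model_args, reference, and distance_types come from the text. The "type" entry, the nested layout of model_args, and every string value are assumptions for illustration and should be verified against the UpTrain documentation before use.

```python
# Hypothetical config sketch; not verified against the UpTrain API.
distance_check = {
    # Assumed placeholder for whatever identifier selects the distance statistic.
    "type": "distance",
    # Models to compare (the text mentions a batch and a realtime model).
    "model_args": [
        {"type": "model_type", "allowed_values": ["batch", "realtime"]},
    ],
    # Compare against the initial embedding; the text names the running
    # difference as the other option.
    "reference": "initial",
    # The three measures described above; the exact spellings are assumed.
    "distance_types": ["norm_ratio", "l2_distance", "cosine_distance"],
}
```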
The following figure compares the two model_types in terms of cosine distance from the initial embedding. Note that for the realtime model, the learning happens much earlier than for the batch model.
In addition to the above measures, UpTrain also provides the running difference of the embeddings. This is the difference between the current embeddings and the previous embeddings. By analyzing the running difference, we can determine how much the embeddings change over time. A large change in the running difference can indicate that the model is experiencing significant changes in the input data or that the optimization process is unstable.
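The running difference can be sketched in a few lines of plain Python. Summarizing each step by the L2 distance between consecutive snapshots is an assumption made here for illustration, as are the function name and the toy snapshots; UpTrain may aggregate the difference differently.

```python
import math

def running_difference(snapshots):
    """L2 distance between each embedding snapshot and the one before it."""
    return [math.dist(curr, prev) for prev, curr in zip(snapshots, snapshots[1:])]

# Hypothetical snapshots of one embedding taken at successive points in training.
snapshots = [[0.0, 0.0], [3.0, 4.0], [3.0, 4.5], [3.0, 4.5]]
diffs = running_difference(snapshots)
print(diffs)  # [5.0, 0.5, 0.0]
```

A sequence that shrinks toward zero, as here, suggests the embedding is settling. A sequence that stays large or spikes suggests shifting input data or an unstable optimization process.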
Overall, these measures can help in analyzing the stability and performance of the model's embeddings over time. By monitoring these measures, we can detect issues such as overfitting, underfitting, or instability, and take corrective actions to improve the model's performance.
Convergence analysis
UpTrain also provides convergence analysis for embeddings. Convergence analysis is a technique for evaluating the performance of an embedding algorithm by measuring how well the embeddings converge as the algorithm iterates. UpTrain provides several methods for conducting convergence analysis on embeddings, including visualization tools and metrics that can be used to evaluate the quality of the embeddings.
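One simple way to turn "how well the embeddings converge" into a number is to find the point after which the per-step change stays small. The sketch below is one such heuristic and not UpTrain's method; the function name, the tolerance, and the patience window are all assumptions.

```python
def convergence_step(distances, tol=1e-3, patience=3):
    """Return the first index that starts a run of `patience` consecutive
    values below `tol`, or None if no such run exists."""
    run = 0
    for i, d in enumerate(distances):
        run = run + 1 if d < tol else 0
        if run == patience:
            return i - patience + 1
    return None

# Hypothetical per-step changes in an embedding (e.g. running differences).
changes = [0.9, 0.4, 0.1, 0.0005, 0.002, 0.0004, 0.0003, 0.0001]
print(convergence_step(changes))  # 5
```

Requiring several consecutive small values, rather than a single one, avoids declaring convergence on a momentary dip such as the one at index 3 in this example.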
The following is how we can define the config to check for convergence statistics:
In our dashboard, we observe that at time 100k, the norm ratio for embeddings generated by the batch model is higher, implying that there is a greater popularity bias.