t-SNE Dimensionality Reduction
Visualize your data with t-SNE
t-SNE (t-Distributed Stochastic Neighbor Embedding) is a popular technique for reducing high-dimensional data to a two- or three-dimensional representation. t-SNE is often used for data visualization, as it can reveal underlying structures or patterns in the data that may not be apparent in the original high-dimensional space. It works by modeling pairwise similarities between data points in both the original high-dimensional space and the low-dimensional embedding, then iteratively optimizing the embedding to minimize the difference between the two.
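To make the idea concrete, here is a minimal standalone sketch of t-SNE using scikit-learn on synthetic data (this is illustrative and independent of UpTrain's API; the cluster data is made up):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Two synthetic high-dimensional clusters: 50 points x 64 features each,
# centered at 0 and 5 respectively (stand-ins for real embeddings).
X = np.vstack([rng.normal(0, 1, (50, 64)), rng.normal(5, 1, (50, 64))])

# Reduce to 2 dimensions; perplexity roughly controls the balance between
# local and global structure in the embedding.
emb = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X)
print(emb.shape)  # (100, 2)
```

Well-separated clusters in the original space typically remain separated in the 2D embedding, which is what makes the technique useful for spotting distribution shifts.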
UpTrain supports t-SNE dimensionality reduction through the scikit-learn package. Here's how we define the config for t-SNE visualization in the text summarization example:
tsne_visual = {
    'type': uptrain.Visual.TSNE,
    'measurable_args': {
        'type': uptrain.MeasurableType.INPUT_FEATURE,
        'feature_name': 'bert_embs',
    },
    'label_args': {
        'type': uptrain.MeasurableType.INPUT_FEATURE,
        'feature_name': 'dataset_label',
    },
    # Hyperparameters for t-SNE
    'dim': '2D',
    'perplexity': 10,
    # Frequency to calculate t-SNE
    'update_freq': 100,
}
Here, the parameters specifying the dataset features on which dimensionality reduction is applied are the same as in the case of UMAP. Further, t-SNE-related hyperparameters, such as perplexity, carry the same meaning as in the scikit-learn package.
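As a hedged sketch, the config's hyperparameters map naturally onto scikit-learn's TSNE arguments; the translation logic below is illustrative, not UpTrain's actual internals:

```python
import numpy as np
from sklearn.manifold import TSNE

# Hyperparameters from the UpTrain config above.
config = {'dim': '2D', 'perplexity': 10}

# '2D' vs '3D' corresponds to scikit-learn's n_components (assumed mapping).
n_components = 2 if config['dim'] == '2D' else 3

# Stand-in for the 'bert_embs' feature: 60 random 32-dimensional vectors.
X = np.random.default_rng(1).normal(size=(60, 32))
emb = TSNE(n_components=n_components,
           perplexity=config['perplexity'],
           random_state=0).fit_transform(X)
print(emb.shape)  # (60, 2)
```

Note that scikit-learn requires perplexity to be smaller than the number of samples, which is worth keeping in mind when `update_freq` batches are small.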
Continuing our example from UMAP, this is how the t-SNE visualization looks for the text summarization example.

Similar to UMAP, we see that the embeddings corresponding to the wikihow dataset have a distribution different from that of the billsum training and testing datasets.