t-SNE Dimensionality Reduction
Visualize your data with t-SNE
t-SNE (t-Distributed Stochastic Neighbor Embedding) is a popular technique for reducing high-dimensional data to a two- or three-dimensional representation. t-SNE is often used for data visualization, as it can reveal underlying structures or patterns in the data that may not be apparent in the original high-dimensional space. It works by modeling pairwise similarities between data points in both the original high-dimensional space and the low-dimensional embedding, then iteratively optimizing the embedding to minimize the difference between the two.
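To make the idea concrete, here is a minimal standalone sketch of t-SNE using scikit-learn on synthetic data (this is illustrative and independent of UpTrain's API; the cluster data is made up):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Two synthetic high-dimensional clusters: 50 points x 64 features each,
# centered at 0 and 5 respectively (stand-ins for real embeddings).
X = np.vstack([rng.normal(0, 1, (50, 64)), rng.normal(5, 1, (50, 64))])

# Reduce to 2 dimensions; perplexity roughly controls the balance between
# local and global structure in the embedding.
emb = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X)
print(emb.shape)  # (100, 2)
```

Well-separated clusters in the original space typically remain separated in the 2D embedding, which is what makes the technique useful for spotting distribution shifts.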
UpTrain supports t-SNE dimensionality reduction through the scikit-learn package. Here's how we define the config for t-SNE visualization in the text summarization example:
tsne_visual = {
    'type': uptrain.Visual.TSNE,
    'measurable_args': {
        'type': uptrain.MeasurableType.INPUT_FEATURE,
        'feature_name': 'bert_embs',
    },
    'label_args': {
        'type': uptrain.MeasurableType.INPUT_FEATURE,
        'feature_name': 'dataset_label',
    },
    # Hyperparameters for t-SNE
    'dim': '2D',
    'perplexity': 10,
    # Frequency to calculate t-SNE
    'update_freq': 100,
}
Here, the parameters specifying the dataset features on which dimensionality reduction is applied are the same as in the case of UMAP. Further, t-SNE-related hyperparameters, such as perplexity, carry the same meaning as in the scikit-learn package.
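As a hedged sketch, the config's hyperparameters map naturally onto scikit-learn's TSNE arguments; the translation logic below is illustrative, not UpTrain's actual internals:

```python
import numpy as np
from sklearn.manifold import TSNE

# Hyperparameters from the UpTrain config above.
config = {'dim': '2D', 'perplexity': 10}

# '2D' vs '3D' corresponds to scikit-learn's n_components (assumed mapping).
n_components = 2 if config['dim'] == '2D' else 3

# Stand-in for the 'bert_embs' feature: 60 random 32-dimensional vectors.
X = np.random.default_rng(1).normal(size=(60, 32))
emb = TSNE(n_components=n_components,
           perplexity=config['perplexity'],
           random_state=0).fit_transform(X)
print(emb.shape)  # (60, 2)
```

Note that scikit-learn requires perplexity to be smaller than the number of samples, which is worth keeping in mind when `update_freq` batches are small.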
Continuing our example from UMAP, this is how the t-SNE visualization looks for the text summarization example.

Similar to UMAP, we see that the embeddings corresponding to the wikihow dataset have a distribution different from that of the billsum training and testing datasets.