
Workshop 1: Understanding Embeddings

Visualize word embeddings in 2D space

What are embeddings?

Word embeddings represent words as vectors in high-dimensional space. Words with similar meanings end up close together. We'll use a tool that projects these vectors onto 2D using PCA so we can see the relationships.

We will use the OpenAI Embeddings API. Although we are working at the level of individual words, these embedding models are designed for full sentences or documents, so single-word embeddings may be less representative of each word's meaning in isolation.
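If you want to see what the tool is doing under the hood, here is a minimal sketch of fetching embeddings with the official openai Python package. The model name text-embedding-3-small is an assumption; the visualization tool may use a different one.

```python
# Minimal sketch: fetch embeddings for a few words.
# Assumes the `openai` package is installed and the OPENAI_API_KEY
# environment variable is set; the model name is an assumption.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

words = ["apple", "banana", "car"]
response = client.embeddings.create(model="text-embedding-3-small", input=words)

# One vector per input word; 1536 dimensions for this model.
vectors = [item.embedding for item in response.data]
print(len(vectors), "vectors of length", len(vectors[0]))
```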

Note: Keep in mind that what you see in 2D is a projection of a much higher-dimensional space, so some relationships may be distorted.
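As a rough illustration of that projection (a sketch, not the tool's exact pipeline), scikit-learn's PCA can reduce the vectors from the previous snippet to 2D, and its explained-variance ratio hints at how much of the structure survives:

```python
# Sketch: project the embeddings from the previous snippet onto 2D,
# roughly what the visualization tool does. Requires scikit-learn
# and matplotlib.
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
points = pca.fit_transform(vectors)

# Low explained variance means 2D distances are less trustworthy.
print("variance kept in 2D:", pca.explained_variance_ratio_.sum())

fig, ax = plt.subplots()
ax.scatter(points[:, 0], points[:, 1])
for word, (x, y) in zip(words, points):
    ax.annotate(word, (x, y))
plt.show()
```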

Exercise 1: Three Categories

  1. Open the Embeddings Visualization Tool and enter your OpenAI API key
  2. Pick three categories of words (e.g., fruits, vehicles, emotions, colors, animals, professions)
  3. Enter 10 words from the first category and click the Reset button to rearrange the visualization
  4. Repeat for the second and third categories
  5. Observe how words from the same category cluster together (a quantitative check is sketched after the example below)

Example: Try "apple, banana, orange, grape, mango" vs. "car, truck, bike, plane, boat" vs. "happy, sad, angry, calm, excited"
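If you'd like a quantitative companion to the visual check in step 5, one option (a sketch for this write-up, not a feature of the tool) is to compare average cosine similarity within a category against similarity across categories:

```python
# Sketch: within-category similarity should exceed cross-category
# similarity if embeddings cluster by meaning. Assumes the `openai`
# package and OPENAI_API_KEY as before; the model name is an assumption.
import numpy as np
from openai import OpenAI

client = OpenAI()

def get_vectors(words):
    resp = client.embeddings.create(model="text-embedding-3-small", input=words)
    return [np.array(item.embedding) for item in resp.data]

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

fruits = get_vectors(["apple", "banana", "orange", "grape", "mango"])
vehicles = get_vectors(["car", "truck", "bike", "plane", "boat"])

within = np.mean([cosine(a, b) for i, a in enumerate(fruits) for b in fruits[i + 1:]])
across = np.mean([cosine(a, b) for a in fruits for b in vehicles])
print(f"within fruits: {within:.3f}, fruits vs. vehicles: {across:.3f}")
```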

Exercise 2: Fill the Gap

  1. Now try to find words that would appear between your clusters
  2. What kind of words bridge the gap? Why?

Hint: Think about words that share properties with both categories, or abstract concepts that relate to both.
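As a programmatic way to hunt for bridge words (again a sketch under the same assumptions as the previous snippet; the candidate words are just illustrative guesses), you can score candidates by their similarity to each category's centroid and look for words that sit roughly equidistant from both:

```python
# Sketch: score candidate "bridge" words against the centroid of each
# category. Reuses get_vectors() and cosine() from the previous snippet;
# the candidates are illustrative guesses, not a curated list.
import numpy as np

fruit_centroid = np.mean(get_vectors(["apple", "banana", "orange", "grape", "mango"]), axis=0)
vehicle_centroid = np.mean(get_vectors(["car", "truck", "bike", "plane", "boat"]), axis=0)

candidates = ["market", "cargo", "delivery", "juice", "harvest"]
for word, vec in zip(candidates, get_vectors(candidates)):
    # A bridge word is roughly equidistant from the two centroids.
    print(f"{word}: fruit={cosine(vec, fruit_centroid):.3f}, "
          f"vehicle={cosine(vec, vehicle_centroid):.3f}")
```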

Questions to Consider