Fundamentals

Embeddings

Dense numerical vectors that represent the semantic meaning of text, enabling similarity comparisons between pieces of content.

An embedding is a list of floating-point numbers (a vector) that encodes the semantic meaning of a piece of text. Two texts with similar meaning will have vectors that are close together in vector space; unrelated texts will be far apart.

**How they're generated**

Embedding models (like OpenAI's text-embedding-3 or Sentence Transformers) take text as input and output a fixed-length vector — typically 768 to 3,072 numbers. They're trained to push semantically similar texts close together.

**What "distance" means**

Cosine similarity is the most common measure: a score of 1.0 means identical direction (very similar meaning), 0 means orthogonal (unrelated), and −1 means opposite directions. Dot product and Euclidean distance are also used.
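A minimal sketch of that measure in plain Python, using hand-made vectors rather than real embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Same direction -> 1.0; orthogonal -> 0.0
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # 1.0 (up to float rounding)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Note that cosine similarity ignores vector length: `[1, 2]` and `[2, 4]` point the same way, so they score 1.0 even though their magnitudes differ.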

**Core use cases**

*Semantic search*: Convert a query and a corpus of documents to embeddings, then rank documents by similarity to the query. Unlike keyword search, this matches meaning — "car" will match "automobile."
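A toy sketch of that ranking step. The 3-dimensional vectors below are hand-made stand-ins for real embeddings; an actual system would produce them by calling an embedding model:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hypothetical pre-computed embeddings (real ones come from an embedding model)
corpus = {
    "How to change a car tire":      [0.9, 0.1, 0.0],
    "Automobile maintenance basics": [0.8, 0.2, 0.1],
    "Chocolate cake recipe":         [0.0, 0.1, 0.9],
}

query_vec = [0.85, 0.15, 0.05]  # pretend embedding of "fixing my car"

# Rank documents by similarity to the query, most similar first
ranked = sorted(corpus, key=lambda doc: cosine(corpus[doc], query_vec), reverse=True)
print(ranked)
```

Both car-related documents outrank the recipe, even though neither shares the exact word "car" with the other: the match happens in vector space, not on keywords.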

*RAG retrieval*: The first step in most RAG pipelines — embed your chunks, store them in a vector database, embed the query, and return the nearest neighbors.
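A minimal in-memory version of that store-and-retrieve loop. Real pipelines use a vector database (FAISS, pgvector, etc.) and a real embedding model; the hand-made 2-dimensional vectors here are stand-ins:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

class VectorStore:
    """Toy in-memory vector index; real systems use FAISS, pgvector, etc."""
    def __init__(self):
        self.items = []  # (chunk_text, vector) pairs

    def add(self, text, vector):
        self.items.append((text, vector))

    def top_k(self, query_vector, k=2):
        # Return the k chunks whose vectors are nearest the query vector
        scored = sorted(self.items, key=lambda it: cosine(it[1], query_vector), reverse=True)
        return [text for text, _ in scored[:k]]

store = VectorStore()
store.add("Paris is the capital of France.",    [0.9, 0.1])
store.add("The Eiffel Tower is in Paris.",      [0.8, 0.3])
store.add("Python is a programming language.",  [0.1, 0.9])

# In a real pipeline, the query is embedded with the same model as the chunks.
neighbors = store.top_k([0.85, 0.2], k=2)
print(neighbors)
```

The retrieved neighbors are then passed to the LLM as context; using the same embedding model for chunks and queries is essential, since vectors from different models live in incompatible spaces.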

*Clustering and classification*: Group similar content automatically or use embeddings as features for downstream ML models.

*Duplicate detection and recommendations*: Find near-duplicate content or recommend similar articles.

**Pitfalls**

Embeddings capture semantics but not factual correctness. Two contradictory sentences can have similar embeddings if they discuss the same topic. Embeddings are also language-agnostic only to a degree — cross-lingual models vary in quality. And embedding quality depends heavily on the model used; switching models requires re-embedding your entire corpus.