Vector Databases 101: What They Are, Why They Matter, How to Start
Welcome to this edition
In every issue, we explore one concept that's shaping the future of technology, breaking it down in plain English, with examples you can actually try.
This time, we're looking at Vector Databases: the engines behind smarter search, better recommendations, and AI-powered apps.
Whether it's Netflix suggesting the right show or Google Photos recognising faces, vector databases are doing the heavy lifting.
Let's unpack what they are, why they matter, and how you can get started.
Introduction: Why Vector Databases Matter
Think about how Generative AI works.
You ask ChatGPT: "Explain quantum physics like I'm 12."
Or you upload a picture and ask: "Write a caption for this."
The magic isn't just that the AI "knows everything."
It's that it can understand meaning and retrieve context before generating an answer.
Now, here's the secret:
Behind every GenAI system, from ChatGPT to recommendation engines on YouTube, Flipkart, or Netflix, there's usually a vector database working silently in the background.
Why? Because Large Language Models (LLMs) don't search the internet live. Instead, they work with embeddings: numerical lists that capture the essence of text, audio, or images.
"budget phone with good camera" becomes a vector.
"affordable smartphone with quality photos" becomes another vector.
Even if the words are different, the vectors are close. That closeness is how AI understands meaning instead of just words.
And where do these embeddings live?
In vector databases, designed specifically to store, index, and search millions (or billions) of such vectors in milliseconds.
That's why GenAI needs vector databases.
Without them, LLMs would generate hallucinations far more often, recommendations would be random, and search would feel like the early 2000s again.
Why learn Vector Databases now?
AI models (like GPT, BERT, CLIP) create embeddings: number lists that represent meaning. A vector database is designed to store these embeddings and quickly find the "closest" ones.
Learning vector databases matters because they power:
Chatbots with memory (retrieving past conversations).
Recommendation engines (finding similar movies, songs, or products).
Image and audio search (finding content by similarity).
Research tools (semantic search across millions of documents).
In short: if you want to build AI apps that feel smart, vector databases are the missing piece.
Core Concepts
Here are the essential ideas:
1. Vector
A vector is just a list of numbers that represent something.
Example: the word "Apple" → [0.21, -0.33, 0.95], "Banana" → [0.22, -0.31, 0.90].
2. Embedding
An embedding is how we turn text, images, or audio into vectors.
Example: "dog" and "puppy" get embeddings that are close together.
3. Cosine Similarity
It measures the angle between two vectors. If they point the same way, they're similar.
Example: [1,0] vs [2,0] → similarity = 1 (very similar).
4. Euclidean Distance
The straight-line distance between vectors. Closer = more similar.
Example: [0,0] vs [1,1] → distance ≈ 1.41. (A short code sketch of both measures follows this list.)
5. Index
An index is a shortcut that makes search faster.
Example: like an index in a book that tells you where to look, instead of flipping every page.
6. ANN (Approximate Nearest Neighbor)
Instead of searching everything, ANN finds close-enough matches quickly.
Example: instead of checking every shop, you ask locals where to buy milk.
7. HNSW (Hierarchical Navigable Small World)
A graph-based index that connects vectors like cities on a map. You "travel" through shortcuts to reach the nearest one.
8. Metadata Filtering
Search vectors but also filter by tags.
Example: "phones like iPhone 13 but under ₹40,000."
9. Hybrid Search
Mix keyword search and vector search for best results.
Example: search "budget laptop" using both text match + meaning match.
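To make the two distance ideas concrete, here is a minimal NumPy sketch of cosine similarity and Euclidean distance, using the example vectors from the list above.

# pip install numpy
import numpy as np

a = np.array([1.0, 0.0])
b = np.array([2.0, 0.0])

# Cosine similarity: 1.0 means the vectors point the same way
cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_sim)  # 1.0

c = np.array([0.0, 0.0])
d = np.array([1.0, 1.0])

# Euclidean distance: straight-line distance between the two points
dist = np.linalg.norm(c - d)
print(dist)  # ~1.414 (square root of 2)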
From Embeddings to Search: The Full Journey
How Vector Search Works (End to End)
Here is the beginner-friendly pipeline, step by step. Read it once, then skim it again slowly.
1. Collect data
Text, images, audio, or mixed. For your first project, start with text (notes, product titles, FAQs).
2. Clean and chunk
Break long text into small chunks (300-500 tokens).
Reason: small chunks keep the meaning tight and search precise.
3. Embed
Use an embedding model to convert each chunk into a vector (a list of numbers).
Tip: start with all-MiniLM-L6-v2 (fast, free, CPU-friendly).
4. Attach metadata
Store the original text plus extra fields (title, URL, tags, price, topic).
Reason: you'll filter and display results using this info. (A short sketch of steps 2-4 follows this list.)
5. Store vectors
Keep the vectors and metadata in a vector database (or FAISS locally).
This is your searchable "meaning index."
6. Choose an index
Small data: Flat (brute force).
Large data: HNSW or IVF (fast, approximate).
Very large data: IVF + PQ (cluster + compress).
7. Build the index
Create the structure once you have enough data (e.g., >50k vectors).
Indexing makes queries much faster.
8. Embed the query
User types a question. Convert it into a vector using the same model you used for your data.
9. Search (k-NN)
Ask for the top k nearest neighbors (k=5 or k=10).
Distance metric: cosine similarity for text (normalize first).
10. Return results
Show text, titles, links, and scores. Keep it simple and readable.
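As promised in step 4, here is a minimal sketch of the chunk, embed, and attach-metadata steps. It counts words as a rough stand-in for tokens, and the note fields (title, topic) are only illustrative; swap in your own data.

# pip install sentence-transformers
from sentence_transformers import SentenceTransformer

def chunk_text(text, chunk_size=350, overlap=50):
    """Split text into overlapping word chunks (words approximate tokens here)."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# One note with the metadata you want to filter and display later
note = {"title": "Binary Search Basics", "topic": "searching", "text": "Binary search works on sorted arrays..."}

records = []
for i, chunk in enumerate(chunk_text(note["text"])):
    records.append({
        "vector": model.encode(chunk, normalize_embeddings=True),  # the embedding
        "text": chunk,             # original text for display
        "title": note["title"],    # metadata for filtering
        "topic": note["topic"],
        "chunk_id": i,
    })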
Vector Database Architecture (At a Glance)
A vector database is built in layers, like an onion. Each layer has a clear role, and together they make search fast, scalable, and reliable.
Client Query → Execution Engine
When you type a query (like "budget phone with good camera"), it first goes to the query engine. This engine decides how to run the search.
Indexing Layer
Vectors are stored in a way that makes them easy to find later. Think of it as building a special "map" of all stored vectors.
Hybrid Search Module
Combines keyword search (traditional database style) with vector search (semantic search). This means results can match both exact words and meaning.
Storage Layer
Keeps all the data safe and organized. Works hand in hand with indexing to allow quick lookups.
Sharding & Replication
Large datasets are split across many servers (sharding). Each shard has backups (replication) for reliability.
Compression Techniques
Vectors can be very large. Compression helps reduce space while keeping accuracy.
Scalability Features
Parallel processing makes queries faster.
Failover mechanisms keep the system running if a server fails.
Optimized disk usage reduces storage costs.
Together, these layers ensure that when you search, results come back quickly, even if billions of vectors are stored.
Indexing Techniques
Think of indexing as creating shortcuts in a big city so you donât visit every house to find a friend.
Flat / Brute Force
Checks every vector. Exact and simple.
Great for small sets (≤50k).
Slow for millions of items.
HNSW
A graph of vectors with "highways" and "streets."
You jump via hubs to reach close neighbors quickly.
Very fast and high recall; popular default for large text search.
IVF (Inverted File Index)
First cluster vectors into "buckets."
During search, probe only the most relevant buckets.
Faster than flat; recall depends on how many buckets you probe.
PQ (Product Quantization)
Compress vectors to save memory.
Good when your data is huge and RAM is a constraint.
Small drop in accuracy for a big win in cost/latency.
IVF + PQ (cluster + compress)
Combine IVF and PQ for massive scale.
Common in production to balance speed, memory, and quality.
Rules of thumb
≤50k vectors → Flat is fine.
50k-5M → HNSW or IVF.
>5M or tight RAM → IVF + PQ.
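If you want to try these index families locally, here is a minimal FAISS sketch. The parameter values (32 graph neighbors, 100 clusters, 8 sub-quantizers) are common starting points, not tuned settings.

# pip install faiss-cpu numpy
import numpy as np
import faiss

dim = 384                                   # e.g. MiniLM embedding size
vectors = np.random.rand(10000, dim).astype("float32")

# 1) Flat: exact brute-force search (fine for small sets)
flat = faiss.IndexFlatL2(dim)
flat.add(vectors)

# 2) HNSW: graph-based approximate search (fast, high recall)
hnsw = faiss.IndexHNSWFlat(dim, 32)         # 32 = neighbors per node
hnsw.add(vectors)

# 3) IVF + PQ: cluster, then compress (huge data / tight RAM)
quantizer = faiss.IndexFlatL2(dim)
ivfpq = faiss.IndexIVFPQ(quantizer, dim, 100, 8, 8)  # 100 clusters, 8 sub-quantizers, 8 bits
ivfpq.train(vectors)                        # IVF/PQ indexes must be trained first
ivfpq.add(vectors)
ivfpq.nprobe = 10                           # clusters to probe per query

query = np.random.rand(1, dim).astype("float32")
distances, ids = hnsw.search(query, 5)      # top-5 nearest neighbors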
Retrieval Strategies (How You Actually Search)
Exact vs Approximate
Exact (Flat): perfect results, slower at scale.
Approximate (HNSW/IVF): very fast, almost as good.
Most apps use approximate methods.
Cosine vs Dot vs L2
Text embeddings: cosine is a safe default.
Normalize vectors → dot product = cosine.
L2 is okay but less common for text.
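A quick check you can run yourself: once vectors are normalized to unit length, the dot product gives the same number as cosine similarity, which is why the full example later uses normalized embeddings with an inner-product index.

import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 2.0])

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
dot_of_normalized = np.dot(a_unit, b_unit)

print(cosine, dot_of_normalized)  # the two values match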
Filtering
Use metadata filters before or after search.
Example:
(brand = "Samsung") AND (price <= 15000).
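A minimal way to do this without a hosted database is to run the vector search first and then post-filter the hits by their metadata. The fields below (brand, price) are illustrative; hosted vector DBs such as Qdrant or Weaviate can apply the same kind of filter inside the query itself.

# Assumes: one metadata dict per stored vector, in the same order as the index
metadata = [
    {"brand": "Samsung", "price": 13999},
    {"brand": "Apple",   "price": 52999},
    {"brand": "Samsung", "price": 18999},
]

def filter_hits(ids, scores, brand="Samsung", max_price=15000):
    hits = []
    for i, score in zip(ids, scores):
        meta = metadata[i]
        if meta["brand"] == brand and meta["price"] <= max_price:
            hits.append((i, score, meta))
    return hits

# scores, ids = index.search(query_vector, 20)   # from your vector index
# results = filter_hits(ids[0], scores[0])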
Hybrid retrieval
Combine keyword scores (BM25) with vector scores.
Why: some queries need exact terms and meaning.
Simple tactic: normalize both scores and add them with weights.
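Here is a minimal sketch of that tactic: min-max normalize both score lists, then blend them with a weight. The keyword scores could come from BM25 (for example, via the rank_bm25 package) and the vector scores from your index; both are assumed to be one score per document.

import numpy as np

def min_max(x):
    x = np.asarray(x, dtype="float32")
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def hybrid_scores(keyword_scores, vector_scores, alpha=0.5):
    """alpha = weight on keywords (0 = only vectors, 1 = only keywords)."""
    return alpha * min_max(keyword_scores) + (1 - alpha) * min_max(vector_scores)

combined = hybrid_scores([2.1, 0.0, 5.3], [0.83, 0.91, 0.40], alpha=0.4)
print(combined.argsort()[::-1])  # document indices, best first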
Reranking (optional)
Apply a second pass to reorder candidates: freshness, click-rate, a tiny reranker model.
Use when "good enough" isn't good enough.
Mini Case Study: "Study Buddy" Notes Search
Goal: Help a student find relevant notes fast and suggest what to read next.
Inputs
300 notes across topics (arrays, graphs, SQL).
Each note ~2-4 paragraphs.
Metadata: topic, week, difficulty.
Processing
Chunk each note to ~350 tokens, 50-token overlap.
Embed chunks with all-MiniLM-L6-v2.
Store vectors + metadata in a vector DB.
Build HNSW index (fast, high recall).
Build a tiny keyword index too (BM25) for hybrid scoring.
Query flow
User asks: "fast way to find an item in a sorted list."
Embed query → vector search (k=10).
Filter by topic = "searching" (optional).
Hybrid: mix BM25 score (notes that say "binary search") + vector score (notes about "find in sorted").
Rerank by difficulty near user's level.
Outputs
Top 3 notes ("Binary Search Basics", "Common pitfalls", "Lower/Upper Bound").
"Next up" recommendation: "Search in rotated array" (difficulty +1).
Why this beats keywords
"fast way to find" matches "binary search" even though the query never names it.
Hybrid ensures exact terms (if present) still help.
Comparison Table: SQL vs NoSQL vs Vector DB
*FAISS is a local library, not a server DB, but it powers many vector systems under the hood.
Full Working Example: Text → Embeddings → Vector Search
This example uses sentence-transformers for embeddings and FAISS for search. It runs on CPU.
# pip install sentence-transformers faiss-cpu numpy
from sentence_transformers import SentenceTransformer
import numpy as np
import faiss
# 1) Data: short, clear sentences (chunks)
docs = [
"Binary search finds an item in a sorted array in O(log n).",
"Bubble sort repeatedly swaps adjacent elements to sort a list.",
"Hash tables store key-value pairs for near O(1) lookup.",
"Dijkstra's algorithm finds shortest paths in weighted graphs.",
"The two-pointer technique scans from both ends efficiently.",
"Depth-first search explores as far as possible along a branch.",
"A stack uses LIFO order; a queue uses FIFO order.",
"Merge sort divides, sorts, and merges in O(n log n).",
"A heap supports efficient retrieval of the min or max element.",
"Dynamic programming breaks problems into overlapping subproblems."
]
# 2) Embed with MiniLM (works well on CPU)
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
emb = model.encode(docs, convert_to_numpy=True, normalize_embeddings=True)
# normalize_embeddings=True lets us use dot product as cosine similarity
# 3) Build a FAISS index for inner product (cosine on normalized vectors)
dim = emb.shape[1]
index = faiss.IndexFlatIP(dim) # IP = inner product
index.add(emb) # store all document vectors
# 4) Query â embed â search
def search(query, k=3):
    qv = model.encode([query], convert_to_numpy=True, normalize_embeddings=True)
    scores, idx = index.search(qv, k)
    print(f"\nQuery: {query}")
    for rank, (i, score) in enumerate(zip(idx[0], scores[0]), 1):
        print(f"{rank}. {docs[i]} [score={score:.3f}]")
# 5) Try a few queries
search("fast way to find in a sorted list", k=3) # expect Binary Search
search("shortest route between cities", k=3) # expect Dijkstra
search("keep largest element on top", k=3) # expect heap
What you'll see
The first query should return the binary search sentence on top, even though you didn't say "binary search."
The second query should return the Dijkstra sentence.
The third should return the heap sentence.
Why this matters
You just built a tiny semantic search system: text → embeddings → vector index → nearest neighbors.
Retrieval Patterns Youâll Actually Use
Pure semantic
Only vector search.
Good for fuzzy, intent-heavy queries ("suggest similar papers").
Semantic + filters
Vector search + metadata filter (brand, price, topic).
Good for catalogs, notes, knowledge bases.
Hybrid (keyword + vector)
Combine BM25 (keywords) and vectors.
Good when specific terms and meaning both matter.
Reranking
Take top 50 results, then apply a lightweight reranker.
Good for premium UX where top 5 must feel perfect.
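As a sketch of that second pass, one common option is to rescore the top candidates with a small cross-encoder. The model name below is one public choice, and the candidate list is assumed to come from a first-pass vector search.

# pip install sentence-transformers
from sentence_transformers import CrossEncoder

query = "fast way to find an item in a sorted list"
candidates = [
    "Bubble sort repeatedly swaps adjacent elements to sort a list.",
    "Binary search finds an item in a sorted array in O(log n).",
    "Merge sort divides, sorts, and merges in O(n log n).",
]

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, doc) for doc in candidates])

# Reorder candidates by the reranker's relevance score, best first
reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0])  # expect the binary search sentence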
Use Cases in the Real World (Concrete Pipelines)
Chatbot Memory & RAG (Retrieval-Augmented Generation)
Data: chunks of documents/FAQs.
Flow: chunk → embed → store → query embed → k-NN → pass retrieved chunks to LLM → answer.
Benefit: answers grounded in your data; fewer hallucinations.
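Here is a minimal sketch of the last two steps of that flow: take the retrieved chunks and assemble a grounded prompt. The LLM call itself is left out; plug in whichever API or local model you use.

def build_rag_prompt(question, retrieved_chunks):
    """Builds a prompt that forces the model to answer from the retrieved context."""
    context = "\n\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

# chunks = [docs[i] for i in ids[0]]   # top-k chunks from the vector search
# prompt = build_rag_prompt("What is binary search?", chunks)
# answer = call_your_llm(prompt)       # hypothetical LLM call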
E-commerce âSimilar Itemsâ
Data: product titles, descriptions, images.
Flow: embed text/images → store with price/brand tags → when user views a product, search nearest vectors → filter by availability/price → show "You may also like."
Benefit: intent-aware recommendations beyond keywords.
Image Search by Example
Data: images; compute image embeddings (e.g., CLIP).
Flow: user uploads a photo → embed → k-NN → return visually similar images.
Benefit: find things you canât describe with words.
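Here is a minimal sketch of search-by-example using a CLIP model through sentence-transformers; the image file names are placeholders for your own files.

# pip install sentence-transformers pillow faiss-cpu
from sentence_transformers import SentenceTransformer
from PIL import Image
import faiss

clip = SentenceTransformer("clip-ViT-B-32")

image_files = ["cat.jpg", "beach.jpg", "laptop.jpg"]   # placeholder paths
image_emb = clip.encode([Image.open(f) for f in image_files],
                        convert_to_numpy=True, normalize_embeddings=True)

index = faiss.IndexFlatIP(image_emb.shape[1])          # cosine on normalized vectors
index.add(image_emb)

# Query by example: embed the uploaded photo, then find the nearest images
query_emb = clip.encode([Image.open("query_photo.jpg")],
                        convert_to_numpy=True, normalize_embeddings=True)
scores, ids = index.search(query_emb, 2)
print([image_files[i] for i in ids[0]])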
Reader Q&A (Quick)
Do I need a GPU?
No. For MiniLM embeddings and FAISS, CPU is fine for small/medium projects.
Which vector DB should I start with?
Local experiments: FAISS. For a hosted DB with APIs: Qdrant, Weaviate, or Milvus.
Can I mix embeddings from different models?
Avoid mixing in the same index. Different models map meaning differently; results degrade.
Hope you enjoyed reading this article.
If you found it valuable, hit the like button and consider subscribing for more such content every week.
If you have any questions or suggestions, leave a comment.
This post is public so feel free to share it.
Subscribe for free to receive new articles every week.
I actively post coding, system design and software engineering related content on
Spread the word and earn rewards!
If you enjoy my newsletter, share it with your friends and earn a one-on-one meeting with me when they subscribe. Let's grow the community together.
I hope you have a lovely day!
See you soon,
Rocky