Case study · AI system
How I built this portfolio's RAG
This portfolio doesn't talk about an AI system: it is one. A RAG chatbot over my experience, on a free, self-hosted stack.
Gemini 2.5 Flash: LLM (+ fallback)
pgvector: self-hosted vector store
768-dim: embeddings
Hono: API
The best way to show I can build applied-AI systems is for the portfolio to be one. The chatbot in the bottom-right corner is a real RAG system over my experience: ask it anything.
I built it with an extra constraint: a free, self-hosted stack, with no paid APIs and no managed vector service.
How it works
The corpus is a set of markdown notes about me (career path, projects, what drives me). An ingestion pipeline splits them into chunks, embeds each chunk with gemini-embedding-001 at 768 dimensions, and stores the results in Postgres with pgvector.
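Here is a minimal sketch of the chunking step. It assumes heading-scoped sections that are then split into overlapping windows. The function name `chunkMarkdown`, the 800-character limit and the 100-character overlap are illustrative assumptions, not the pipeline's actual settings. The embedding call and the Postgres insert are left out.

```typescript
// Hypothetical chunk shape: the nearest heading plus the text under it.
interface Chunk {
  heading: string; // "" if the text appears before any heading
  text: string;
}

/** Split markdown into heading-scoped sections, then window long sections. */
function chunkMarkdown(md: string, maxChars = 800, overlap = 100): Chunk[] {
  if (overlap >= maxChars) throw new Error("overlap must be smaller than maxChars");
  const chunks: Chunk[] = [];
  let heading = "";
  let buffer: string[] = [];

  const flush = () => {
    const body = buffer.join("\n").trim();
    buffer = [];
    if (!body) return;
    // Slide a window over long sections; consecutive windows share `overlap` chars.
    for (let start = 0; start < body.length; start += maxChars - overlap) {
      chunks.push({ heading, text: body.slice(start, start + maxChars) });
      if (start + maxChars >= body.length) break;
    }
  };

  for (const line of md.split("\n")) {
    const match = /^#{1,6}\s+(.*)$/.exec(line);
    if (match) {
      flush(); // a new heading closes the previous section
      heading = match[1].trim();
    } else {
      buffer.push(line);
    }
  }
  flush();
  return chunks;
}
```

Keeping the heading on each chunk means a retrieved fragment still carries its context (for example "Projects") even when its own text is short.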
For each question, I embed the query, retrieve the nearest chunks by cosine similarity (approximate nearest-neighbor search over an HNSW index), and pass them to Gemini as context so its answers are grounded in facts rather than hallucinated.
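To make the retrieval step concrete, here is an exact, brute-force version of it: score every chunk by cosine similarity against the query embedding and keep the top k. pgvector's `<=>` operator returns cosine distance (1 minus cosine similarity), so ordering by `embedding <=> query` ascending ranks rows the same way. The HNSW index replaces this exhaustive scan with an approximate search. The `Embedded` type and the `topK` helper are hypothetical names used only for illustration.

```typescript
// Hypothetical in-memory record: a chunk id and its embedding vector.
type Embedded = { id: string; embedding: number[] };

/** Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]. */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero for empty vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Exact top-k retrieval: score every chunk, sort descending, keep the best k. */
function topK(query: number[], chunks: Embedded[], k = 3): { id: string; score: number }[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

For a corpus this small, an exact scan like this would also be fast enough. The index earns its place as the corpus grows.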
Ingestion: markdown → chunks → embeddings (768d) → chunks table in Postgres.
Retrieval: cosine similarity with an HNSW index (match_chunks function).
Generation: Gemini answers grounded in the retrieved context, in my voice.
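Most of the generation step comes down to how the prompt is assembled. Below is a sketch of how such a prompt could be built. The `RetrievedChunk` shape (including its `source` field) and the instruction wording are hypothetical; this is not the prompt the chatbot actually uses.

```typescript
// Hypothetical shape of a retrieved chunk; `source` is an assumed field.
interface RetrievedChunk {
  source: string;
  text: string;
}

/** Assemble a prompt that confines the model to the retrieved context. */
function buildGroundedPrompt(question: string, context: RetrievedChunk[]): string {
  // Number each chunk so individual facts can be traced back to their source.
  const contextBlock = context
    .map((chunk, i) => `[${i + 1}] (${chunk.source})\n${chunk.text}`)
    .join("\n\n");
  return [
    "Answer in the first person, as the author of this portfolio.",
    "Use only the context below. If it does not contain the answer, say you don't know.",
    "",
    "Context:",
    contextBlock || "(no relevant context found)",
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

The explicit "say you don't know" instruction is what keeps an empty or off-topic retrieval from turning into an invented answer.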
Stack choices
I chose Gemini with a fallback chain (2.5 Flash → 2.5 Flash-Lite → 2.0 Flash-Lite) because each model has its own free-tier quota: if one returns 429 (rate limited) or 503 (overloaded), the next one takes over after a backoff. Resilience at no cost.
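The fallback chain can be sketched as a loop over model names with exponential backoff. Two assumptions apply. First, the SDK call is injected as a `callModel` function, so no particular Gemini client library is implied. Second, failures are assumed to surface as a hypothetical `ModelError` carrying the HTTP status. The model IDs follow the Gemini API's naming for the three models above.

```typescript
/** Hypothetical error shape: a failed model call that exposes its HTTP status. */
class ModelError extends Error {
  status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

type CallModel = (model: string, prompt: string) => Promise<string>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Try each model in order. On a retryable status (429 rate limit, 503
 * overloaded), wait with exponential backoff and move on to the next model.
 * Any other error is rethrown immediately.
 */
async function generateWithFallback(
  prompt: string,
  callModel: CallModel,
  models = ["gemini-2.5-flash", "gemini-2.5-flash-lite", "gemini-2.0-flash-lite"],
  baseDelayMs = 500,
): Promise<{ model: string; text: string }> {
  let lastError: unknown;
  for (let attempt = 0; attempt < models.length; attempt++) {
    const model = models[attempt];
    try {
      return { model, text: await callModel(model, prompt) };
    } catch (err) {
      const retryable = err instanceof ModelError && (err.status === 429 || err.status === 503);
      if (!retryable) throw err;
      lastError = err;
      // Back off before handing over to the next model: 500 ms, 1 s, 2 s, ...
      if (attempt < models.length - 1) await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError; // every model in the chain was exhausted
}
```

Injecting `callModel` also makes the chain easy to test with a fake that fails on purpose.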
For vectors, instead of a managed service I run Postgres + pgvector in Docker. It is portable, free, and enough for this corpus size.
The API is built with Hono (lightweight and fast) and streams responses to the browser over Server-Sent Events (SSE).
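SSE is a simple text protocol. Each event is one or more `data:` lines, optionally preceded by an `event:` line, and a blank line ends it. Below is a dependency-free sketch of that framing, applied to a stream of text deltas from the model. The `token`/`done` event names and the `[DONE]` sentinel are assumptions for illustration.

```typescript
/** Frame one SSE event: one "data:" line per payload line, then a blank line. */
function formatSSE(data: string, event?: string): string {
  let out = event ? `event: ${event}\n` : "";
  // SSE treats CRLF, CR and LF as line endings, so split on all three.
  for (const line of data.split(/\r\n|\r|\n/)) out += `data: ${line}\n`;
  return out + "\n";
}

/** Turn model text deltas into SSE frames, ending with a hypothetical "done" event. */
async function* toSSE(deltas: AsyncIterable<string>): AsyncGenerator<string> {
  for await (const delta of deltas) yield formatSSE(delta, "token");
  yield formatSSE("[DONE]", "done");
}
```

Hono also ships a `streamSSE` helper in `hono/streaming` that writes this framing for you. The hand-rolled version only makes the wire format visible.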
What it is and is not
I'm honest about scope: this is a well-built RAG, not an enterprise LLMOps platform. It has no Langfuse observability, no large suite of automated evals, and no voice mode (I parked that on purpose).
What it shows is what matters for an applied-AI role: I understand embeddings, vector search, retrieval, and grounded prompting, and I ship it end to end.
I prefer a small system that actually works and that I can fully explain over a list of buzzwords.
Frequently Asked Questions
What model does the chatbot use?
Gemini 2.5 Flash, with a fallback chain to 2.5 Flash-Lite and 2.0 Flash-Lite to survive free-tier limits.
Where are the vectors stored?
In Postgres with the pgvector extension, self-hosted in Docker. HNSW index with cosine similarity.
Why a free stack?
To show that a solid RAG can be built without paid APIs or managed services, and to keep it sustainable to run.