Problem Statement
People constantly field the same questions: recruiters, founders, creators, and mentors all repeat their story, opinions, and expertise dozens of times. There's no good way to create a faithful, always-available "version of you" that others can interrogate. A naive chatbot over a bio is shallow and prone to hallucination; the hard part is building a structured, searchable knowledge base of a person and grounding answers in it.
Proposed Solution
AskAboutMe lets a user build an AI digital knowledge twin. Through an adaptive onboarding questionnaire (questions that branch based on prior answers), the system constructs a comprehensive knowledge base; visitors then ask questions through a chat interface and receive answers grounded in that base — as if talking to the user directly. Semantic search (vector embeddings) matches questions to the most relevant stored knowledge.
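The branching behaviour of the onboarding questionnaire can be pictured as a small question graph in which each answer selects the next question. This is a minimal sketch under stated assumptions: the `Question` shape, the `runOnboarding` helper, and the keyword-based branching rule are all hypothetical. The real onboarding is described as AI-powered, so its follow-up choice is likely model-driven rather than a regex match.

```typescript
// Hypothetical shape of one onboarding question; field names are illustrative.
interface Question {
  id: string;
  prompt: string;
  // Given the user's answer, return the id of the next question, or null to end.
  next: (answer: string) => string | null;
}

// A tiny question graph: which follow-up is asked depends on the previous answer.
const questions: Record<string, Question> = {
  role: {
    id: "role",
    prompt: "What do you do for work?",
    next: (a) => (/founder|startup/i.test(a) ? "company" : "expertise"),
  },
  company: {
    id: "company",
    prompt: "What problem does your company solve?",
    next: () => "expertise",
  },
  expertise: {
    id: "expertise",
    prompt: "What topics do people most often ask you about?",
    next: () => null,
  },
};

// Walk the graph using scripted answers (a stand-in for a live user session).
function runOnboarding(startId: string, answers: string[]): { id: string; answer: string }[] {
  const transcript: { id: string; answer: string }[] = [];
  let current: string | null = startId;
  let i = 0;
  while (current !== null && i < answers.length) {
    const q = questions[current];
    if (!q) throw new Error(`Unknown question id: ${current}`);
    const answer = answers[i++];
    transcript.push({ id: q.id, answer });
    current = q.next(answer);
  }
  return transcript;
}
```

A founder-sounding first answer routes through the company question; any other answer skips straight to expertise, so different users end up with differently shaped profiles.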
Full Solution Details
- AI-powered onboarding — a dynamic questionnaire that adapts based on responses to extract a rich profile.
- Semantic search — vector embeddings (pgvector) for intelligent question→answer matching, so phrasing variations still hit the right knowledge.
- Background processing — async normalization of raw Q&As into structured, searchable data via a custom queue.
- Three-tier system — Free, Pro, Business access levels.
- Rate limiting — per-user and site-wide.
- Admin panel — user management, tier configuration, and job monitoring.
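Per-user and site-wide limits can be enforced together by counting each request against both a per-user counter (capped by tier) and a shared global counter. This is a minimal fixed-window sketch; the tier limits, window length, global cap, and the `FixedWindowLimiter` class are hypothetical, since the project's algorithm and numbers are not documented. An in-memory limiter like this only holds within a single process; multiple instances would need a shared store.

```typescript
type Tier = "free" | "pro" | "business";

// Hypothetical per-user request limits per window; the real tier values are not documented.
const TIER_LIMITS: Record<Tier, number> = { free: 10, pro: 100, business: 1000 };

// Fixed-window limiter enforcing a per-user cap (by tier) and a site-wide cap together.
class FixedWindowLimiter {
  private windowStart = 0;
  private globalCount = 0;
  private readonly perUser = new Map<string, number>();
  private readonly windowMs: number;
  private readonly globalLimit: number;

  constructor(windowMs: number, globalLimit: number) {
    this.windowMs = windowMs;
    this.globalLimit = globalLimit;
  }

  // Returns true if the request may proceed; `now` is injectable so the logic is testable.
  allow(userId: string, tier: Tier, now: number = Date.now()): boolean {
    // Once the current window has elapsed, start a new one and reset every counter.
    if (now - this.windowStart >= this.windowMs) {
      this.windowStart = now;
      this.globalCount = 0;
      this.perUser.clear();
    }
    const used = this.perUser.get(userId) ?? 0;
    // Check both caps before counting, so a rejected request consumes no quota.
    if (used >= TIER_LIMITS[tier] || this.globalCount >= this.globalLimit) {
      return false;
    }
    this.perUser.set(userId, used + 1);
    this.globalCount += 1;
    return true;
  }
}

// Example wiring: one limiter per process, one-minute windows, hypothetical global cap.
const limiter = new FixedWindowLimiter(60_000, 5_000);
```

A fixed window is the simplest option, but it allows bursts of up to twice the limit across a window boundary; a sliding window or token bucket would smooth that out.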
Technical Documentation
The backend runs on Node 18+, Express, and TypeScript, with PostgreSQL 15 and the pgvector extension as the data store. The knowledge pipeline has two halves. At ingestion, raw onboarding answers are enqueued, and a custom database-backed queue normalizes them into structured Q&As and computes their embeddings with OpenAI text-embedding-3-small. At query time, the visitor's question is embedded (with the same model, so the vectors are comparable), matched against the stored vectors by pgvector similarity search, and GPT-4 generates a grounded answer from the top matches. Caching uses the in-memory node-cache library. Notably, the queue lives in Postgres rather than Redis, which gives durable jobs without extra infrastructure. Docker Compose provisions Postgres with pgvector.
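The query-time half of the pipeline comes down to ranking stored Q&As by similarity to the visitor's question embedding and packing the best matches into a prompt. The sketch below does the ranking in memory with cosine similarity so it runs standalone; in the real system that ranking is done by pgvector inside Postgres, and the commented SQL shows one plausible shape for it (the `knowledge_items` table and its columns are hypothetical). The `KnowledgeItem` type, `topK`, `buildGroundedPrompt`, and the prompt wording are likewise illustrative, not the project's code.

```typescript
// One stored, normalized Q&A with its embedding (shape assumed for illustration).
interface KnowledgeItem {
  question: string;
  answer: string;
  embedding: number[];
}

// In Postgres the ranking would run inside the database. One plausible query, using
// pgvector's cosine-distance operator `<=>` (table and column names are hypothetical):
//   SELECT question, answer, 1 - (embedding <=> $1) AS similarity
//   FROM knowledge_items
//   WHERE user_id = $2
//   ORDER BY embedding <=> $1
//   LIMIT $3;

// Cosine similarity of two equal-length vectors (1 = same direction, 0 = orthogonal).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// In-memory stand-in for the query above: score every item and keep the k best.
function topK(query: number[], items: KnowledgeItem[], k: number): { item: KnowledgeItem; score: number }[] {
  return items
    .map((item) => ({ item, score: cosineSimilarity(query, item.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Pack the retrieved Q&As into a prompt that tells the model to answer only from them.
function buildGroundedPrompt(visitorQuestion: string, matches: { item: KnowledgeItem }[]): string {
  const context = matches
    .map((m, i) => `[${i + 1}] Q: ${m.item.question}\nA: ${m.item.answer}`)
    .join("\n\n");
  return [
    "Answer as the person described below, using ONLY the facts provided.",
    "If the facts do not cover the question, say you don't know.",
    "",
    context,
    "",
    `Visitor question: ${visitorQuestion}`,
  ].join("\n");
}
```

Because only the top-k matches reach the model, the instruction to refuse when the facts don't cover the question is what keeps answers tied to the stored knowledge rather than to the model's guesses.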
Tech Stack
Node.js, Express, TypeScript, PostgreSQL 15 + pgvector, OpenAI (GPT-4 + text-embedding-3-small), node-cache, custom DB-backed queue, Docker.
System Design
User ──adaptive onboarding (branching Qs)──► raw answers
  │ enqueue
  ▼
Custom DB-backed queue ──► normalize into structured Q&As
  │                                │ embed (text-embedding-3-small)
  ▼                                ▼
PostgreSQL 15 + pgvector ◄── store Q&A + vector
                ▲
                │ similarity search (top-k)
Visitor question ──embed──► match ──► GPT-4 grounded answer (chat)
Tiers (Free/Pro/Business) · per-user + global rate limits · Admin (users, tiers, jobs)
Smart Architectural Decisions
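- Retrieval-grounded answers (RAG) via pgvector. Embedding stored knowledge and matching by similarity means answers are grounded in the user's actual data rather than the model's general training knowledge. That is the right architecture for a faithful "twin" and the main defence against hallucination, although it reduces rather than eliminates it: the model can still misread or over-extend the retrieved facts.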
- Database-backed queue instead of Redis. Durable, inspectable background jobs (normalization, embedding) without standing up Redis — a pragmatic infra-minimizing choice (echoed in his other backends) that the admin panel can monitor.
- Adaptive onboarding as the data-quality lever. Branching questions extract a richer, more structured knowledge base than a static form — better input means better retrieval.
- Normalize-then-embed pipeline. Converting raw answers into clean structured Q&As before embedding improves match quality and keeps the vector store meaningful.
- Tiers + rate limits + admin show it was built as a real multi-tenant SaaS, not a demo.
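A database-backed queue usually comes down to a jobs table plus an atomic "claim" statement that lets several workers poll without double-processing. The sketch below models that lifecycle (pending, running, done, failed, with retries and exponential backoff) in memory so it runs on its own, and the commented SQL shows the common Postgres `FOR UPDATE SKIP LOCKED` claim pattern. The table and column names, the `InMemoryJobStore` class, the backoff schedule, and the job kinds are assumptions; the source does not describe the queue's schema or retry policy.

```typescript
type JobStatus = "pending" | "running" | "done" | "failed";

interface Job {
  id: number;
  kind: string; // e.g. "normalize" or "embed" (hypothetical job kinds)
  payload: string;
  status: JobStatus;
  attempts: number;
  runAt: number; // epoch ms; the job is not eligible before this time
  lastError?: string;
}

// In Postgres, claiming could be one atomic statement (table/column names hypothetical):
//   UPDATE jobs SET status = 'running', attempts = attempts + 1
//   WHERE id = (SELECT id FROM jobs
//               WHERE status = 'pending' AND run_at <= now()
//               ORDER BY run_at LIMIT 1
//               FOR UPDATE SKIP LOCKED)
//   RETURNING *;
// SKIP LOCKED lets concurrent workers skip rows another worker has already locked.

class InMemoryJobStore {
  private readonly jobs: Job[] = [];
  private nextId = 1;
  private readonly maxAttempts: number;

  constructor(maxAttempts: number) {
    this.maxAttempts = maxAttempts;
  }

  enqueue(kind: string, payload: string, now: number): number {
    const id = this.nextId++;
    this.jobs.push({ id, kind, payload, status: "pending", attempts: 0, runAt: now });
    return id;
  }

  // Mirrors the UPDATE ... RETURNING above: take the earliest eligible pending job.
  claimNext(now: number): Job | undefined {
    const job = this.jobs
      .filter((j) => j.status === "pending" && j.runAt <= now)
      .sort((a, b) => a.runAt - b.runAt)[0];
    if (job) {
      job.status = "running";
      job.attempts += 1;
    }
    return job;
  }

  complete(job: Job): void {
    job.status = "done";
  }

  // Retry with exponential backoff (1s, 2s, 4s, ...) until attempts run out, then park as failed.
  fail(job: Job, error: string, now: number): void {
    job.lastError = error;
    if (job.attempts >= this.maxAttempts) {
      job.status = "failed";
    } else {
      job.status = "pending";
      job.runAt = now + 1000 * Math.pow(2, job.attempts - 1);
    }
  }

  get(id: number): Job | undefined {
    return this.jobs.find((j) => j.id === id);
  }
}

// One worker tick: claim a job, run its handler, record the outcome. Returns false if idle.
function processOnce(store: InMemoryJobStore, handler: (job: Job) => void, now: number): boolean {
  const job = store.claimNext(now);
  if (!job) return false;
  try {
    handler(job);
    store.complete(job);
  } catch (e) {
    store.fail(job, e instanceof Error ? e.message : String(e), now);
  }
  return true;
}
```

Keeping job state in ordinary rows is also what makes the jobs inspectable: admin-side job monitoring can, in principle, be a plain query over the same table.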
Impacts
A production-shaped platform that converts a person into an interrogable, retrieval-grounded AI knowledge base — with the right RAG architecture (pgvector + normalization), durable background processing, and SaaS scaffolding (tiers, limits, admin).
Demonstrated Skills
RAG / vector-search engineering (pgvector, embeddings, similarity retrieval, grounding); LLM application architecture (adaptive questioning, normalization, GPT-4 answer generation); background-job systems (custom durable DB-backed queue); PostgreSQL + extensions; multi-tenant SaaS concerns (tiers, rate limiting, admin); Docker; TypeScript/Express.