Problem Statement
The Nigeria Tax Act 2025 (NTA 2025) takes effect on 1 January 2026 and rewrites personal income tax: the Consolidated Relief Allowance is abolished and replaced by a capped rent relief, a new six-band progressive table applies, and the first ₦800,000 of taxable income becomes tax-free. Ordinary salary earners, freelancers, and mixed-income workers have no easy, trustworthy way to see what their tax becomes under the new rules, or how it differs from what they paid before. Generic AI chatbots will readily hallucinate a tax figure, which makes them the wrong tool for a question where a wrong number has real consequences.
Proposed Solution
A single visitor-facing web tool (no account, nothing kept beyond a one-hour idle window) with a four-step flow: input income (sample data, manual entry, or a bank-statement PDF) → see a full NTA 2025 tax position (bands, reliefs, effective rate, take-home) → see a side-by-side comparison against the old Personal Income Tax Act (PITA) regime in plain language ("you save ₦X" / "you pay ₦X more") → ask grounded follow-up questions answered only from the already-computed numbers, with statute citations. Every number on screen comes from a deterministic tax engine, never from a model.
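As a rough illustration of the plain-language comparison step, the sketch below turns two annual liabilities into the "you save ₦X" / "you pay ₦X more" line. The helper names `formatNaira` and `describeNetChange`, the two-decimal display, and the wording for the no-change case are assumptions made for this example, not the project's actual code.

```typescript
// All amounts are integer kobo (1 NGN = 100 kobo).

// Render a non-negative integer kobo amount as a naira string.
function formatNaira(kobo: number): string {
  if (!Number.isInteger(kobo) || kobo < 0) {
    throw new RangeError("expected a non-negative integer kobo amount");
  }
  const naira = Math.floor(kobo / 100);
  const rest = kobo % 100;
  const fraction = (rest < 10 ? "0" : "") + rest;
  // Insert thousands separators without relying on locale data.
  const grouped = String(naira).replace(/\B(?=(\d{3})+(?!\d))/g, ",");
  return "₦" + grouped + "." + fraction;
}

// Hypothetical helper: compare the old-regime and new-regime liabilities.
function describeNetChange(oldTaxKobo: number, newTaxKobo: number): string {
  const delta = newTaxKobo - oldTaxKobo;
  if (delta === 0) return "your tax does not change"; // assumed wording
  return delta < 0
    ? "you save " + formatNaira(-delta)
    : "you pay " + formatNaira(delta) + " more";
}
```

Because both inputs are integers, the difference is exact and the message never shows a rounding artefact.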
Full Solution Details
- Module 1 — Income input: three paths (sample / manual / PDF upload) and a required profile type (salary earner, freelancer, mixed). Bank-statement PDFs (3–12 months) are parsed by a two-tier AI pipeline that extracts credits, classifies them (salary / business / transfer / other), annualises income, and lets the user confirm or reclassify before anything is computed.
- Module 2 — Tax position: exemption status, annual liability, effective rate, monthly estimate, take-home (annual and monthly), and a band-by-band visual breakdown with every applied relief itemised.
- Module 3 — What changed for you: old PITA vs new NTA 2025 computed in the same engine, net change in plain language, and only the reform points relevant to the user.
- Module 4 — Grounded AI panel: natural-language follow-ups constrained to personal income tax under NTA 2025; refuses out-of-scope (VAT, corporate, business) questions; cites the relevant section with an inline snippet; can never produce a tax number the calculator didn't already produce; always ends with a 'not tax advice' disclaimer.
- Module 5 — How it works / About: methodology, statute references, and the build rationale (scope chosen, what was cut, v2 plans).
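The band-by-band breakdown in Module 2 can be sketched as a small pure function over integer kobo. The band limits and rates below are assumed from the published NTA 2025 schedule and are not stated in this document, so treat them as illustrative and verify them against the statute. The names `TaxBand`, `NTA_2025_BANDS`, and `computeBands`, and the round-down-per-band rule, are likewise assumptions for this example.

```typescript
// Upper bounds are cumulative taxable income in kobo; null marks the open top band.
interface TaxBand {
  upToKobo: number | null;
  ratePercent: number;
}

const KOBO_PER_NAIRA = 100;

// ASSUMPTION: limits and rates taken from the published schedule, not from this document.
const NTA_2025_BANDS: TaxBand[] = [
  { upToKobo: 800_000 * KOBO_PER_NAIRA, ratePercent: 0 },
  { upToKobo: 3_000_000 * KOBO_PER_NAIRA, ratePercent: 15 },
  { upToKobo: 12_000_000 * KOBO_PER_NAIRA, ratePercent: 18 },
  { upToKobo: 25_000_000 * KOBO_PER_NAIRA, ratePercent: 21 },
  { upToKobo: 50_000_000 * KOBO_PER_NAIRA, ratePercent: 23 },
  { upToKobo: null, ratePercent: 25 },
];

interface BandLine {
  ratePercent: number;
  taxedKobo: number; // slice of income that falls in this band
  taxKobo: number; // tax charged on that slice
}

function computeBands(
  taxableKobo: number,
  bands: TaxBand[] = NTA_2025_BANDS
): { lines: BandLine[]; totalKobo: number } {
  if (!Number.isInteger(taxableKobo) || taxableKobo < 0) {
    throw new RangeError("taxable income must be a non-negative integer in kobo");
  }
  const lines: BandLine[] = [];
  let lowerKobo = 0;
  let totalKobo = 0;
  for (const band of bands) {
    const upperKobo = band.upToKobo === null ? Infinity : band.upToKobo;
    const taxedKobo = Math.max(0, Math.min(taxableKobo, upperKobo) - lowerKobo);
    // ASSUMPTION: round down to a whole kobo per band.
    const taxKobo = Math.floor((taxedKobo * band.ratePercent) / 100);
    lines.push({ ratePercent: band.ratePercent, taxedKobo, taxKobo });
    totalKobo += taxKobo;
    if (band.upToKobo === null) break;
    lowerKobo = band.upToKobo;
  }
  return { lines, totalKobo };
}
```

Under these assumed bands, a taxable income of ₦5,000,000 is charged 0% on the first ₦800,000, 15% on the next ₦2,200,000, and 18% on the remaining ₦2,000,000, for a total of ₦690,000.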
Technical Documentation
The backend is a stateless-by-design Express API under /api/v1 with a strict response envelope ({ data } for success; flat { errorCode, errorMessage, type, field } for errors). Clients key off a numeric errorCode (1001–1009), never on message text. Validation surfaces exactly one field error at a time (shallowest path, then alphabetical). All money is integer kobo (1 NGN = 100 kobo); decimals are rejected.
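A client-side view of that contract might look like the sketch below. The field names come from the envelope described above; the field types, the optional `field`, the `decideAction` helper, and the non-negative check in `parseKobo` are assumptions for illustration. Only codes 1007 and 1008 are spelled out elsewhere in this document, so the remaining codes fall through to a generic branch.

```typescript
interface SuccessEnvelope<T> {
  data: T;
}

interface ErrorEnvelope {
  errorCode: number; // stable numeric code in the 1001-1009 range
  errorMessage: string; // human-readable; never used for branching
  type: string; // ASSUMPTION: a short error category string
  field?: string; // ASSUMPTION: present only for a single-field validation error
}

type ApiEnvelope<T> = SuccessEnvelope<T> | ErrorEnvelope;

function isErrorEnvelope<T>(body: ApiEnvelope<T>): body is ErrorEnvelope {
  return typeof (body as ErrorEnvelope).errorCode === "number";
}

// The two codes this document describes explicitly.
const UPSTREAM_OUTAGE = 1007;
const NONCONFORMING_OUTPUT = 1008;

type ClientAction = "render" | "retry-later" | "rephrase" | "show-error";

// Branch on the numeric code only, never on errorMessage.
function decideAction<T>(body: ApiEnvelope<T>): ClientAction {
  if (!isErrorEnvelope(body)) return "render";
  switch (body.errorCode) {
    case UPSTREAM_OUTAGE:
      return "retry-later";
    case NONCONFORMING_OUTPUT:
      return "rephrase";
    default:
      return "show-error";
  }
}

// Boundary check for money fields: integer kobo only, decimals rejected.
// ASSUMPTION: negative amounts are rejected as well.
function parseKobo(value: unknown): number {
  if (typeof value !== "number" || !Number.isInteger(value) || value < 0) {
    throw new TypeError("amount must be a non-negative integer number of kobo");
  }
  return value;
}
```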
Key endpoints:
- POST /tax/compute - stateless NTA 2025 position for an income declaration.
- POST /tax/compare - both regimes plus the net change and relevant reforms.
- POST /statement/parse - multipart PDF upload; returns an opaque 8-digit code and 202 Accepted immediately, then runs the parse pipeline in the background.
- GET /statement/:code/events - Server-Sent Events stream that pushes a status event on every pipeline transition (emits current state on connect, ~15s heartbeat, closes on a terminal status).
- GET /statement/:code - durable polling fallback for the same status.
- POST /statement/:code/recompute - user reclassifies which inflows count as income; gross is recomputed, the engine re-runs, and the corrected numbers are persisted so the AI panel stays grounded.
- POST /ai/ask - grounded follow-up keyed by code; continues the OpenAI conversation from the analysis turn.
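The status stream described for the events endpoint can be illustrated with the standard Server-Sent Events wire format. This is a minimal sketch: the payload shape (`code` plus `status`) and the helper names are assumptions, while the `event:` / `data:` fields and the `:` comment line used as a heartbeat follow the SSE format itself.

```typescript
type PipelineStatus =
  | "pending"
  | "validating"
  | "analyzing"
  | "ready"
  | "needs_review"
  | "failed";

// Terminal statuses end the stream; the record keeps the mapping exhaustive.
const IS_TERMINAL: Record<PipelineStatus, boolean> = {
  pending: false,
  validating: false,
  analyzing: false,
  ready: true,
  needs_review: true,
  failed: true,
};

// One SSE frame: an event name, a single data line, and a blank-line terminator.
// ASSUMPTION: the JSON payload carries only the code and the status.
function statusFrame(code: string, status: PipelineStatus): string {
  const payload = JSON.stringify({ code, status });
  return "event: status\ndata: " + payload + "\n\n";
}

// A line starting with ":" is an SSE comment, which clients ignore;
// sending one periodically keeps idle connections open.
function heartbeatFrame(): string {
  return ": heartbeat\n\n";
}

function shouldCloseStream(status: PipelineStatus): boolean {
  return IS_TERMINAL[status];
}
```

On the browser side, an `EventSource` listener for the `status` event would parse `event.data` as JSON and fall back to polling GET /statement/:code if the connection cannot be established.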
Processes are ephemeral: a reaper job deletes each process and its audit records one hour after the last interaction, after which the :code endpoints return 404.
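The idle-expiry rule can be expressed as a pure predicate that a reaper job would evaluate on each pass. The sketch below works on in-memory records; the `ProcessRecord` shape, the function names, and the inclusive one-hour boundary are assumptions for illustration.

```typescript
const IDLE_TTL_MS = 60 * 60 * 1000; // one hour

// Hypothetical record shape; the real process document is not shown in this writeup.
interface ProcessRecord {
  code: string;
  lastInteractionAt: Date;
}

// ASSUMPTION: a process expires once it has been idle for at least one hour.
function isExpired(process: ProcessRecord, now: Date): boolean {
  return now.getTime() - process.lastInteractionAt.getTime() >= IDLE_TTL_MS;
}

// Any interaction (status poll, recompute, AI question) would reset the idle clock.
function touch(process: ProcessRecord, now: Date): ProcessRecord {
  return { code: process.code, lastInteractionAt: now };
}

// What one reaper pass would select for deletion.
function selectExpired(processes: ProcessRecord[], now: Date): ProcessRecord[] {
  return processes.filter((p) => isExpired(p, now));
}
```

Keying the expiry to the last interaction rather than to creation time means an active session is never cut off mid-conversation.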
Tech Stack
- Monorepo: Nx 22 + pnpm workspaces. packages/core (pure TS, depends on nothing) → packages/api (browser client, depends on core) and packages/ui (design system, depends on core). Apps never import from another app.
- Frontend (apps/taxlens-web): Vite 6, React 19, React Router 6, TanStack Query 5, Tailwind CSS 3, ky-based API client. Ships a Storybook-lite /preview design-system catalogue.
- Backend (apps/main-backend): Express 4, MongoDB 7 driver, OpenAI SDK, Zod validation, Pino structured logging, Multer for PDF multipart, Helmet + CORS + compression.
- AI: OpenAI Responses API, with gpt-4o-mini as a cheap validation gate and gpt-4o for extraction/analysis, structured (schema-validated) outputs, and threaded previousResponseId for conversation continuity.
- Language: TypeScript 5 end to end.
System Design
apps/taxlens-web (Vite + React 19)
  landing -> income -> result -> compare -> AI
  ky client . TanStack Query . EventSource (SSE)
                    |
                    |  /api/v1 (JSON . multipart . SSE)
                    v
apps/main-backend (Express, stateless)
  Zod validate . envelope . requestId . Pino
  /tax          /statement               /ai/ask
    |               | async pipeline        |
    |      pending -> validating ->         |
    |      analyzing -> ready /             |
    |      needs_review / failed            |
    |               | (SSE + poll)          |
    v               v                       v
  @taxlens/core engine                  LLM client
  (the ONLY source of                   circuit breaker
   every tax number)                    + audit log
    |                                       |
    v                                       v
  MongoDB 7                             OpenAI API
  process + llm_audit;                  gate: gpt-4o-mini
  1h reaper                             analysis: gpt-4o
                                        chat: grounded
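The choice between the ready and needs_review outcomes in the pipeline above can be sketched as a small decision function: if extracted credits add up to something but none of it is classified as income, the process asks for review instead of reporting zero income. Which classes count towards gross income, and the names `Inflow`, `countsAsIncome`, and `settleStatus`, are assumptions for this example.

```typescript
type InflowClass = "salary" | "business" | "transfer" | "other";

interface Inflow {
  amountKobo: number; // integer kobo
  classification: InflowClass;
}

// ASSUMPTION: only salary and business credits count towards gross income.
function countsAsIncome(classification: InflowClass): boolean {
  return classification === "salary" || classification === "business";
}

function sumKobo(inflows: Inflow[]): number {
  return inflows.reduce((total, inflow) => total + inflow.amountKobo, 0);
}

// Gross of zero with non-zero credits is treated as a likely misclassification,
// so the user is routed to reclassify rather than shown an "exempt" result.
function settleStatus(inflows: Inflow[]): "ready" | "needs_review" {
  const totalKobo = sumKobo(inflows);
  const grossKobo = sumKobo(inflows.filter((i) => countsAsIncome(i.classification)));
  return grossKobo === 0 && totalKobo > 0 ? "needs_review" : "ready";
}
```

How a statement with no credits at all is handled is not covered here; the sketch only encodes the zero-gross, non-zero-inflow rule.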
Smart Architectural Decisions
- The LLM never produces a tax number. A pure, framework-free TypeScript engine (@taxlens/core) owns every figure; the model is confined to extracting and classifying bank-statement inflows. The AI follow-up panel can only explain numbers the calculator already produced. This is the core trust decision of the whole product.
- Two-tier AI pipeline as a cost firewall. A cheap gpt-4o-mini 'gate' first decides whether the upload is even a usable Nigerian bank statement; only valid documents reach the expensive gpt-4o analysis. Invalid input fails fast without burning the costly call.
- Shared pure-TS engine runs on both client and server. Because core has zero I/O dependencies, the same calculator powers the stateless backend endpoint and a client-side estimate: one source of truth, no drift.
- Stateless, ephemeral, privacy-first. No accounts and no long-term storage of user data; processes are keyed by an opaque 8-digit code and reaped one hour after the last interaction. No raw statement text or PDF bytes are ever logged.
- Resilience around the AI dependency. A classic 3-state circuit breaker (closed → open → half-open) fast-fails after consecutive OpenAI failures; an llm_audit collection records model, tokens, latency, prompt hash, and circuit state for every call.
- Contract failures and outages are distinct. A model turn that fails schema validation gets one repair attempt; if it still fails, it is surfaced as 422/1008 (rephrase and retry). This is explicitly separated from a real upstream outage (503/1007), so non-conforming output doesn't masquerade as downtime or trip the breaker.
- needs_review over a false 'exempt'. If extraction reads every credit as a transfer (gross 0 while inflows sum > 0), the process lands on needs_review and routes the user to reclassify, rather than serenely declaring them tax-exempt.
- Deterministic LLM stub transport for CI. An LLM_MODE=stub seam lives entirely inside the OpenAI client and runs the same breaker, audit, and schema-validation path as the real transport, with steering levers (reject/fail/refuse/nonconforming/all-transfer), so AI behaviour is testable without hitting the network. The stub is rejected at boot in production.
- Money as integer kobo. All amounts are integer kobo end to end; decimals are rejected at the boundary, eliminating floating-point drift in financial math.
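The breaker and the contract-versus-outage split described in this list can be sketched together. This is a minimal three-state breaker under stated assumptions: the failure threshold, the cooldown, the injected clock, and the class and method names are all illustrative, and a contract failure is simply ignored by the breaker so that schema-invalid output cannot open the circuit.

```typescript
type BreakerState = "closed" | "open" | "half-open";
type CallOutcome = "success" | "outage" | "contract_failure";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private consecutiveOutages = 0;
  private openedAt = 0;
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  // ASSUMPTION: threshold and cooldown defaults are placeholders.
  constructor(threshold = 3, cooldownMs = 30000, now: () => number = () => Date.now()) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // An open breaker moves to half-open once the cooldown has elapsed.
  getState(): BreakerState {
    if (this.state === "open" && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = "half-open";
    }
    return this.state;
  }

  // Callers fast-fail (503/1007) while this returns false.
  canRequest(): boolean {
    return this.getState() !== "open";
  }

  record(outcome: CallOutcome): void {
    // Schema-invalid output (422/1008) is not an outage, so the breaker ignores it.
    if (outcome === "contract_failure") return;
    if (outcome === "success") {
      this.consecutiveOutages = 0;
      this.state = "closed";
      return;
    }
    const probing = this.getState() === "half-open";
    this.consecutiveOutages += 1;
    // A failed half-open probe reopens immediately; otherwise wait for the threshold.
    if (probing || this.consecutiveOutages >= this.threshold) {
      this.state = "open";
      this.openedAt = this.now();
      this.consecutiveOutages = 0;
    }
  }
}
```

Passing the clock in as a function keeps the breaker deterministic under test, which fits the stub-transport approach described above.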
Impacts
Turns a high-stakes, hallucination-prone question ("what is my tax under the new law?") into a trustworthy, auditable answer by structurally separating computation from generation: a deterministic statute-sourced engine produces every number and the LLM is fenced into extraction and explanation only. Delivered as a clean Nx + pnpm monorepo with a privacy-first stateless backend, SSE-driven live status, a circuit breaker, full structured-output validation, and a deterministic stub transport for CI.
Demonstrated Skills
- Monorepo architecture and dependency discipline (Nx + pnpm, layered core/api/ui packages with enforced direction).
- Domain modelling of real tax legislation (NTA 2025 bands and reliefs vs legacy PITA) in a pure, testable engine.
- Production-grade LLM integration: tiered models, structured/schema-validated outputs, conversation threading, prompt auditing, and grounding constraints.
- Backend resilience and reliability engineering: circuit breaker, contract-vs-outage error taxonomy, idempotency, request-id correlation, structured logging.
- Real-time UX with Server-Sent Events plus a durable polling fallback and an idle-expiry reaper.
- API design rigour: stable numeric error codes, strict one-field-at-a-time validation, integer-kobo money handling, strict request bodies.
- Privacy-by-design: no accounts, no user data kept beyond the one-hour idle window, no raw statement content logged.
- Testability engineering: a swappable deterministic LLM transport that preserves the full production code path.
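To illustrate the swappable transport idea, the sketch below selects a mode from the environment, refuses the stub in production, and sends both transports' output through the same validation step. Everything here is hypothetical: the `LlmTransport` interface, the reply shape, the use of NODE_ENV to detect production, and the single `nonconforming` lever shown. The repair attempt that precedes a 1008 error in the real service is omitted.

```typescript
type LlmMode = "live" | "stub";

// ASSUMPTION: production is detected via NODE_ENV; the real check is not documented here.
function resolveLlmMode(env: Record<string, string | undefined>): LlmMode {
  const mode: LlmMode = env.LLM_MODE === "stub" ? "stub" : "live";
  if (mode === "stub" && env.NODE_ENV === "production") {
    throw new Error("LLM_MODE=stub is not allowed in production");
  }
  return mode;
}

// Hypothetical seam: the only thing that differs between live and stub runs.
interface LlmTransport {
  send(prompt: string): Promise<string>; // raw model text
}

// Deterministic canned replies; the lever steers which path a test exercises.
function createStubTransport(lever: "ok" | "nonconforming" = "ok"): LlmTransport {
  return {
    send: async () =>
      lever === "nonconforming" ? "not json" : JSON.stringify({ classification: "salary" }),
  };
}

type ClassifyResult =
  | { ok: true; classification: string }
  | { ok: false; errorCode: 1008 };

// Shared path: whichever transport is used, output is validated the same way.
async function classify(transport: LlmTransport, prompt: string): Promise<ClassifyResult> {
  const raw = await transport.send(prompt);
  try {
    const parsed = JSON.parse(raw) as { classification?: unknown };
    if (typeof parsed.classification === "string") {
      return { ok: true, classification: parsed.classification };
    }
  } catch {
    // Unparseable output falls through to the contract-failure result.
  }
  return { ok: false, errorCode: 1008 };
}
```

Because the stub sits behind the same interface, a CI run exercises the validation and error-mapping code exactly as a live call would, only without the network.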