TaxLens

Problem Statement

The Nigeria Tax Act 2025 (NTA 2025) takes effect on 1 January 2026 and rewrites personal income tax: the Consolidated Relief Allowance is abolished and replaced by a capped rent relief, a new six-band progressive table applies, and the first ₦800,000 of taxable income becomes tax-free. Ordinary salary earners, freelancers, and mixed-income workers have no easy, trustworthy way to see what their tax actually becomes under the new rules, or how it differs from what they paid before. Generic AI chatbots will happily hallucinate a tax figure, which makes them exactly the wrong tool for a question where a wrong number has real consequences.

Proposed Solution

A single visitor-facing web tool (no account, no stored data) with a four-step flow: input income (sample data, manual entry, or a bank-statement PDF) → see a full NTA 2025 tax position (bands, reliefs, effective rate, take-home) → see a side-by-side comparison against the old Personal Income Tax Act (PITA) regime in plain language ("you save ₦X" / "you pay ₦X more") → ask grounded follow-up questions answered only from the already-computed numbers, with statute citations. Every number on screen comes from a deterministic tax engine, never from a model.
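
As a rough illustration of the plain-language comparison step, the sketch below turns two already-computed liabilities into the "you save ₦X" / "you pay ₦X more" wording. The helper names (formatNaira, describeNetChange) and the "unchanged" wording are assumptions made for this example, not names or copy taken from the project.

```typescript
// Amounts are integer kobo (1 NGN = 100 kobo), as elsewhere in the document.
type Kobo = number;

// Format non-negative integer kobo as naira. Grouping is hand-rolled so the
// output does not depend on the runtime's locale data.
function formatNaira(kobo: Kobo): string {
  if (!Number.isSafeInteger(kobo) || kobo < 0) {
    throw new RangeError("expected a non-negative integer kobo amount");
  }
  const naira = Math.floor(kobo / 100);
  const rest = kobo % 100;
  const grouped = String(naira).replace(/\B(?=(\d{3})+(?!\d))/g, ",");
  return rest === 0 ? `₦${grouped}` : `₦${grouped}.${rest < 10 ? "0" : ""}${rest}`;
}

// Hypothetical helper: both inputs are liabilities already computed by the engine,
// so this function only words the difference and never computes tax itself.
function describeNetChange(oldTaxKobo: Kobo, newTaxKobo: Kobo): string {
  const delta = newTaxKobo - oldTaxKobo;
  if (delta === 0) return "your tax is unchanged"; // wording assumed for this sketch
  return delta < 0
    ? `you save ${formatNaira(-delta)}`
    : `you pay ${formatNaira(delta)} more`;
}
```

Keeping the wording in a pure function of two engine outputs is one way to guarantee that the sentence shown to the user can never disagree with the numbers beside it.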

Full Solution Details

Module 1 — Income input: three paths (sample / manual / PDF upload) and a required profile type (salary earner, freelancer, mixed). Bank-statement PDFs (3–12 months) are parsed by a two-tier AI pipeline that extracts credits, classifies them (salary / business / transfer / other), annualises income, and lets the user confirm or reclassify before anything is computed.
Module 2 — Tax position: exemption status, annual liability, effective rate, monthly estimate, take-home (annual and monthly), and a band-by-band visual breakdown with every applied relief itemised.
Module 3 — What changed for you: old PITA vs new NTA 2025 computed in the same engine, net change in plain language, and only the reform points relevant to the user.
Module 4 — Grounded AI panel: natural-language follow-ups constrained to personal income tax under NTA 2025; refuses out-of-scope (VAT, corporate, business) questions; cites the relevant section with an inline snippet; can never produce a tax number the calculator didn't already produce; always ends with a 'not tax advice' disclaimer.
Module 5 — How it works / About: methodology, statute references, and the build rationale (scope chosen, what was cut, v2 plans).
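
The band-by-band breakdown in Module 2 can be sketched as a walk over a progressive table. This is a minimal illustration, not the project's engine: only the ₦800,000 zero-rate band is stated in this document, and the other thresholds and rates below are an assumed reading of NTA 2025 that should be checked against the statute. The names (Band, computeBands) and the floor-rounding rule are likewise assumptions.

```typescript
type Kobo = number;

// widthKobo === null marks the open-ended top band.
interface Band { widthKobo: Kobo | null; ratePercent: number }
interface BandLine { ratePercent: number; taxedKobo: Kobo; taxKobo: Kobo }

const NAIRA = 100; // kobo per naira

// ASSUMPTION: only the first (0%) band comes from this document. The remaining
// widths and rates are illustrative and must be verified against NTA 2025.
const NTA_2025_BANDS: Band[] = [
  { widthKobo: 800_000 * NAIRA, ratePercent: 0 },
  { widthKobo: 2_200_000 * NAIRA, ratePercent: 15 },
  { widthKobo: 9_000_000 * NAIRA, ratePercent: 18 },
  { widthKobo: 13_000_000 * NAIRA, ratePercent: 21 },
  { widthKobo: 25_000_000 * NAIRA, ratePercent: 23 },
  { widthKobo: null, ratePercent: 25 },
];

// Walk the bands from the bottom, taxing each slice at its own rate.
function computeBands(
  taxableKobo: Kobo,
  bands: Band[] = NTA_2025_BANDS,
): { lines: BandLine[]; totalTaxKobo: Kobo } {
  if (!Number.isSafeInteger(taxableKobo) || taxableKobo < 0) {
    throw new RangeError("taxable income must be a non-negative integer kobo amount");
  }
  let remaining = taxableKobo;
  const lines: BandLine[] = [];
  for (const band of bands) {
    if (remaining <= 0) break;
    const taxedKobo = band.widthKobo === null ? remaining : Math.min(remaining, band.widthKobo);
    // Integer arithmetic only; flooring sub-kobo remainders is an assumption of this sketch.
    const taxKobo = Math.floor((taxedKobo * band.ratePercent) / 100);
    lines.push({ ratePercent: band.ratePercent, taxedKobo, taxKobo });
    remaining -= taxedKobo;
  }
  const totalTaxKobo = lines.reduce((sum, line) => sum + line.taxKobo, 0);
  return { lines, totalTaxKobo };
}
```

Under the assumed table, a taxable income of ₦5,000,000 touches three bands (₦800,000 at 0%, ₦2,200,000 at 15%, ₦2,000,000 at 18%) for a total of ₦690,000. Passing a different table to the same function is one way the legacy PITA comparison in Module 3 could reuse the engine.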

Technical Documentation

The backend is a stateless-by-design Express API under /api/v1 with a strict response envelope: { data } on success and a flat { errorCode, errorMessage, type, field } on error. Clients branch on the numeric errorCode (1001–1009), never on message text. Validation surfaces exactly one field error at a time (the shallowest path first, with ties broken alphabetically). All money is integer kobo (1 NGN = 100 kobo); decimal amounts are rejected.
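
To make the envelope and the one-error-at-a-time rule concrete, here is a small dependency-free sketch. The envelope field names come from the description above; everything else (FieldIssue, pickFieldIssue, toFailure, isKobo, and the example values passed for errorCode and type) is hypothetical, and a real implementation would presumably derive the issues from the Zod validation result.

```typescript
// Envelope shapes as described above.
type Success<T> = { data: T };
type Failure = { errorCode: number; errorMessage: string; type: string; field?: string };

// Hypothetical, library-agnostic view of a single validation issue.
interface FieldIssue { path: string[]; message: string }

function ok<T>(data: T): Success<T> {
  return { data };
}

// Pick exactly one issue: shallowest path first, then alphabetical by dotted path.
function pickFieldIssue(issues: FieldIssue[]): FieldIssue | undefined {
  return issues.slice().sort((a, b) => {
    if (a.path.length !== b.path.length) return a.path.length - b.path.length;
    const pa = a.path.join(".");
    const pb = b.path.join(".");
    return pa < pb ? -1 : pa > pb ? 1 : 0;
  })[0];
}

// The caller supplies errorCode and type; this sketch does not guess the real code table.
function toFailure(errorCode: number, type: string, issue: FieldIssue): Failure {
  return { errorCode, errorMessage: issue.message, type, field: issue.path.join(".") };
}

// Money guard: accept only non-negative integer kobo, so 150.5 is rejected outright.
function isKobo(value: unknown): value is number {
  return typeof value === "number" && Number.isSafeInteger(value) && value >= 0;
}
```

Sorting a copy rather than the original array keeps the selection deterministic without mutating the validator's output, which matters if the full issue list is also logged.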

Key endpoints:

POST /tax/compute — stateless NTA 2025 position for an income declaration.
POST /tax/compare — both regimes plus the net change and relevant reforms.
POST /statement/parse — multipart PDF upload; returns an opaque 8-digit code and 202 Accepted immediately, then runs the parse pipeline in the background.
GET /statement/:code/events — Server-Sent Events stream that pushes a status event on every pipeline transition (emits current state on connect, ~15s heartbeat, closes on a terminal status).
GET /statement/:code — durable polling fallback for the same status.
POST /statement/:code/recompute — user reclassifies which inflows count as income; gross is recomputed, the engine re-runs, and the corrected numbers are persisted so the AI panel stays grounded.
POST /ai/ask — grounded follow-up keyed by code; continues the OpenAI conversation from the analysis turn.
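
The status stream can be illustrated by the frames it would write. The sketch below formats Server-Sent Events text frames and decides when to close the stream; the event name "status", the payload shape, and the assumption that ready, needs_review, and failed are all terminal are guesses made for the example rather than details confirmed by the project.

```typescript
type Status = "pending" | "validating" | "analyzing" | "ready" | "needs_review" | "failed";

// ASSUMPTION: these three statuses end the stream.
const TERMINAL: ReadonlyArray<Status> = ["ready", "needs_review", "failed"];

// One SSE frame: an event name line, a data line, and a blank line as terminator.
function statusFrame(code: string, status: Status): string {
  return `event: status\ndata: ${JSON.stringify({ code, status })}\n\n`;
}

// Lines starting with ":" are comments in the SSE format, so clients ignore them;
// sending one periodically keeps idle connections from being dropped by proxies.
const HEARTBEAT_FRAME = ": heartbeat\n\n";

function shouldClose(status: Status): boolean {
  return TERMINAL.indexOf(status) !== -1;
}

// Frames a handler would write for a client connecting mid-run:
// current state first, then each later transition until a terminal status.
function framesFor(code: string, current: Status, later: Status[]): string[] {
  const frames = [statusFrame(code, current)];
  if (shouldClose(current)) return frames;
  for (const status of later) {
    frames.push(statusFrame(code, status));
    if (shouldClose(status)) break;
  }
  return frames;
}
```

Emitting the current state on connect means a client that subscribes late, or reconnects after a drop, never has to guess what it missed.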

Processes are ephemeral: a reaper job deletes each process and its audit records one hour after the last interaction, after which :code endpoints return 404.
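
The idle-expiry rule can be sketched as a pure predicate plus one reaper pass. This is an in-memory illustration only: the record shape, the field names, and the idea that every interaction refreshes the idle clock are assumptions, and a real job would delete documents from the database rather than filter an array.

```typescript
const IDLE_TTL_MS = 60 * 60 * 1000; // one hour, per the description above

// Hypothetical record shape; timestamps are epoch milliseconds.
interface ProcessRecord { code: string; lastInteractionAt: number }

function isExpired(record: ProcessRecord, now: number): boolean {
  return now - record.lastInteractionAt >= IDLE_TTL_MS;
}

// One reaper pass: split records into survivors and codes to delete.
function reapOnce(
  records: ProcessRecord[],
  now: number,
): { kept: ProcessRecord[]; reapedCodes: string[] } {
  const kept: ProcessRecord[] = [];
  const reapedCodes: string[] = [];
  for (const record of records) {
    if (isExpired(record, now)) reapedCodes.push(record.code);
    else kept.push(record);
  }
  return { kept, reapedCodes };
}

// ASSUMPTION: any interaction (poll, recompute, AI question) resets the idle clock.
function touch(record: ProcessRecord, now: number): ProcessRecord {
  return { ...record, lastInteractionAt: now };
}
```

Passing the clock in as an argument keeps the expiry logic testable without waiting an hour or mocking timers.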

Tech Stack

Monorepo: Nx 22 + pnpm workspaces. packages/core (pure TS, depends on nothing) → packages/api (browser client, depends on core) and packages/ui (design system, depends on core). Apps never import from another app.
Frontend (apps/taxlens-web): Vite 6, React 19, React Router 6, TanStack Query 5, Tailwind CSS 3, ky-based API client. Ships a Storybook-lite /preview design-system catalogue.
Backend (apps/main-backend): Express 4, MongoDB 7 driver, OpenAI SDK, Zod validation, Pino structured logging, Multer for PDF multipart, Helmet + CORS + compression.
AI: OpenAI Responses API — gpt-4o-mini as a cheap validation gate, gpt-4o for extraction/analysis, with structured (schema-validated) outputs and threaded previousResponseId for conversation continuity.
Language: TypeScript 5 end to end.
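
The tiered AI arrangement listed above (a cheap gate in front of an expensive analysis call) can be sketched without the OpenAI SDK by injecting the two model calls as functions. All names here (ModelCallers, parseStatement, GateVerdict) and the result shapes are hypothetical; the point is only the ordering, in which the expensive call is never reached when the gate rejects the input.

```typescript
type GateVerdict = { usable: boolean; reason?: string };
type Inflow = {
  description: string;
  amountKobo: number;
  kind: "salary" | "business" | "transfer" | "other";
};

// Injected callers stand in for the real model requests, so the sketch runs offline.
interface ModelCallers {
  gate(text: string): Promise<GateVerdict>; // cheap tier (gpt-4o-mini in the document)
  analyze(text: string): Promise<Inflow[]>; // expensive tier (gpt-4o in the document)
}

type ParseOutcome =
  | { status: "failed"; reason: string }
  | { status: "analyzed"; inflows: Inflow[] };

async function parseStatement(text: string, models: ModelCallers): Promise<ParseOutcome> {
  const verdict = await models.gate(text);
  if (!verdict.usable) {
    // Fail fast: the expensive analysis call is never made for rejected input.
    return { status: "failed", reason: verdict.reason ?? "not a usable bank statement" };
  }
  return { status: "analyzed", inflows: await models.analyze(text) };
}
```

Because the callers are plain functions, a test double can count how often analyze runs, which is the property the cost argument depends on.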

System Design

  apps/taxlens-web  (Vite + React 19)
  landing -> income -> result -> compare -> AI
  ky client . TanStack Query . EventSource (SSE)
        |
        |  /api/v1  (JSON . multipart . SSE)
        v
  apps/main-backend  (Express, stateless)
  Zod validate . envelope . requestId . Pino
   /tax        /statement            /ai/ask
     |             |  async pipeline      |
     |        pending -> validating ->    |
     |        analyzing -> ready /        |
     |        needs_review / failed       |
     |             |  (SSE + poll)        |
     v             v                      v
  @taxlens/core engine            LLM client
  (the ONLY source of            circuit breaker
   every tax number)             + audit log
        |                              |
        v                              v
     MongoDB 7                     OpenAI API
     process + llm_audit;          gate: gpt-4o-mini
     1h reaper                     analysis: gpt-4o
                                   chat: grounded
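
The status flow in the diagram can be sketched as a transition table plus the rule that picks between ready and needs_review once analysis finishes. This is an illustration under stated assumptions: which inflow kinds count as income, the exact set of legal transitions, and all function names are guesses for the example, and the later move out of needs_review after a recompute is not modelled.

```typescript
type Status = "pending" | "validating" | "analyzing" | "ready" | "needs_review" | "failed";
type InflowKind = "salary" | "business" | "transfer" | "other";
interface Inflow { amountKobo: number; kind: InflowKind }

// Transitions read off the diagram; validating -> failed reflects the fail-fast gate.
// Anything not listed is treated as illegal in this sketch.
const NEXT: Record<Status, ReadonlyArray<Status>> = {
  pending: ["validating"],
  validating: ["analyzing", "failed"],
  analyzing: ["ready", "needs_review", "failed"],
  ready: [],
  needs_review: [],
  failed: [],
};

function canTransition(from: Status, to: Status): boolean {
  return NEXT[from].indexOf(to) !== -1;
}

// ASSUMPTION: salary and business inflows count as income; transfer and other do not.
const INCOME_KINDS: ReadonlyArray<InflowKind> = ["salary", "business"];

function grossIncomeKobo(inflows: Inflow[]): number {
  return inflows
    .filter((inflow) => INCOME_KINDS.indexOf(inflow.kind) !== -1)
    .reduce((sum, inflow) => sum + inflow.amountKobo, 0);
}

// Money came in but none of it was classified as income: ask the user to review
// instead of reporting a zero-income (and therefore exempt) position.
function statusAfterAnalysis(inflows: Inflow[]): "ready" | "needs_review" {
  const totalInflowKobo = inflows.reduce((sum, inflow) => sum + inflow.amountKobo, 0);
  return grossIncomeKobo(inflows) === 0 && totalInflowKobo > 0 ? "needs_review" : "ready";
}
```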

Smart Architectural Decisions

The LLM never produces a tax number. A pure, framework-free TypeScript engine (@taxlens/core) owns every figure; the model is confined to extracting and classifying bank-statement inflows. The AI follow-up panel can only explain numbers the calculator already produced. This is the core trust decision of the whole product.
Two-tier AI pipeline as a cost firewall. A cheap gpt-4o-mini 'gate' first decides whether the upload is even a usable Nigerian bank statement; only valid documents reach the expensive gpt-4o analysis. Invalid input fails fast without burning the costly call.
Shared pure-TS engine runs on both client and server. Because core has zero I/O dependencies, the same calculator powers the stateless backend endpoint and a client-side estimate — one source of truth, no drift.
Stateless, ephemeral, privacy-first. No accounts, no stored user data; processes are keyed by an opaque 8-digit code and reaped one hour after the last interaction. No raw statement text or PDF bytes are ever logged.
Resilience around the AI dependency: a classic 3-state circuit breaker (closed → open → half-open) fast-fails after consecutive OpenAI failures; an llm_audit collection records model, tokens, latency, prompt hash, and circuit state for every call.
Contract failure and outage are kept distinct. A model turn that fails schema validation is repaired once and, if it still fails, surfaced as 422/1008 (rephrase and retry). This is explicitly separated from a real upstream outage (503/1007), so non-conforming output doesn't masquerade as downtime or trip the breaker.
needs_review over a false 'exempt'. If extraction reads every credit as a transfer (gross 0 while inflows sum > 0), the process lands on needs_review and routes the user to reclassify, rather than serenely declaring them tax-exempt.
Deterministic LLM stub transport for CI. An LLM_MODE=stub seam lives entirely inside the OpenAI client and runs the same breaker, audit, and schema-validation path as the real transport, with steering levers (reject/fail/refuse/nonconforming/all-transfer), so AI behaviour is testable without hitting the network. Stub mode is rejected at boot in production.
Money as integer kobo. All amounts are integer kobo end to end; decimals are rejected at the boundary, eliminating floating-point drift in financial math.
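
The circuit breaker and the contract-versus-outage split described above can be sketched together as a small state machine. This is a minimal illustration, not the project's client: the threshold and cooldown values, the class and method names, and the choice to ignore contract failures entirely (rather than count them as proof the upstream is alive) are assumptions. Only the 503/1007 and 422/1008 pairs come from the document.

```typescript
type BreakerState = "closed" | "open" | "half-open";
type CallResult = "ok" | "outage" | "contract_failure";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private consecutiveOutages = 0;
  private openedAt = 0;
  private readonly threshold: number;
  private readonly cooldownMs: number;

  // Default values are illustrative; the document does not state the real ones.
  constructor(threshold = 3, cooldownMs = 30000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  // May a call proceed at time `now`? Moves open -> half-open once the cooldown elapses.
  allow(now: number): boolean {
    if (this.state === "open" && now - this.openedAt >= this.cooldownMs) {
      this.state = "half-open";
    }
    return this.state !== "open";
  }

  record(result: CallResult, now: number): void {
    if (result === "contract_failure") return; // non-conforming output is not downtime
    if (result === "ok") {
      this.state = "closed";
      this.consecutiveOutages = 0;
      return;
    }
    this.consecutiveOutages += 1;
    // A failed half-open probe reopens immediately; otherwise wait for the threshold.
    if (this.state === "half-open" || this.consecutiveOutages >= this.threshold) {
      this.state = "open";
      this.openedAt = now;
    }
  }

  current(): BreakerState {
    return this.state;
  }
}

// Status and error code pairs as given in the document.
function toHttpError(result: "outage" | "contract_failure"): { status: number; errorCode: number } {
  return result === "outage"
    ? { status: 503, errorCode: 1007 }
    : { status: 422, errorCode: 1008 };
}
```

Taking the clock as an argument instead of reading it inside the class keeps the breaker deterministic, which fits the same testing approach as the stub transport.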

Impacts

TaxLens turns a high-stakes, hallucination-prone question ("what is my tax under the new law?") into a trustworthy, auditable answer by structurally separating computation from generation: a deterministic, statute-sourced engine produces every number, and the LLM is fenced into extraction and explanation only. The project is delivered as a clean Nx + pnpm monorepo with a privacy-first stateless backend, SSE-driven live status, a circuit breaker, full structured-output validation, and a deterministic stub transport for CI.

Demonstrated Skills

Monorepo architecture and dependency discipline (Nx + pnpm, layered core/api/ui packages with enforced direction).
Domain modelling of real tax legislation (NTA 2025 bands and reliefs vs legacy PITA) in a pure, testable engine.
Production-grade LLM integration: tiered models, structured/schema-validated outputs, conversation threading, prompt auditing, and grounding constraints.
Backend resilience and reliability engineering: circuit breaker, contract-vs-outage error taxonomy, idempotency, request-id correlation, structured logging.
Real-time UX with Server-Sent Events plus a durable polling fallback and an idle-expiry reaper.
API design rigour: stable numeric error codes, strict one-field-at-a-time validation, integer-kobo money handling, strict request bodies.
Privacy-by-design: no accounts, no persisted user data, no raw statement content logged.
Testability engineering: a swappable deterministic LLM transport that preserves the full production code path.
