Fintech

TaxLens

A personal income tax tool for Nigerians under the Nigeria Tax Act 2025 (effective 1 January 2026). A visitor enters income manually, loads sample data, or uploads a bank statement; TaxLens computes their tax position under the new regime, shows what changed versus the old Personal Income Tax Act (PITA) rules, and lets them ask grounded follow-up questions. No account is required, and no user data is stored.

A statute-grounded tax engine where the LLM never invents a number. A pure-TypeScript NTA 2025 calculator owns every figure. A two-tier OpenAI pipeline (a cheap gpt-4o-mini gate plus gpt-4o analysis) only extracts and classifies bank-statement inflows, and the follow-up panel only explains figures the calculator has already produced. Built as an Nx + pnpm monorepo: a Vite/React 19 frontend and an Express + MongoDB backend with SSE live status, a circuit breaker, and an LLM stub transport for deterministic CI.

TypeScript
React
Vite
Tailwind
Node.js
Express
MongoDB
OpenAI
Fintech

Problem Statement

The Nigeria Tax Act 2025 takes effect on 1 January 2026 and rewrites personal income tax. The Consolidated Relief Allowance is abolished and replaced by a capped rent relief, a new six-band progressive table applies, and the first ₦800,000 of taxable income becomes tax-free. Ordinary salary earners, freelancers, and mixed-income workers have no easy, trustworthy way to see what their tax actually becomes under the new rules, or how it differs from what they paid before. Generic AI chatbots will happily hallucinate a tax figure, which makes them exactly the wrong tool for a question where a wrong number has real consequences.
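The band structure above lends itself to a pure function over integer kobo. The sketch below is illustrative, not the TaxLens engine. `computeBandedTax`, `Band`, and `NTA_2025_BANDS` are hypothetical names. The band edges and rates are assumptions taken from the commonly published NTA 2025 table and should be checked against the statute: 0% up to ₦800,000, then 15%, 18%, 21%, 23%, and 25%, with edges at ₦3m, ₦12m, ₦25m, and ₦50m. The function works on taxable income, that is, after reliefs such as rent relief have been deducted.

```typescript
// Hypothetical sketch of a progressive-band calculator over integer kobo.
// ASSUMPTION: band edges and rates follow the commonly published NTA 2025
// table; verify them against the statute before relying on the output.

const KOBO_PER_NAIRA = 100;

export interface Band {
  upToKobo: number; // cumulative upper edge of the band (Infinity for the top band)
  ratePercent: number; // whole-number rate applied to income inside the band
}

export const NTA_2025_BANDS: readonly Band[] = [
  { upToKobo: 800_000 * KOBO_PER_NAIRA, ratePercent: 0 },
  { upToKobo: 3_000_000 * KOBO_PER_NAIRA, ratePercent: 15 },
  { upToKobo: 12_000_000 * KOBO_PER_NAIRA, ratePercent: 18 },
  { upToKobo: 25_000_000 * KOBO_PER_NAIRA, ratePercent: 21 },
  { upToKobo: 50_000_000 * KOBO_PER_NAIRA, ratePercent: 23 },
  { upToKobo: Infinity, ratePercent: 25 },
];

export interface BandLine {
  ratePercent: number;
  taxableKobo: number; // slice of income that fell inside this band
  taxKobo: number;
}

export function computeBandedTax(
  taxableIncomeKobo: number,
  bands: readonly Band[] = NTA_2025_BANDS,
): { totalTaxKobo: number; lines: BandLine[] } {
  if (!Number.isSafeInteger(taxableIncomeKobo) || taxableIncomeKobo < 0) {
    throw new RangeError("taxable income must be a non-negative integer number of kobo");
  }
  const lines: BandLine[] = [];
  let lowerKobo = 0;
  let totalTaxKobo = 0;
  for (const band of bands) {
    if (taxableIncomeKobo <= lowerKobo) break; // income exhausted
    const taxableKobo = Math.min(taxableIncomeKobo, band.upToKobo) - lowerKobo;
    // ASSUMPTION: tax in each band is rounded down to whole kobo.
    const taxKobo = Math.floor((taxableKobo * band.ratePercent) / 100);
    lines.push({ ratePercent: band.ratePercent, taxableKobo, taxKobo });
    totalTaxKobo += taxKobo;
    lowerKobo = band.upToKobo;
  }
  return { totalTaxKobo, lines };
}
```

Under these assumed bands, `computeBandedTax(5_000_000 * 100)` returns three band lines totalling 69,000,000 kobo (₦690,000):

- nothing on the first ₦800,000
- 15% of the next ₦2,200,000
- 18% of the remaining ₦2,000,000

Returning per-band lines rather than just the total is what a band-by-band breakdown needs.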

Proposed Solution

A single visitor-facing web tool (no account, no stored data) with a four-step flow: input income (sample data, manual entry, or a bank-statement PDF) → see a full NTA 2025 tax position (bands, reliefs, effective rate, take-home) → see a side-by-side comparison against the old PITA regime in plain language ("you save ₦X" / "you pay ₦X more") → ask grounded follow-up questions answered only from the already-computed numbers, with statute citations. Every number on screen comes from a deterministic tax engine, never from a model.

Full Solution Details

  • Module 1 — Income input: three paths (sample / manual / PDF upload) and a required profile type (salary earner, freelancer, mixed). Bank-statement PDFs (3–12 months) are parsed by a two-tier AI pipeline that extracts credits, classifies them (salary / business / transfer / other), annualises income, and lets the user confirm or reclassify before anything is computed.
  • Module 2 — Tax position: exemption status, annual liability, effective rate, monthly estimate, take-home (annual and monthly), and a band-by-band visual breakdown with every applied relief itemised.
  • Module 3 — What changed for you: old PITA vs new NTA 2025 computed in the same engine, net change in plain language, and only the reform points relevant to the user.
  • Module 4 — Grounded AI panel: natural-language follow-ups constrained to personal income tax under NTA 2025; refuses out-of-scope (VAT, corporate, business) questions; cites the relevant section with an inline snippet; can never produce a tax number the calculator didn't already produce; always ends with a 'not tax advice' disclaimer.
  • Module 5 — How it works / About: methodology, statute references, and the build rationale (scope chosen, what was cut, v2 plans).
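Module 1's annualisation step could be sketched as follows. Sum the inflows that count as income over the statement period, honouring any user reclassifications, then scale the total to twelve months. This is a minimal sketch under assumptions. `Inflow`, `annualiseIncome`, treating salary and business credits as income by default, and straight-line scaling are all illustrative choices, not documented TaxLens behaviour.

```typescript
// Hypothetical sketch of annualising user-confirmed bank-statement inflows.
// ASSUMPTIONS: salary/business count as income by default, the user can override
// any single inflow, and income is scaled linearly from the covered months to 12.

export type InflowCategory = "salary" | "business" | "transfer" | "other";

export interface Inflow {
  id: string;
  amountKobo: number; // integer kobo
  category: InflowCategory; // as classified by the extraction step
}

const DEFAULT_INCOME_CATEGORIES: ReadonlySet<InflowCategory> = new Set<InflowCategory>([
  "salary",
  "business",
]);

export function annualiseIncome(
  inflows: readonly Inflow[],
  monthsCovered: number,
  // User reclassification: inflow id -> whether that inflow counts as income.
  overrides: ReadonlyMap<string, boolean> = new Map<string, boolean>(),
): number {
  if (!Number.isInteger(monthsCovered) || monthsCovered < 3 || monthsCovered > 12) {
    throw new RangeError("statement must cover 3 to 12 whole months");
  }
  let periodIncomeKobo = 0;
  for (const inflow of inflows) {
    const counts = overrides.get(inflow.id) ?? DEFAULT_INCOME_CATEGORIES.has(inflow.category);
    if (counts) periodIncomeKobo += inflow.amountKobo;
  }
  // Scale the observed period up to a full year, rounding to whole kobo.
  return Math.round((periodIncomeKobo * 12) / monthsCovered);
}
```

A three-month statement with three ₦300,000 salary credits annualises to 360,000,000 kobo (₦3,600,000). Marking a transfer as income through `overrides` is how a reclassification would feed back into the gross figure before the engine re-runs.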

Technical Documentation

The backend is a stateless-by-design Express API under /api/v1 with a strict response envelope: { data } on success and a flat { errorCode, errorMessage, type, field } object on error. Clients branch on the numeric errorCode (1001–1009), never on message text. Validation surfaces exactly one field error at a time: the error with the shallowest path wins, and ties are broken alphabetically. All money is integer kobo (1 NGN = 100 kobo); decimal amounts are rejected.
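The envelope and the one-error-at-a-time rule can be expressed in a few lines. This sketch makes several assumptions:

- Issues have a Zod-style shape (a `path` array plus a `message`).
- The alphabetical tie-break compares dotted paths by code unit.
- `pickSurfacedIssue`, `toErrorEnvelope`, and the `field: null` fallback are hypothetical.
- The caller supplies the numeric code, because the text does not say which of 1001–1009 is used for validation.

```typescript
// Hypothetical sketch of the response envelope and the one-field-error rule.
// ASSUMPTIONS: Zod-style issue shape; depth first, then dotted path compared by
// code unit; `field: null` when no issue is available.

export type SuccessEnvelope<T> = { data: T };

export type ErrorEnvelope = {
  errorCode: number; // stable numeric code (1001-1009) that clients branch on
  errorMessage: string; // human-readable only; clients never parse it
  type: string; // error category; concrete values are not specified in the text
  field: string | null;
};

export interface ValidationIssue {
  path: ReadonlyArray<string | number>;
  message: string;
}

// Surface a single issue: shallowest path first, then alphabetical by dotted path.
export function pickSurfacedIssue(issues: readonly ValidationIssue[]): ValidationIssue | undefined {
  return [...issues].sort((a, b) => {
    if (a.path.length !== b.path.length) return a.path.length - b.path.length;
    const pa = a.path.join(".");
    const pb = b.path.join(".");
    return pa < pb ? -1 : pa > pb ? 1 : 0;
  })[0];
}

export function toErrorEnvelope(
  issues: readonly ValidationIssue[],
  errorCode: number, // hypothetical: supplied by the caller
  type: string,
): ErrorEnvelope {
  const issue = pickSurfacedIssue(issues);
  return {
    errorCode,
    errorMessage: issue?.message ?? "Invalid request",
    type,
    field: issue ? issue.path.join(".") : null,
  };
}
```

Suppose there are issues at `income.salary`, `profileType`, and `months`. `pickSurfacedIssue` returns the `months` issue: both single-segment paths beat the two-segment one, and "months" sorts before "profileType". Comparing with `<` rather than `localeCompare` keeps the order independent of the server's locale.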

Key endpoints:

  • POST /tax/compute — stateless NTA 2025 position for an income declaration.
  • POST /tax/compare — both regimes plus the net change and relevant reforms.
  • POST /statement/parse — multipart PDF upload; returns an opaque 8-digit code and 202 Accepted immediately, then runs the parse pipeline in the background.
  • GET /statement/:code/events — Server-Sent Events stream that pushes a status event on every pipeline transition (emits current state on connect, ~15s heartbeat, closes on a terminal status).
  • GET /statement/:code — durable polling fallback for the same status.
  • POST /statement/:code/recompute — user reclassifies which inflows count as income; gross is recomputed, the engine re-runs, and the corrected numbers are persisted so the AI panel stays grounded.
  • POST /ai/ask — grounded follow-up keyed by code; continues the OpenAI conversation from the analysis turn.
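Below is a framework-agnostic sketch of the stream behind `GET /statement/:code/events`, following the behaviour listed above. It pushes the current status on connect and on every transition, sends a heartbeat about every 15 seconds, and closes on a terminal status. The `SseWriter` interface, the `StatusStream` class, and the choice of ready, needs_review, and failed as terminal statuses are assumptions.

```typescript
// Hypothetical sketch of the status stream behind GET /statement/:code/events.
// ASSUMPTIONS: the SseWriter abstraction and treating ready / needs_review / failed
// as terminal are illustrative, not taken from the TaxLens code.

export type ProcessStatus =
  | "pending"
  | "validating"
  | "analyzing"
  | "ready"
  | "needs_review"
  | "failed";

const TERMINAL: ReadonlySet<ProcessStatus> = new Set<ProcessStatus>(["ready", "needs_review", "failed"]);

export interface SseWriter {
  write(chunk: string): void;
  end(): void;
}

// One SSE frame: an event name, a single JSON data line, and a blank-line terminator.
export function formatSseEvent(event: string, payload: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}

export class StatusStream {
  private readonly out: SseWriter;
  private readonly heartbeat: ReturnType<typeof setInterval>;
  private closed = false;

  constructor(out: SseWriter, heartbeatMs = 15_000) {
    this.out = out;
    // Lines starting with ":" are SSE comments. EventSource ignores them, but they
    // keep proxies and load balancers from closing an idle connection.
    this.heartbeat = setInterval(() => this.out.write(": heartbeat\n\n"), heartbeatMs);
  }

  // Call once on connect with the current status, then on every pipeline transition.
  push(status: ProcessStatus): void {
    if (this.closed) return;
    this.out.write(formatSseEvent("status", { status }));
    if (TERMINAL.has(status)) this.close();
  }

  // Also call this when the client disconnects, so the heartbeat timer is cleared.
  close(): void {
    if (this.closed) return;
    this.closed = true;
    clearInterval(this.heartbeat);
    this.out.end();
  }
}
```

Each frame is an `event:` line, a single `data:` line, and a blank line. `JSON.stringify` never emits raw newlines, so one `data:` line is always enough. An Express `res` should satisfy `SseWriter` structurally once the `text/event-stream` headers are set. On the client, a plain `EventSource` can listen for the named `status` event.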

Processes are ephemeral: a reaper job deletes each process and its audit one hour after the last interaction, after which :code endpoints return 404.
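The idle-expiry rule reduces to a timestamp comparison. In this in-memory sketch, `lastInteractionAt`, `isExpired`, and `reap` are hypothetical names. Keeping audit entries keyed by the same code is an assumption made so that both can be deleted together.

```typescript
// Hypothetical in-memory sketch of the idle-expiry reaper.
// ASSUMPTIONS: each process stores lastInteractionAt (epoch ms) and its audit
// entries are keyed by the same 8-digit code.

export const IDLE_TTL_MS = 60 * 60 * 1000; // one hour after the last interaction

export interface ProcessRecord {
  code: string; // opaque 8-digit code
  lastInteractionAt: number; // epoch milliseconds
}

export function isExpired(p: ProcessRecord, nowMs: number, ttlMs: number = IDLE_TTL_MS): boolean {
  return nowMs - p.lastInteractionAt >= ttlMs;
}

// Deletes every expired process together with its audit entries; returns the reaped codes.
export function reap(
  processes: Map<string, ProcessRecord>,
  audits: Map<string, unknown[]>,
  nowMs: number,
): string[] {
  const reaped: string[] = [];
  for (const [code, p] of processes) {
    if (isExpired(p, nowMs)) {
      processes.delete(code); // deleting the current entry during for...of is safe for Map
      audits.delete(code);
      reaped.push(code);
    }
  }
  return reaped;
}
```

Against MongoDB, the same cutoff (`now - 1h`) would more likely be a `deleteMany` on the process and audit collections. A TTL index is an alternative. However, MongoDB's TTL monitor only sweeps about once a minute, so deletion would not be exact to the second.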

Tech Stack

  • Monorepo: Nx 22 + pnpm workspaces. packages/core (pure TS, depends on nothing) → packages/api (browser client, depends on core) and packages/ui (design system, depends on core). Apps never import from another app.
  • Frontend (apps/taxlens-web): Vite 6, React 19, React Router 6, TanStack Query 5, Tailwind CSS 3, ky-based API client. Ships a Storybook-lite /preview design-system catalogue.
  • Backend (apps/main-backend): Express 4, MongoDB 7 driver, OpenAI SDK, Zod validation, Pino structured logging, Multer for PDF multipart, Helmet + CORS + compression.
  • AI: OpenAI Responses API — gpt-4o-mini as a cheap validation gate, gpt-4o for extraction/analysis, with structured (schema-validated) outputs and threaded previousResponseId for conversation continuity.
  • Language: TypeScript 5 end to end.

System Design

  apps/taxlens-web  (Vite + React 19)
  landing -> income -> result -> compare -> AI
  ky client . TanStack Query . EventSource (SSE)
        |
        |  /api/v1  (JSON . multipart . SSE)
        v
  apps/main-backend  (Express, stateless)
  Zod validate . envelope . requestId . Pino
   /tax        /statement            /ai/ask
     |             |  async pipeline     |
     |        pending -> validating ->   |
     |        analyzing -> ready /        |
     |        needs_review / failed       |
     |             |  (SSE + poll)        |
     v             v                      v
  @taxlens/core engine            LLM client
  (the ONLY source of            circuit breaker
   every tax number)             + audit log
        |                              |
        v                              v
     MongoDB 7                     OpenAI API
     process + llm_audit;          gate: gpt-4o-mini
     1h reaper                     analysis: gpt-4o
                                   chat: grounded
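The status flow in the diagram can be encoded as an explicit transition table. That makes illegal jumps (say, pending straight to ready) impossible to write by accident. The table is an assumption built from the diagram:

- failed can be reached from any non-terminal state.
- The three outcomes are terminal.
- The recompute path is not modelled.

```typescript
// Hypothetical sketch of the statement pipeline's status transitions.
// ASSUMPTIONS: "failed" is reachable from any non-terminal state (for example when
// the gate rejects a document while validating); the recompute path is not modelled.

export type PipelineStatus =
  | "pending"
  | "validating"
  | "analyzing"
  | "ready"
  | "needs_review"
  | "failed";

const NEXT: Readonly<Record<PipelineStatus, readonly PipelineStatus[]>> = {
  pending: ["validating", "failed"],
  validating: ["analyzing", "failed"], // assumed: gpt-4o-mini gate step
  analyzing: ["ready", "needs_review", "failed"], // assumed: gpt-4o extraction + engine run
  ready: [],
  needs_review: [],
  failed: [],
};

export function canTransition(from: PipelineStatus, to: PipelineStatus): boolean {
  return NEXT[from].includes(to);
}

export function isTerminal(status: PipelineStatus): boolean {
  return NEXT[status].length === 0;
}

// Returns the new status, or throws if the move is not in the table.
export function transition(from: PipelineStatus, to: PipelineStatus): PipelineStatus {
  if (!canTransition(from, to)) throw new Error(`illegal transition ${from} -> ${to}`);
  return to;
}
```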

Smart Architectural Decisions

  • The LLM never produces a tax number. A pure, framework-free TypeScript engine (@taxlens/core) owns every figure; the model is confined to extracting and classifying bank-statement inflows. The AI follow-up panel can only explain numbers the calculator already produced. This is the core trust decision of the whole product.
  • Two-tier AI pipeline as a cost firewall. A cheap gpt-4o-mini 'gate' first decides whether the upload is even a usable Nigerian bank statement; only valid documents reach the expensive gpt-4o analysis. Invalid input fails fast without burning the costly call.
  • Shared pure-TS engine runs on both client and server. Because core has zero I/O dependencies, the same calculator powers the stateless backend endpoint and a client-side estimate — one source of truth, no drift.
  • Stateless, ephemeral, privacy-first. No accounts, no stored user data; processes are keyed by an opaque 8-digit code and reaped one hour after the last interaction. No raw statement text or PDF bytes are ever logged.
  • Resilience around the AI dependency: a classic 3-state circuit breaker (closed → open → half-open) fast-fails after consecutive OpenAI failures; an llm_audit collection records model, tokens, latency, prompt hash, and circuit state for every call.
  • Contract failure vs outage are distinct. A model turn that fails schema validation gets one repair attempt; if it still does not conform, it is surfaced as 422/1008 (rephrase and retry). This is kept explicitly separate from a genuine upstream outage (503/1007), so non-conforming output neither masquerades as downtime nor trips the breaker.
  • needs_review over a false 'exempt'. Suppose extraction reads every credit as a transfer, so gross is 0 while inflows sum to more than 0. The process then lands on needs_review and routes the user to reclassify, rather than confidently declaring them tax-exempt.
  • Deterministic LLM stub transport for CI. An LLM_MODE=stub seam lives entirely inside the OpenAI client. It runs the same breaker, audit, and schema-validation path as the real transport, with steering levers (reject/fail/refuse/nonconforming/all-transfer), so AI behaviour is testable without hitting the network. Stub mode is rejected at boot in production.
  • Money as integer kobo. All amounts are integer kobo end to end; decimals are rejected at the boundary, eliminating floating-point drift in financial math.
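The circuit-breaker bullet above describes the classic three-state pattern; a minimal version follows. The failure threshold, cooldown, and injectable clock are illustrative assumptions, not TaxLens's configuration. So is the `shouldTrip` predicate, which exists so that contract failures can be excluded, per the contract-vs-outage bullet.

```typescript
// Hypothetical sketch of a classic three-state circuit breaker (closed -> open -> half-open).
// ASSUMPTIONS: threshold, cooldown, injectable clock, and the shouldTrip predicate
// are illustrative values and hooks, not TaxLens's actual configuration.

export type BreakerState = "closed" | "open" | "half-open";

export class CircuitOpenError extends Error {
  constructor() {
    super("circuit open: failing fast without calling upstream");
    this.name = "CircuitOpenError";
  }
}

export class CircuitBreaker {
  private state: BreakerState = "closed";
  private consecutiveFailures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;
  private readonly shouldTrip: (err: unknown) => boolean;

  constructor(opts: {
    failureThreshold?: number;
    cooldownMs?: number;
    now?: () => number;
    // Only genuine outages should count; contract (schema) failures should not.
    shouldTrip?: (err: unknown) => boolean;
  } = {}) {
    this.failureThreshold = opts.failureThreshold ?? 5;
    this.cooldownMs = opts.cooldownMs ?? 30_000;
    this.now = opts.now ?? Date.now;
    this.shouldTrip = opts.shouldTrip ?? (() => true);
  }

  // Reading the state also performs the open -> half-open move once the cooldown has elapsed.
  getState(): BreakerState {
    if (this.state === "open" && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = "half-open";
    }
    return this.state;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.getState() === "open") throw new CircuitOpenError();
    try {
      const result = await fn();
      this.consecutiveFailures = 0;
      this.state = "closed"; // any success, including a half-open trial, closes the circuit
      return result;
    } catch (err) {
      if (this.shouldTrip(err)) {
        this.consecutiveFailures += 1;
        if (this.state === "half-open" || this.consecutiveFailures >= this.failureThreshold) {
          this.state = "open"; // a failed trial re-opens immediately and restarts the cooldown
          this.openedAt = this.now();
        }
      }
      throw err;
    }
  }
}
```

In this sketch the half-open state lets calls through as trials. A success closes the circuit; a counted failure re-opens it immediately.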
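The needs_review guard is a small rule applied before the result is published. The `ClassifiedCredit` shape, `resolveOutcome`, and the result type are hypothetical. Only the condition comes from the text: zero gross while inflows sum to more than zero.

```typescript
// Hypothetical sketch of the needs_review guard applied after extraction.
// ASSUMPTIONS: the credit shape and result type are illustrative; only the
// "gross 0 while inflows > 0" condition comes from the text.

export interface ClassifiedCredit {
  amountKobo: number; // integer kobo
  countsAsIncome: boolean; // false for credits classified as transfers, for example
}

export type ParseOutcome =
  | { status: "ready"; grossKobo: number }
  | { status: "needs_review"; grossKobo: 0; inflowsKobo: number };

export function resolveOutcome(credits: readonly ClassifiedCredit[]): ParseOutcome {
  let grossKobo = 0;
  let inflowsKobo = 0;
  for (const c of credits) {
    inflowsKobo += c.amountKobo;
    if (c.countsAsIncome) grossKobo += c.amountKobo;
  }
  // Money came in but none of it counts as income: more likely a misclassification
  // than a genuinely exempt taxpayer, so ask the user to reclassify.
  if (grossKobo === 0 && inflowsKobo > 0) {
    return { status: "needs_review", grossKobo: 0, inflowsKobo };
  }
  return { status: "ready", grossKobo };
}
```

In this sketch an empty statement still resolves to `ready` with zero gross, since nothing was misread. Only inflows with zero counted income are held back for review.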
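The stub-transport and contract-vs-outage bullets fit together. The stub sits behind the same interface as the real client, and the caller tells a schema failure apart from a transport failure. This is a sketch under assumptions:

- `LlmTransport`, `StubTransport`, and `completeWithSchema` are hypothetical names.
- The `none` default lever, the `LLM_STUB_LEVER` variable, and the stub payloads are invented for illustration.
- "Repaired once" is read as a single re-ask.
- Only three of the five levers are modelled.

The 422/1008 and 503/1007 pairs come from the text.

```typescript
// Hypothetical sketch: a swappable LLM transport, a deterministic stub, and the
// contract-failure vs outage split. ASSUMPTIONS: the interface, lever payloads,
// LLM_STUB_LEVER, and "repair" meaning one re-ask are illustrative.

export interface LlmTransport {
  complete(prompt: string): Promise<unknown>;
}

// Three of the five levers named in the text; "reject" and "refuse" are omitted here.
export type StubLever = "none" | "fail" | "nonconforming" | "all-transfer";
const LEVERS: readonly StubLever[] = ["none", "fail", "nonconforming", "all-transfer"];

export class StubTransport implements LlmTransport {
  private readonly lever: StubLever;
  constructor(lever: StubLever = "none") {
    this.lever = lever;
  }
  async complete(_prompt: string): Promise<unknown> {
    if (this.lever === "fail") throw new Error("stub: simulated upstream outage");
    if (this.lever === "nonconforming") return { unexpected: true };
    const category = this.lever === "all-transfer" ? "transfer" : "salary";
    return { credits: [{ amountKobo: 30_000_000, category }] };
  }
}

export class ContractError extends Error {
  readonly status = 422;
  readonly errorCode = 1008;
}

export class OutageError extends Error {
  readonly status = 503;
  readonly errorCode = 1007;
}

export async function completeWithSchema<T>(
  transport: LlmTransport,
  prompt: string,
  isValid: (value: unknown) => value is T,
): Promise<T> {
  for (let attempt = 0; attempt < 2; attempt++) {
    let raw: unknown;
    try {
      raw = await transport.complete(
        attempt === 0
          ? prompt
          : `${prompt}\n\nYour previous reply did not match the schema; reply again following it exactly.`,
      );
    } catch (err) {
      // Transport-level failure: a real outage (the kind that should count toward the breaker).
      throw new OutageError(`upstream call failed: ${String(err)}`);
    }
    if (isValid(raw)) return raw;
  }
  // The model answered, but not in the agreed shape, even after one repair.
  throw new ContractError("model output failed schema validation after one repair");
}

// Picks the transport at boot; stub mode is refused in production.
export function selectTransport(env: Record<string, string | undefined>, real: LlmTransport): LlmTransport {
  if (env.LLM_MODE !== "stub") return real;
  if (env.NODE_ENV === "production") throw new Error("LLM_MODE=stub is not allowed in production");
  const lever = env.LLM_STUB_LEVER ?? "none";
  if (!(LEVERS as readonly string[]).includes(lever)) throw new Error(`unknown stub lever: ${lever}`);
  return new StubTransport(lever as StubLever);
}
```

In the full path described above, each attempt would also go through the circuit breaker (counting only `OutageError`) and write an `llm_audit` record. Both are left out to keep the sketch short.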
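For the integer-kobo bullet, the boundary check and display formatting might look like this. `parseKobo` and `formatNaira` are hypothetical helpers. Rejecting negative amounts is an added assumption; the text only says decimals are rejected.

```typescript
// Hypothetical boundary helpers for integer-kobo money (1 NGN = 100 kobo).
// ASSUMPTION: negative amounts are also rejected; the text only states that decimals are.

export function parseKobo(value: unknown): number {
  if (typeof value !== "number" || !Number.isSafeInteger(value) || value < 0) {
    throw new RangeError("amount must be a non-negative integer number of kobo");
  }
  return value;
}

// Render kobo as naira for display only; arithmetic never leaves integer kobo.
export function formatNaira(kobo: number): string {
  const amount = parseKobo(kobo);
  const naira = Math.floor(amount / 100);
  const koboPart = amount % 100;
  const grouped = String(naira).replace(/\B(?=(\d{3})+(?!\d))/g, ",");
  return `₦${grouped}.${String(koboPart).padStart(2, "0")}`;
}
```

`formatNaira(123_456_789)` renders `₦1,234,567.89`. For contrast, `0.1 + 0.2` in floating-point naira is `0.30000000000000004`, while the same sum in kobo, `10 + 20`, is exactly `30`. With Zod, which the backend already uses, the boundary rule would roughly correspond to `z.number().int().nonnegative()`.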

Impacts

Turns a high-stakes, hallucination-prone question ("what is my tax under the new law?") into a trustworthy, auditable answer by structurally separating computation from generation: a deterministic statute-sourced engine produces every number and the LLM is fenced into extraction and explanation only. Delivered as a clean Nx + pnpm monorepo with a privacy-first stateless backend, SSE-driven live status, a circuit breaker, full structured-output validation, and a deterministic stub transport for CI.

Demonstrated Skills

  • Monorepo architecture and dependency discipline (Nx + pnpm, layered core/api/ui packages with enforced direction).
  • Domain modelling of real tax legislation (NTA 2025 bands and reliefs vs legacy PITA) in a pure, testable engine.
  • Production-grade LLM integration: tiered models, structured/schema-validated outputs, conversation threading, prompt-hash audit logging, and grounding constraints.
  • Backend resilience and reliability engineering: circuit breaker, contract-vs-outage error taxonomy, idempotency, request-id correlation, structured logging.
  • Real-time UX with Server-Sent Events plus a durable polling fallback and an idle-expiry reaper.
  • API design rigour: stable numeric error codes, strict one-field-at-a-time validation, integer-kobo money handling, strict request bodies.
  • Privacy-by-design: no accounts, no persisted user data, no raw statement content logged.
  • Testability engineering: a swappable deterministic LLM transport that preserves the full production code path.

Notes

  • Trust-by-architecture, not by prompt: the LLM may never emit a tax figure. A pure TypeScript engine owns every number, and the model only extracts and explains. This is a senior-level framing of an AI product where correctness is non-negotiable; the easy path would be to let the model 'just answer'.
  • Cost-aware AI design: the cheap gpt-4o-mini gate as a firewall in front of the expensive gpt-4o analysis shows real thinking about token economics and abuse resistance, not just wiring up an API.
  • Failure-mode literacy: a 3-state circuit breaker, a deliberate 422 (contract failure, retryable) vs 503 (outage) distinction, and the needs_review guard against a false 'exempt' all show someone who designs for the unhappy paths, not just the demo.
  • Testability as a first-class concern: a deterministic stub LLM transport that runs the identical breaker/audit/schema path as production, with steering levers for every failure mode, is exactly how a senior engineer makes a non-deterministic dependency CI-friendly.
  • API and data-modelling discipline: integer-kobo money everywhere, stable numeric error codes clients can switch on, strict bodies, one-field-at-a-time validation, and request-id correlation reflect mature contract design.
  • Privacy and lifecycle by default: statelessness, opaque codes, a 1-hour reaper, and never logging raw statement bytes show product-level judgement about handling sensitive financial data.