Problem Statement
Engineers repeat the same browser chores (filling forms, scraping values, multi-step QA flows), and the tools to automate them are either heavyweight (Selenium/Playwright, which require code and setup), cloud-bound (privacy concerns and a required account), or opaque (no-code SaaS where you can't see what's actually running). The gap: a transparent, local, version-controllable way to record and run browser workflows right inside Chrome.
Proposed Solution
Atuko (Yoruba: "driver") is a Chrome MV3 extension that lets you record, build, and run multi-step browser workflows directly in the browser — no cloud, no account. Every workflow compiles to portable, human-readable JSON you can version-control, share, or generate programmatically (including by AI agents).
Full Solution Details
- 18 step types: click, fill, wait, scroll, navigate, submit, select, hover, keypress, extract, screenshot, tab, clipboard, storage, log, prompt, setVariable, branch, loop, stop, jump.
- Action recorder: record interactions; auto-detects wait gaps (a 5s pause inserts a wait step) and generates stable CSS + XPath selectors with fallbacks and stability scores for every element.
- Builder: drag-and-drop step canvas + inspector, visual branch/loop editors with inline conditions, and variable binding for any {{var}}-supporting field.
- Element picker: crosshair mode for manual selector targeting.
- Dry-run mode: simulates a full run without touching the DOM, validating selectors, variable refs, and step logic before execution.
- Run history: step-level logs, output/variable capture, and per-step screenshots.
- Live toast overlay: a shadow-DOM overlay injected into the page showing current step, progress, and pause/stop.
- Import/Export: workflows are plain JSON; AI agents can generate and import them directly.
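The recorder's "selectors with fallbacks and stability scores" idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the SelectorCandidate shape, the scoring heuristic, and the function names are hypothetical and not taken from Atuko's source. The matches callback stands in for a real DOM lookup so the sketch runs outside a browser.

```typescript
// Hypothetical shape for one recorded selector candidate (not Atuko's actual schema).
type SelectorKind = "css" | "xpath";

interface SelectorCandidate {
  kind: SelectorKind;
  value: string;
  stability: number; // 0..1, higher means less likely to break when the page changes
}

// Assumed heuristic: test attributes and plain ids tend to survive refactors,
// while positional selectors (nth-child, XPath indexes) tend to break.
function scoreSelector(kind: SelectorKind, value: string): number {
  if (/\[@?data-test(id)?[=\]]/.test(value)) return 0.95;
  if (kind === "css" && /^#[A-Za-z_][\w-]*$/.test(value)) return 0.9;
  if (/:nth-(child|of-type)\(/.test(value) || /\[\d+\]/.test(value)) return 0.3;
  return 0.6;
}

function makeCandidate(kind: SelectorKind, value: string): SelectorCandidate {
  return { kind, value, stability: scoreSelector(kind, value) };
}

// Try candidates from most to least stable and return the first one that
// currently matches. In a content script, `matches` could wrap
// document.querySelector (CSS) or document.evaluate (XPath).
function resolveSelector(
  candidates: SelectorCandidate[],
  matches: (candidate: SelectorCandidate) => boolean
): SelectorCandidate | undefined {
  return [...candidates]
    .sort((a, b) => b.stability - a.stability)
    .find(matches);
}
```

With this ordering, a step falls back to a more brittle selector only when every more stable candidate has stopped matching.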
Technical Documentation
React + TypeScript + Vite (via @crxjs/vite-plugin for hot-reload dev). The extension spans four UI surfaces: a Side Panel (the full builder SPA), a Popup (runtime dashboard), an Options page, and a Toast (a React root injected into the page inside a shadow root, so the host site's CSS cannot restyle or clobber it). Two non-UI contexts do the work: the Service Worker is the orchestration engine (message router and run state machine), and the Content Script, which runs in the page rather than in the background, is the DOM step executor, element picker, and recorder. Communication is a request/response loop: Side Panel/Popup → chrome.runtime.sendMessage → Service Worker → chrome.tabs.sendMessage → Content Script → back. Workflows persist in extension storage as JSON.
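The request/response loop described above can be sketched roughly as follows. The message names (RUN_WORKFLOW, EXECUTE_STEP), the type shapes, and handleRunRequest are hypothetical, since the actual protocol is not documented here. The tab transport is injected as a function so the sketch runs without the chrome APIs.

```typescript
// Hypothetical message shapes; the extension's real protocol may differ.
type Step = { type: string; [key: string]: unknown };
type StepResult = { ok: boolean; error?: string };
type RunRequest = { type: "RUN_WORKFLOW"; tabId: number; steps: Step[] };
type RunResponse = { ok: boolean; completed: number; error?: string };

// Stand-in for chrome.tabs.sendMessage, injected so this runs outside the browser.
type SendToTab = (
  tabId: number,
  message: { type: "EXECUTE_STEP"; step: Step }
) => Promise<StepResult>;

// Service-worker side: forward each step to the content script in order,
// stopping at the first failure and reporting how many steps completed.
async function handleRunRequest(
  request: RunRequest,
  sendToTab: SendToTab
): Promise<RunResponse> {
  let completed = 0;
  for (const step of request.steps) {
    const result = await sendToTab(request.tabId, { type: "EXECUTE_STEP", step });
    if (!result.ok) {
      return { ok: false, completed, error: result.error ?? "step failed" };
    }
    completed += 1;
  }
  return { ok: true, completed };
}

// Possible wiring inside the service worker (left as a comment, not executed here):
// chrome.runtime.onMessage.addListener((msg, _sender, sendResponse) => {
//   handleRunRequest(msg, (tabId, m) => chrome.tabs.sendMessage(tabId, m))
//     .then(sendResponse);
//   return true; // keep the message channel open for the async response
// });
```

Keeping the handler free of direct chrome calls is a design choice of this sketch: it lets the routing logic be unit-tested with a fake transport.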
Tech Stack
TypeScript, React, Chrome Manifest V3 (service worker + content scripts), Vite + @crxjs/vite-plugin, shadow DOM; CSS/XPath selector generation.
System Design
Surfaces: Side Panel (builder SPA) · Popup (dashboard) · Options · Toast (shadow DOM overlay)
│ chrome.runtime.sendMessage
▼
Service Worker ── run state machine · message router · orchestration
│ chrome.tabs.sendMessage
▼
Content Script ── DOM executor · recorder · element picker
│
18 step types · branch/loop · {{vars}} · dry-run validation
│
Workflow = portable JSON → export / import / AI-generate · run history + screenshots
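To make the "Workflow = portable JSON" and "{{vars}} · dry-run validation" parts of the diagram concrete, here is a minimal sketch. The Workflow and WorkflowStep shapes, the field names, and both functions are assumptions for illustration; the extension's real schema is not shown in this document. The dry-run here covers only variable references, one of the checks the dry-run mode is described as performing.

```typescript
// Hypothetical workflow schema (illustrative field names only).
interface WorkflowStep {
  type: string;
  selector?: string;
  value?: string;
  url?: string;
  name?: string; // variable name, used by setVariable in this sketch
}

interface Workflow {
  name: string;
  variables: Record<string, string>;
  steps: WorkflowStep[];
}

const VAR_PATTERN = /\{\{\s*([\w.]+)\s*\}\}/g;

// Replace {{name}} with its value; unknown names are left untouched.
function interpolate(template: string, vars: Record<string, string>): string {
  return template.replace(VAR_PATTERN, (whole: string, name: string) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? String(vars[name]) : whole
  );
}

// Collect every {{name}} reference in a string.
function findVariableRefs(text: string): string[] {
  const names: string[] = [];
  text.replace(VAR_PATTERN, (whole: string, name: string) => {
    names.push(name);
    return whole;
  });
  return names;
}

// Dry-run check for variable references only: walk the steps in order and
// report any reference to a variable that has not been defined by that point.
function dryRunVariables(workflow: Workflow): string[] {
  const defined = new Set(Object.keys(workflow.variables));
  const problems: string[] = [];
  workflow.steps.forEach((step, index) => {
    for (const field of [step.selector, step.value, step.url]) {
      if (field === undefined) continue;
      for (const ref of findVariableRefs(field)) {
        if (!defined.has(ref)) {
          problems.push(`step ${index} (${step.type}): unknown variable "${ref}"`);
        }
      }
    }
    // A setVariable step makes its variable available to later steps.
    if (step.type === "setVariable" && step.name) defined.add(step.name);
  });
  return problems;
}

const exampleWorkflow: Workflow = {
  name: "login-smoke-test",
  variables: { baseUrl: "https://example.test" },
  steps: [
    { type: "navigate", url: "{{baseUrl}}/login" },
    { type: "fill", selector: "#email", value: "{{email}}" },
    { type: "setVariable", name: "greeting", value: "hello" },
    { type: "log", value: "{{greeting}}" },
  ],
};
```

Running dryRunVariables(exampleWorkflow) reports a single problem, the undefined email variable in the fill step, before any step touches the page.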
Smart Architectural Decisions
- Workflow-as-JSON is the whole thesis. Compiling every workflow to portable, human-readable JSON makes automations version-controllable, shareable, and AI-generatable — turning a no-code tool into something engineers can treat as code.
- Dry-run before execution. Validating selectors, variable references, and step logic without touching the DOM is a thoughtful reliability feature that prevents half-run, broken automations.
- Stable selectors with fallbacks + stability scores. The recorder's selector strategy directly attacks the #1 cause of flaky browser automation (brittle selectors).
- Shadow-DOM toast overlay. Rendering the live progress UI inside a shadow root isolates it from the host page's CSS and from ordinary DOM queries, which is sound, defensive content-script engineering.
- MV3 service-worker state machine. Modeling runs as an explicit state machine in the (ephemeral) MV3 service worker is the right pattern for orchestrating long, multi-step flows under Manifest V3's constraints.
- Local-first, no account. Running entirely in the browser keeps automations private and free of external service dependencies.
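The run state machine mentioned in the decisions above might look something like the following sketch. The status names, event names, and RunState shape are hypothetical. The state is kept as a plain serializable object on the assumption that an MV3 service worker can be terminated mid-run, so the state could be written to extension storage after each transition and reloaded on wake-up.

```typescript
// Hypothetical run statuses and events (not taken from Atuko's source).
type RunStatus = "idle" | "running" | "paused" | "completed" | "failed" | "stopped";

type RunEvent =
  | { type: "START" }
  | { type: "STEP_OK" }
  | { type: "STEP_FAILED"; error: string }
  | { type: "PAUSE" }
  | { type: "RESUME" }
  | { type: "STOP" };

// Plain data only, so it can be serialized to storage between transitions.
interface RunState {
  status: RunStatus;
  stepIndex: number;
  totalSteps: number;
  error?: string;
}

// Pure transition function: events that are invalid for the current status
// are ignored and the state is returned unchanged.
function transition(state: RunState, event: RunEvent): RunState {
  switch (state.status) {
    case "idle":
      return event.type === "START"
        ? { ...state, status: "running", stepIndex: 0 }
        : state;
    case "running":
      switch (event.type) {
        case "STEP_OK": {
          const next = state.stepIndex + 1;
          return next >= state.totalSteps
            ? { ...state, status: "completed", stepIndex: next }
            : { ...state, stepIndex: next };
        }
        case "STEP_FAILED":
          return { ...state, status: "failed", error: event.error };
        case "PAUSE":
          return { ...state, status: "paused" };
        case "STOP":
          return { ...state, status: "stopped" };
        default:
          return state;
      }
    case "paused":
      if (event.type === "RESUME") return { ...state, status: "running" };
      if (event.type === "STOP") return { ...state, status: "stopped" };
      return state;
    default:
      // completed, failed, and stopped are terminal in this sketch
      return state;
  }
}
```

Because transition is pure, a run can be replayed from a stored event list, and the pause/stop controls in the toast only need to dispatch events.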
Impacts
A transparent, local, engineer-friendly browser-automation tool with serious depth (18 step types, recorder, branch/loop, dry-run, run history) where every workflow is portable JSON — bridging no-code convenience with code-grade version control and AI generation.
Demonstrated Skills
Chrome MV3 extension architecture (service worker run state machine, content scripts, multi-surface messaging, shadow-DOM injection); robust DOM automation (selector generation with fallbacks/stability scoring, recorder); workflow engine + DSL design (18 step types, branch/loop/variables, dry-run validation); React + TypeScript + Vite tooling; local-first design.