One Operator, Zero Manual Lines: An AI-Assisted Development Workflow

TL;DR: The targeted POC was built by one operator — Jakub Wąsowski — coordinating Claude Code against a Linear backlog generated from discovery transcripts. Zero lines of code were written by hand — that is a measurement, not the headline. The workflow itself is mundane: ticket → plan → implement → verify → iterate. Three constraints made the AI-assisted development workflow actually work: frequent demos with the actual users, an Appunite brandbook fed to the agent as a design system, and an operator who understood the recruitment domain better than any spec could capture. This is not a “developers are obsolete” story. It is a team-shape story with hard boundary conditions, and we want to be specific about where those boundaries are.

What we wanted to find out

Part 7 ended with a 50,000 PLN budget and two clusters to address: interview scheduling and competency-driven evaluation. Part 8 ended with a Linear backlog generated from discovery transcripts, structured for agentic coding, with one decision already taken: Magda’s Notion was the source of truth, and the new tool would connect to her workflow rather than replace it.

The question we wanted to answer with the build itself was not “can AI write code?” That question is settled in 2026 and we are not going to spend an article re-litigating it. The question was narrower and more useful: can one person, with the right context and the right tools, ship a usable internal product to production at this budget — and what does that workflow actually look like in motion?

The whole project is the answer to that question. The code lives at github.com/appunite/appunite-ats-poc — currently private, to be released publicly when ready. What follows is what produced it.

The AI-assisted development workflow loop

The loop that ran from mid-April to mid-May 2026 was not novel. It was the same sequence, repeated, until the backlog emptied.

  1. Open a new Claude Code session against a specific Linear ticket.
  2. Planning phase. The agent reads the ticket, references the PRD, and proposes an implementation plan. The operator verifies the plan against the PRD and against what the user actually needs. Mismatches get caught here, before any code exists.
  3. “Implement.” The agent writes the code, the tests, and the integration glue. For this project that included integrations with Roam, Recruitee, Resend, Notion, Anthropic, GCP, and Google Calendar.
  4. The operator verifies the output by running the feature against staging. If something breaks, the relevant logs go back to the agent with context.
  5. Iterate until the ticket closes. Move to the next ticket.
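The five steps above can be read as a small state machine. The sketch below is illustrative only: the type names, event names, the ticket identifier, and the sample log line are all assumptions made for this example, not something taken from the project's code. It shows the one property the loop depends on: a failed verification sends the ticket back to implementation with logs attached, and nothing reaches "done" without passing verification.

```typescript
// Hypothetical model of the ticket loop; all names here are assumptions.
type Phase = "plan" | "implement" | "verify" | "done";

interface TicketRun {
  ticketId: string;   // identifier format is assumed
  phase: Phase;
  iterations: number; // completed implement rounds
  log: string[];      // context handed back to the agent
}

type LoopEvent =
  | { kind: "planApproved" }
  | { kind: "planRejected"; reason: string }
  | { kind: "implemented" }
  | { kind: "verified" }
  | { kind: "verificationFailed"; logs: string };

function startRun(ticketId: string): TicketRun {
  return { ticketId, phase: "plan", iterations: 0, log: [] };
}

// Pure transition function: returns the next state or throws on an
// event that makes no sense in the current phase.
function step(run: TicketRun, event: LoopEvent): TicketRun {
  switch (run.phase) {
    case "plan":
      if (event.kind === "planApproved") return { ...run, phase: "implement" };
      if (event.kind === "planRejected")
        return { ...run, log: [...run.log, `plan rejected: ${event.reason}`] };
      break;
    case "implement":
      if (event.kind === "implemented")
        return { ...run, phase: "verify", iterations: run.iterations + 1 };
      break;
    case "verify":
      if (event.kind === "verified") return { ...run, phase: "done" };
      if (event.kind === "verificationFailed")
        return { ...run, phase: "implement", log: [...run.log, event.logs] };
      break;
    case "done":
      break;
  }
  throw new Error(`event "${event.kind}" is not valid in phase "${run.phase}"`);
}

// Usage: one failed verification, then a successful second round.
const events: LoopEvent[] = [
  { kind: "planApproved" },
  { kind: "implemented" },
  { kind: "verificationFailed", logs: "made-up staging log line" },
  { kind: "implemented" },
  { kind: "verified" },
];
const finished = events.reduce(step, startRun("TICKET-1"));
```

Keeping the transition function pure makes the rule easy to check: there is no path from "plan" or "implement" straight to "done".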

The Linear backlog itself was produced upstream by the discovery process described in Part 8 — two discovery calls recorded through Roam, transcripts processed by Claude into a PRD, the PRD broken into a backlog.md, the backlog pushed into Linear via the Model Context Protocol (MCP). By the time the build loop started, the tickets were already structured for agentic coding — each ticket scoped tightly enough that a Claude Code session could carry it end-to-end without context starvation.
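What "scoped tightly enough" means in practice is not spelled out here, so the following is only a sketch under stated assumptions. The ticket shape, the field names, and the threshold of two integrations per ticket are hypothetical; the real backlog.md format is not published in this article. The idea it illustrates is a pre-flight check that flags tickets likely to starve a session of context before they are pushed into Linear.

```typescript
// Hypothetical ticket shape; field names are assumptions for illustration.
interface BacklogTicket {
  id: string;
  title: string;
  prdSection: string;           // which part of the PRD the ticket implements
  acceptanceCriteria: string[]; // what the operator checks on staging
  integrations: string[];       // external services the ticket touches
}

// Assumed threshold, chosen only for this example.
const MAX_INTEGRATIONS_PER_TICKET = 2;

// Returns a list of scoping problems; an empty list means the ticket passes.
function scopingProblems(ticket: BacklogTicket): string[] {
  const problems: string[] = [];
  if (ticket.prdSection.trim() === "") problems.push("no PRD reference");
  if (ticket.acceptanceCriteria.length === 0)
    problems.push("no acceptance criteria");
  if (ticket.integrations.length > MAX_INTEGRATIONS_PER_TICKET)
    problems.push(
      `touches ${ticket.integrations.length} integrations; consider splitting`
    );
  return problems;
}

// Usage with made-up tickets.
const wellScoped: BacklogTicket = {
  id: "TICKET-1",
  title: "Propose interview slots from calendar availability",
  prdSection: "Interview scheduling",
  acceptanceCriteria: ["Recruiter sees at least one proposed slot"],
  integrations: ["Google Calendar"],
};

const tooBroad: BacklogTicket = {
  ...wellScoped,
  id: "TICKET-2",
  acceptanceCriteria: [],
  integrations: ["Recruitee", "Notion", "Resend"],
};

const okProblems = scopingProblems(wellScoped);
const broadProblems = scopingProblems(tooBroad);
```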

No human-in-IDE rewriting. No “the agent got 80% there and I finished the rest.” Zero lines of code written by hand across the project. We are stating that as a measurement, not as a slogan — the slogan version of this claim is exactly the one we want to avoid. Read on.

What the operator actually did

If no code was written by hand, what was the operator doing for six weeks?

Holding the domain in his head. Verifying that what the agent produced was the right thing, not just working code. Catching when a technically-fine implementation missed the user-facing point. Spotting architectural decisions that would lock in a wrong shape downstream. Handing back logs, error contexts, and screenshots when iteration was needed.

This is not code review in the traditional sense. Code review checks that code is correct and idiomatic. This is closer to product management at the resolution of a function — does this thing, written by an agent, actually do what Magda needs it to do when she is mid-call with a candidate? Will this API choice still make sense in two weeks when the next ticket lands on top of it? Is the agent about to commit us to an integration shape we will regret?

Jakub wrote no Elixir by hand. No React by hand. No TypeScript by hand. What he wrote, in volume, was prompts, plans, decisions, and verifications. The output of his hands was natural language and judgement. The output of his project was a working tool.

That is the part of “AI-assisted development workflow” that does not show up in the productivity-claim version of the pitch. The operator is the bottleneck — not on typing speed, but on domain understanding. That bottleneck is the project. Take it away and the agent builds fast generic software that does not solve the problem.

Demos as a control mechanism

The speed of an AI-driven build is real. A working interface emerges in days, not weeks. Integrations come up faster than they used to. The agent does not get tired and does not lose context between sessions if the project is structured to preserve it.

That speed has a cost. If you do not check direction frequently, you go far down the wrong road before anything pings.

The control mechanism we used was the simplest one available: frequent short demos with Magda (the target user) and Szymon (engineering perspective). Not weekly status reports. Not Loom recordings. Live demos with the actual users, watching them try the thing, listening to what they ask about and what they ignore.

Planning could not have predicted what those demos surfaced. Most of what came out of them was small — an integration limitation here, a UI affordance there, a step in the recruiter’s workflow that was not in the PRD because nobody had thought to write it down. Some of it was not small. The Chrome extension idea — covered in detail in Part 10 — came out of one of those demo conversations, not out of any planning document. The best architectural decisions in this project surfaced when the user was looking at a working prototype and said something offhand about how they actually worked.

The rule we ended up with: if you are going to build at AI speed, you have to check direction at human speed. Skip that and you ship the wrong product, very efficiently.

Design without a designer

There was no UI designer on this project. No Figma mockups. No design system documents drafted from scratch.

What there was: a generated brandbook.md file. Jakub took Appunite’s existing brandbook (the logos, the type system, the colour palette, the asset library) and produced a markdown version of it sized for agent consumption. That file went into the CLAUDE.md context for the project. From that point on, every UI component the agent produced was visually consistent with Appunite’s identity by default.

The honest trade-off: mockups before implementation give stakeholders a clearer picture of where the build is going. They reduce the risk of building something the stakeholders cannot recognise. In a longer project with multiple users and a higher visual bar, skipping mockups would be a mistake.

In this project, mockups would also have been thrown out after the Chrome extension pivot. The whole UI got rewritten when we moved from a standalone web app to a plugin embedded inside Recruitee; Part 10 covers why. Mockups would not have prevented the pivot, because the pivot was a UX-architecture insight, not a visual one. They would have prevented some confusion in early demos. They would have cost real time. We did not regret skipping them. We might regret skipping them on a different project, and that is the point: name the trade-off, do not pretend it does not exist.

The takeaway is narrower than “you do not need designers.” It is: for an internal tool with a known brand and a single domain, the brandbook is the design system, and that is enough.

The two languages the operator had not used before

The backend was Elixir. The frontend ended up being React.

Jakub had not written either language in production before this project.

Five years ago that would have been a deal-breaker for a project of this size with this timeline. The cost of getting fluent in a new stack was paid in months, not weeks, and it was paid by the person writing the code.

An AI-assisted development workflow re-prices that cost. When the operator’s job is verification rather than authorship, the threshold for working in an unfamiliar language drops sharply. You still need to be able to read the code the agent produces — fluently enough to catch bad decisions before they compound. You do not need to be able to write it from a blank file at production speed. Those are different competencies, and the second one used to gate projects that the first one could carry.

The wider implication, which is the reason this matters to anyone reading: stack choice is no longer constrained by what the operator’s hands already know. The right question becomes “what is the right tool for this problem?” — not “what does the developer on this project already type fluently?” Part 10 will tell you how we got that question wrong on this project, and what the correct answer would have been. Two languages still cost something, even when neither of them is being written by hand.

Where the one-operator model breaks

The cleanest summary of when this works: one operator who understands the business domain and the user need, building an internal tool against a single, well-scoped problem, with frequent access to the actual users.

Outside that boundary the model breaks. We want to be specific about how it breaks because the most dishonest version of this article would be the one that implied otherwise.

It breaks when the operator does not understand the domain. The agent will happily build fast generic software against a vague spec. The output will look correct. It will compile, it will pass tests, it will demo cleanly to people who do not know what the users actually need. It will not solve the problem. The operator’s leverage in this model comes from holding the user, the workflow, and the constraints in their head while iterating with the agent. Remove that and the leverage goes with it.

It breaks when the problem is not single-domain. Cross-cutting work — a product surface used by three different teams with three different mental models, integrations that span departments with competing priorities — pulls the operator out of a single coherent context and into the kind of coordination work that team structure exists to solve.

It breaks when the build is not internal. External products have constraints — accessibility, internationalisation, performance under unknown load, security postures shaped by procurement — that one operator running ahead at AI speed will under-serve unless those constraints are themselves the operator’s domain.

And it breaks at scale. The team shape that emerged on this project — one operator, one domain, one project — is exactly that. It is not “one engineer running ten projects.” Spreading the operator across multiple builds reintroduces every coordination tax that team-of-many structures exist to absorb. Different problem, different answer.

This is not a “developers are obsolete” piece. The work changed shape. The shape that worked on this project is the shape we just described. Outside its boundary, you need a different shape — and that may well include a team of senior engineers operating their own agents at their own desks, which is a model worth a separate article and is not this one.

What is actually new here

What changed between the world where a project like this would have taken a four-person team and the world where it took one operator?

The cost of trying a build dropped. That is the new variable. Where a year ago a 50K PLN budget meant “you cannot afford to attempt this,” now it means “you can afford to attempt this and find out if it was worth attempting.” That changes which builds are worth attempting. It does not, by itself, change what makes a build succeed. Domain understanding still matters. User contact still matters. Architecture decisions still compound. Mistakes mid-flight still cost time.

Part 10 is the article about what mid-flight mistakes looked like on this one — two pivots that no amount of upfront planning would have prevented, and what they cost. Part 11 is the article where the tool actually meets its users and the pilot data decides whether the build was worth doing at all.

The economic argument for what this article describes operationally lives in two earlier Appunite pieces: “Process Native Software” and “The Paradox of Cheaper Code”. This article is the operating-detail companion to both.

The build happened. The verdict is in the next two articles.

Sources

  • The Paradox of Cheaper Code — Why AI is Making Custom Software Development More Valuable: https://www.appunite.com/blog/why-ai-is-making-custom-software-development-more-valuable
  • Process Native Software: https://www.appunite.com/blog/process-native-software
  • POC source repository (private; to be released publicly when ready)
