Not Every Problem Needs Code: Discovery Before Building Custom Software

TL;DR: The 50K targeted build forced a different kind of discovery than the 86K full-replacement scope would have. Two pre-build conversations, one with the recruiter who would actually use the tool and one with the developer who held the engineering view, surfaced two findings that reshaped the brief: the most expensive-feeling problem was a Notion-ordering problem, not a tool gap; and the existing AI-in-Notion workflow could be extended, not replaced. Here is what discovery looked like, what it produced, and the rule it left us with: not every problem is an engineering problem.

The brief we almost wrote

Discovery before building custom software is supposed to happen after the audit and before the brief. You have run the audit: 12 documented gaps, or 18, or in our case 24 from the Part 5 inventory. The dollar signs add up. The instinct, after weeks of disciplined audit work, is to skip the discovery step and translate every documented gap straight into a feature specification.

That instinct survives an 86K budget. It does not survive a 50K one.

The pivot in Part 7 handed us a constraint, not a license. Two pains to build for. Twenty-two to leave alone — at least at engineering’s expense. The question stopped being “which features make the cut” and became something more uncomfortable: for the two pains we picked, what does discovery before building custom software actually have to tell us before we write a line of spec?

We had picked the two top-priority pains from the audit: interview scheduling friction and competency-driven evaluation. Three of four recruiters rated competency-driven evaluation 5/5; interview scheduling friction was confirmed as the second top-priority gap. The scoping process described in Part 7 had confirmed both as structurally unaddressable inside Recruitee.

What we had not done was ask the person who would use the tool what her day actually looked like.

Two conversations before any code

Discovery at this stage was not a research project. It was two recorded conversations.

One with the recruiter who would be the target user. One with the software developer who held the engineering perspective on what was buildable in the budget. Both recorded through Roam. Both transcribed. Both fed into Claude. The output was a draft PRD, then a backlog.md, then, pushed through MCP into Linear, a ticket structure already shaped for agentic coding.

That pipeline is worth naming because the constraint shapes the conversation. When you know the transcript will become a PRD inside a day, you ask differently. You stop asking abstract questions about pain and start asking concrete questions about Tuesday afternoons. What were you doing at 2pm yesterday? Which window was open? What was open next to it? Where did the data live before you typed it into Recruitee?

This is not the discovery methodology from Part 4. Part 4 was about how to find what is broken. This was something narrower — how to find what not to build once you already know the breakage. The audit had produced the list of gaps. The 50K budget had narrowed it. Discovery at this resolution exists to answer one question: for the two gaps you have committed to, what would actually be there if you built nothing?

Two findings came out of those conversations. Both reshaped the brief.

Finding one: recruiters already using Claude

The first thing that surfaced was that the target user already had an AI workflow.

Magda was using Claude with Notion daily. Not for the evaluation steps specifically, but for parts of the work around them: notes processing, structuring follow-ups, drafting candidate communications. The chat window was open most of the day. It was not a tool she had been told to use. It was a habit she had built.

The brief we had been about to write assumed a target user with a roughly empty AI surface. Build the new evaluation flow, surface it through a web UI, train her to use it. That brief survived contact with the actual workflow for about ten minutes.

If Magda already had Claude open, the question was no longer “what UI do we build for the new capability?” The question was: “what is the minimum custom code needed to connect what she already does to the gaps we identified?” The new system stopped being a replacement and became connective tissue.

The downstream consequence was structural, not cosmetic. If the custom code lives behind a clean API, the UI on top of it is replaceable. The web interface we built first could later be swapped for an internal Claude Skill talking to the same backend, without touching the engineering work underneath. That is an exit door designed in from day one: useful insurance against the possibility that AI-native interfaces keep absorbing internal-tool surface area faster than custom UIs can be justified.

We did not build the Skill. We built the API such that a Skill would be a UI swap, not a rebuild.
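A minimal sketch of that separation in Python, assuming a single evaluation capability. Every name here (`EvaluationBackend`, `InMemoryBackend`, the two handler functions, the request fields) is hypothetical and not taken from the actual build; the point is only that both interfaces depend on one backend contract, so replacing the web UI with an assistant tool means writing a new thin adapter rather than touching the backend.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class EvaluationRequest:
    candidate_id: str
    role: str
    notes: str


@dataclass(frozen=True)
class EvaluationResult:
    candidate_id: str
    summary: str


class EvaluationBackend(Protocol):
    """Hypothetical backend contract: every UI talks to this, never to storage directly."""

    def evaluate(self, request: EvaluationRequest) -> EvaluationResult: ...


class InMemoryBackend:
    """Stand-in backend for the sketch; a real one would read the competency matrices."""

    def evaluate(self, request: EvaluationRequest) -> EvaluationResult:
        word_count = len(request.notes.split())
        return EvaluationResult(
            candidate_id=request.candidate_id,
            summary=f"{request.role}: {word_count} words of notes reviewed",
        )


def web_form_handler(backend: EvaluationBackend, form: dict[str, str]) -> str:
    """UI adapter 1 (hypothetical): a web form posts fields, we return an HTML fragment."""
    result = backend.evaluate(
        EvaluationRequest(form["candidate_id"], form["role"], form["notes"])
    )
    return f"<p>{result.summary}</p>"


def skill_tool_handler(backend: EvaluationBackend, args: dict[str, str]) -> dict[str, str]:
    """UI adapter 2 (hypothetical): an assistant tool call passes args, we return a dict."""
    result = backend.evaluate(
        EvaluationRequest(args["candidate_id"], args["role"], args["notes"])
    )
    return {"candidate_id": result.candidate_id, "summary": result.summary}
```

Both adapters return the same summary from the same backend call; a third interface would be one more function of this shape, with `InMemoryBackend` and the `EvaluationBackend` contract left untouched.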

The build became smaller because we counted what already worked, and we shaped the architecture around the assumption that this already-working surface would keep growing.

Finding two: not every problem is an engineering problem

The second finding was harder to take.

Cluster 4 in the audit, competency-driven evaluation, had been rated 5/5 severity by three of the four recruiters. It was one of the two pains we had committed the budget to. The intuition going in was that Recruitee lacked a competency-matrix module, and we would build one.

What discovery revealed: Recruitee was not the missing piece. The competency matrices that existed inside Appunite were inconsistent because no one had owned their organization. They had grown organically inside Notion over several years. Different roles had different structures. The same competency was named three ways. Some pages had evaluation criteria; some had only competency labels; some had neither.

The expensive-feeling problem — the one that would have justified the largest module of the build — was a Notion-ordering problem.

You can build a competency-matrix tool over a disorganized data layer. The tool will work. It will pull whatever is in the source database and present it in a clean interface. The output will be a faster version of the existing chaos: the same inconsistencies, the same drift, the same lack of a canonical structure, except now wrapped in an interface that looks authoritative because it is custom.

The fix that mattered was not engineering. It was Magda ordering Notion in parallel with the build: establishing a canonical structure for the competency matrices before any tool tried to read from them.

That became a discovery deliverable. Not a “nice to have.” A required parallel work stream, without which the build would have produced something visibly working and substantively useless. The AI components that would later run evaluations against those matrices would have had nothing coherent to evaluate against.
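The ordering itself was Magda's work, not code. To make “coherent” concrete, here is a hypothetical audit in Python that only measures the symptoms described above: one competency named several ways, labels with no criteria, pages with neither. The page shape, the `CANONICAL` vocabulary, and every name below are assumptions for illustration; the script reports drift, it does not fix it.

```python
from dataclasses import dataclass, field


@dataclass
class CompetencyPage:
    """One role's competency page as exported from Notion (hypothetical shape)."""
    role: str
    competencies: dict[str, list[str]]  # competency label -> evaluation criteria


# Hypothetical canonical vocabulary: every known spelling maps to one canonical name.
CANONICAL = {
    "communication": "Communication",
    "comms": "Communication",
    "communication skills": "Communication",
    "ownership": "Ownership",
}


@dataclass
class AuditReport:
    unknown_labels: list[tuple[str, str]] = field(default_factory=list)
    aliased_labels: list[tuple[str, str, str]] = field(default_factory=list)
    missing_criteria: list[tuple[str, str]] = field(default_factory=list)
    empty_pages: list[str] = field(default_factory=list)

    @property
    def is_canonical(self) -> bool:
        # Canonical only if no page shows any of the four symptoms.
        return not (
            self.unknown_labels or self.aliased_labels
            or self.missing_criteria or self.empty_pages
        )


def audit(pages: list[CompetencyPage]) -> AuditReport:
    report = AuditReport()
    for page in pages:
        if not page.competencies:  # page has neither labels nor criteria
            report.empty_pages.append(page.role)
            continue
        for label, criteria in page.competencies.items():
            canonical = CANONICAL.get(label.strip().lower())
            if canonical is None:  # label not in the agreed vocabulary at all
                report.unknown_labels.append((page.role, label))
            elif canonical != label:  # known competency under a non-canonical name
                report.aliased_labels.append((page.role, label, canonical))
            if not criteria:  # label present, but nothing to evaluate against
                report.missing_criteria.append((page.role, label))
    return report
```

An empty report is the precondition the build needed; getting there was an editorial task in Notion, not a change to this script.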

Discovery’s most important output, on the most expensive-feeling pain, was a non-engineering task assigned to a non-engineer.

What changed in the brief

The brief that came out of those two conversations was structurally different from the one we had been about to write.

Before discovery:

Build for the two highest-rated pains.
Full feature build for each.
Target full replacement of the workflow steps each pain touched.

After discovery:

Build for the same two pains, but as connective tissue around what Magda was already doing in Claude and Notion.
Notion organization as a parallel, non-engineering deliverable, owned by Magda, with no engineering hours allocated.
Exit door designed in from day one: backend independent of the UI layer, swappable for a Claude Skill without backend rework.

The scope did not get larger. It got differently shaped. Some of the work moved off the engineering ticket entirely. Some of the architectural choices got tighter because they had to anticipate a future where the UI was not the long-term interface.

The two pains stayed. The budget stayed. The shape of the answer changed.

What the framework actually does

Step back for a moment and ask what the framework is actually doing here.

Part 3 introduced the four questions for scoring a SaaS tool. Part 4 laid out the discovery methodology for finding what is actually broken. Both of those produced inputs to this discovery — without the audit, we would not have known which two pains were the right ones to scope against.

But the framework’s job is not to produce a feature list. A feature list is what falls out of the audit. The framework’s job is to produce the right scope — and the right scope is sometimes smaller than the gap list, sometimes shaped differently, and sometimes contains line items that are not engineering tasks at all.

This is the part of the work that does not survive a checkbox treatment. If you run discovery as a step to satisfy before the real work begins, you get a confirmation of the brief you walked in with. If you run it as the work, you get a chance (not a guarantee, a chance) to discover that the brief should not exist in that form. The 50K constraint did not invent that possibility. It removed the room to ignore it.

Our published companion piece, “See Clearly. Fix Precisely.”, made the same point at a higher level of abstraction. This is what it looks like at resolution: discovery surfaces the cheapest fix, and the cheapest fix is sometimes not one you can ship through engineering at all.

The rule we left with

Honest version: not every problem is an engineering problem.

That sentence sounds obvious in the abstract and is regularly violated in practice. The reason is structural. When the people running build-vs-buy conversations are technical, the available levers are technical. The reflex is to translate every problem into the language you can act on. A Notion problem reads as a tooling problem because a tooling problem is something you can put on a backlog.

For any CTO, VP Engineering, or technical co-founder running a build-vs-buy decision, the question worth adding to the checklist is direct: for each gap on this list, what is the cheapest thing that would fix it?

If the answer is “an afternoon in Notion,” it is not a tooling problem.

If the answer is “a process change owned by the team that uses the tool,” it is not a tooling problem.

If the answer is “documentation that does not exist yet,” it is not a tooling problem.
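The checklist can be read as a routing rule: someone decides the cheapest fix for each gap, and only gaps whose cheapest fix is custom code reach the engineering backlog. A minimal Python sketch, where the `FixKind` categories mirror the three answers above and the gap names used with it are hypothetical, not items from our audit:

```python
from dataclasses import dataclass
from enum import Enum


class FixKind(Enum):
    DATA_CLEANUP = "an afternoon in Notion"
    PROCESS_CHANGE = "a process change owned by the team that uses the tool"
    DOCUMENTATION = "documentation that does not exist yet"
    ENGINEERING = "custom code"


@dataclass(frozen=True)
class Gap:
    name: str
    cheapest_fix: FixKind  # decided in discovery conversations, not by this code


def split_backlog(gaps: list[Gap]) -> tuple[list[str], list[str]]:
    """Return (engineering backlog, non-engineering work streams), preserving input order."""
    engineering = [g.name for g in gaps if g.cheapest_fix is FixKind.ENGINEERING]
    elsewhere = [g.name for g in gaps if g.cheapest_fix is not FixKind.ENGINEERING]
    return engineering, elsewhere
```

The code does none of the hard work. Choosing `cheapest_fix` for each gap is the discovery conversation; the sketch only makes the consequence explicit, so non-engineering items leave the engineering ticket instead of being quietly translated into features.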

Pulling that pattern out before the brief gets written is the single highest-leverage move in pre-build discovery. It is also the move most likely to be skipped — partly because it produces work for someone other than engineering, and partly because the conversations are awkward to host. “Half of what we have been calling tool problems are something else” is not a comfortable thing to say to a People Team that has been documenting tool problems for six months.

It is, however, the move that saves the budget for the problems that genuinely need engineering. We had 50K. We could not afford to spend it on a faster version of a Notion problem.

What comes next

The targeted build started after this. Two pains, one operator, agentic coding through Linear and Claude Code. Jakub Wąsowski as the operator who understood the business domain. Magda’s Notion work running in parallel. The exit-door architecture wired in from the first commit.

Part 9 of this series covers what that looked like: how a working internal product gets built with one operator, zero manually written lines of code, and a brand book fed to an agent in place of a designer. The constraints that made it work matter more than the headline does. We will get to them.

Sources

Part 3 — Four Questions to Score Any SaaS Tool: https://www.appunite.com/blog/four-questions-to-score-any-saas-tool
Part 4 — How to Discover What’s Actually Broken in Your SaaS Tool: https://www.appunite.com/blog/how-to-discover-whats-actually-broken-in-your-saas-tool
Part 5 — What We Found Wrong with Our ATS: https://www.appunite.com/blog/what-we-found-wrong-with-our-ats
Part 7 — We Ran the Numbers. Then We Ran the Scoping. They Didn’t Agree: https://www.appunite.com/blog/saas-replacement-experiment-pivot
Campaign hub — Is Your SaaS Bill Worth It?: https://www.appunite.com/is-saas-over-honest-evaluation