The Honest Verdict: We Built It, And We’re Probably Sunsetting It

You have spent ten articles with us. You watched the ROI model produce a number, then the scoping process kill it, then a smaller build emerge, then the build actually ship, pivot twice, and arrive at a pilot.

If you came in skeptical of the build-vs-buy thesis, this is the article you have been waiting for. This is where we tell you whether it worked.

The short answer: the tool works. The default post-pilot scenario is to sunset it. Both sentences are true at the same time. The rest of this piece is why.

TL;DR: The targeted POC delivered. Average user satisfaction 7.2/10. Per-step rating 4.2/5. The real value was isolated cleanly — competency matrix to transcript to evaluation to feedback, without context switching between Metaview, Notion, and email. And the default post-pilot scenario is still to sunset the tool, because the data killed the hypothesis that justified the build. The original ROI case rested on “this tool will prevent bad hires.” In the last twelve months we had three failed hires, total cost about 23,500 PLN, none of them caused by problems this tool addresses. The real build cost crossed 50K once an unbudgeted backend refactor was counted. The stack costs about 50 USD per month to keep running. And a meta-signal: getting three usability tests done required repeated reminders. If the pain does not hurt enough to test, it does not hurt enough to use.

The build vs buy honest verdict the campaign was set up to produce

For ten articles, this campaign has made a specific claim: honest math beats motivated reasoning, and a rigorous build-vs-buy process will sometimes recommend not building. Part 3 gave you the four questions. Part 6 gave you the ROI model. Part 7 showed the framework killing a 320K full-replacement plan that was already in motion. The campaign has been staking its credibility on the method, not on the outcome.

Now we have run the method on ourselves, all the way through. Here is the honest build-vs-buy verdict the process produced.

What the pilot measured

The test design was deliberately narrow. We did not run a satisfaction survey, and we did not ask the recruiters whether they liked the tool. We asked them to do their job with it.

Three recruiters — Milena, Maciej, and Magda — each ran the full path independently. Build a competency matrix for an open role. Send a booking link to the candidate. Run the meeting. Complete the evaluation. Send feedback. Jakub Wąsowski, the operator who built the tool, played the candidate. Time was measured per step, tool versus an estimated manual baseline. Hard numbers, not opinions.

The structure of the test was set up so that the answer could come back unflattering and there would be no way to argue with it. That was the point.

What the numbers said

Average satisfaction across the three testers: 7.2/10. Per-step rating: 4.2/5. Both are decent. Neither is glowing.

The interesting result is not the average. It is what all three testers independently identified as the actual value: the path from competency matrix to Magic Minutes transcript to evaluation to feedback, without context switching between Metaview, Notion, and email. That sequence — which used to live across three tools and a mental cache of where the latest version of the matrix actually was — collapsed into one workflow. All three testers named that as the thing that worked.

That is a real piece of value. It is the part of the tool that, in isolation, did what Part 8 said the build was for: stop forcing the recruiter to be the integration layer between Recruitee, Roam, Notion, and Metaview.

This is the moment in a build-vs-buy article where most authors stop. The tool works, the testers liked it, ship it. We do not stop here. The next section is why.

Why the numbers do not pencil

At 43 meetings per month across 4 recruiters, the time saved by removing the context switching does not justify the build cost. That is the headline number, and the math under it has three parts.

Real cost was above 50K, not at 50K. The targeted POC was scoped at 50,000 PLN. Mid-project, a second developer joined to refactor the Elixir backend. That headcount was not in the original budget. Whatever number you want to put on it, the real build line is higher than the budgeted one, and the ROI we wrote in Part 6 does not allow you to quietly add cost without recalculating payback.

Ongoing infrastructure has a stack penalty. Elixir on GCP generates about 50 USD per month at this scale. TypeScript on Vercel would be effectively free at this user count. Fifty dollars a month is not a large number on its own. It is a large number when the underlying tool is being used by four people 43 times a month and the only confirmed value is reduced context switching. The Elixir-versus-TypeScript verdict from Part 10 is now also a cost-of-ownership verdict.

The payback model from Part 6 does not survive the cost update. The targeted build was approved on the assumption that the build cost would be roughly 50K and the time savings against 43 meetings a month would clear payback inside a reasonable horizon. With cost above 50K and the savings smaller in practice than estimated, the model breaks. Not by a small margin. By enough that we cannot square it.

This is the part of the article where we would normally tell you what we plan to do about it. We do not have a plan that fixes the math. The math is what it is.
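For readers who want to rerun this arithmetic on their own numbers, the payback logic in this section can be sketched roughly as below. This is a minimal illustration, not the Part 6 model itself. The build cost (50,000 PLN budgeted) and the 43 meetings per month come from this article. The minutes saved per meeting, the hourly cost of a recruiter, and the PLN value of the monthly running cost are placeholders we have not published here, so they are marked as assumptions, and all function and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaybackInputs:
    build_cost_pln: float             # article: budgeted 50,000 PLN, real cost higher
    meetings_per_month: int           # article: 43
    minutes_saved_per_meeting: float  # ASSUMPTION: not published in the article
    hourly_cost_pln: float            # ASSUMPTION: loaded recruiter cost, not published
    monthly_running_cost_pln: float   # ASSUMPTION: about 50 USD at a rate you choose


def monthly_net_savings(p: PaybackInputs) -> float:
    """Value of the time saved per month, minus the cost of keeping the tool running."""
    hours_saved = p.meetings_per_month * p.minutes_saved_per_meeting / 60
    return hours_saved * p.hourly_cost_pln - p.monthly_running_cost_pln


def payback_months(p: PaybackInputs) -> Optional[float]:
    """Months until net savings cover the build cost; None if they never do."""
    net = monthly_net_savings(p)
    if net <= 0:
        return None
    return p.build_cost_pln / net


# Placeholder inputs only: the three values marked ASSUMPTION are illustrative.
example = PaybackInputs(
    build_cost_pln=50_000,
    meetings_per_month=43,
    minutes_saved_per_meeting=10,  # ASSUMPTION
    hourly_cost_pln=100,           # ASSUMPTION
    monthly_running_cost_pln=200,  # ASSUMPTION
)
print(payback_months(example))
```

Raising `build_cost_pln` and lowering `minutes_saved_per_meeting` both push the payback horizon out, which is the two-sided squeeze described above. When net savings are zero or negative, the function returns `None` rather than a number, because there is no horizon to report.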

The hypothesis the data killed

This is the paragraph that decides whether the rest of this campaign was real or marketing.

The original ROI case for any version of the ATS build — full replacement or targeted POC — rested on a specific hypothesis. The tool will prevent bad hires. That was the load-bearing assumption. A bad hire at a senior engineering level is expensive, and if a structured competency-driven evaluation could catch even one bad hire a year that the existing process missed, the build paid for itself.

We checked.

Over the last twelve months, Appunite had three failed hires. Total cost: roughly 23,500 PLN. We looked at each one. We mapped the cause back to the actual moment in the recruitment process where the failure became inevitable.

None of the three were caused by problems this tool addresses.

The failures were earlier in the funnel, or upstream of the interview entirely. Sourcing fit, role definition, expectations alignment after offer. Not interview structure. Not competency matrix coverage. Not evaluation rigor. The hypothesis that justified the entire build (if we had this tool, the failed hires would have been caught) is falsified by data, not by opinion.

The reframe lands plainly. The tool does not solve the problem we said we built it to solve. The problem we said we built it to solve has cost about 23,500 PLN over twelve months, and none of it for reasons this tool would have changed.

That is the honest math. The tool works for what it does. What it does is not what we said it was for.
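The check in this section reduces to two small calculations, sketched below under stated assumptions. The 23,500 PLN total, the three failed hires, and the zero failures the tool would have caught are from this article. The per-hire split is not published here, so the sketch uses a simple average, and it compares against the budgeted 50,000 PLN build cost while ignoring running costs. Function names are hypothetical.

```python
import math


def average_failed_hire_cost(total_cost_pln: float, failed_hires: int) -> float:
    """Simple average; ASSUMPTION, since the per-hire split is not published."""
    return total_cost_pln / failed_hires


def avoided_cost(avg_cost_pln: float, failures_tool_would_have_caught: int) -> float:
    """Cost the tool would have prevented, given how many failures it addresses."""
    return avg_cost_pln * failures_tool_would_have_caught


def failures_needed_to_break_even(build_cost_pln: float, avg_cost_pln: float) -> int:
    """How many average failed hires the tool must prevent to cover the build cost."""
    return math.ceil(build_cost_pln / avg_cost_pln)


avg = average_failed_hire_cost(23_500, 3)
print(avoided_cost(avg, 0))                        # 0.0
print(failures_needed_to_break_even(50_000, avg))  # 7
```

On these inputs the tool would have needed to prevent seven average-cost failures to cover the budgeted build cost, and the review found zero it would have prevented.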

The signal in the testing process itself

There is a second finding that is harder to put a number on, and matters more than it should.

Getting three usability tests done required repeated reminders. The recruiters did not refuse. They did not push back on the tool. They simply did not prioritize the testing, and the operator had to chase. Three times. For three sessions. From people who had asked for the tool in the discovery interviews.

The operator’s read from the handoff was direct: if the pain is not bad enough to actively test, it is not bad enough after deployment either.

That is not a comment on the testers. They are good at their jobs and they have a queue of real work. It is a market signal about urgency. Tools that fix non-urgent pains do not get used after the novelty wears off. They sit in a tab, they are open during the first week, and by week four the recruiter is back in Recruitee and Notion because that is where the rest of the workflow lives and the new tool is one more thing to remember.

The reader can take this one home. If your internal team will not test the tool you are building for them, the tool will not survive the first reorganization. That is true whether you are building or buying. It is more true when you are building, because the cost of carry is yours.

Default scenario: sunset

The handoff document spells out the default scenario in one line: “No decision equals sunset.”

The conditions for the tool to survive past pilot are written and pre-committed:

A maintenance owner has to be assigned.
A maintenance budget has to be approved (50 USD per month of infra plus engineer time for fixes and integrations as Recruitee changes underneath).
The pilot on live candidates has to produce a go signal against pre-written go/no-go criteria in the Executive Summary.

If any of those three does not happen, the tool goes dark. That is not a threat. That is the default state in the absence of a positive decision. The go/no-go criteria for the pilot are written before the pilot starts so that the pilot result cannot be re-interpreted after the fact. That is the only way honest math survives organizational politics.

As of this writing, none of the three decisions has an owner. The tool is in production, deployable, working, and one calendar quarter away from sunset by inaction.
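The "no decision equals sunset" rule can be written down as a few lines of logic. This is a sketch of the rule as described in this section, not code from the handoff document. The function and parameter names are hypothetical. Each condition is `True`, `False`, or `None`, where `None` means nobody has decided yet.

```python
from typing import Optional


def post_pilot_outcome(
    owner_assigned: Optional[bool],
    budget_approved: Optional[bool],
    pilot_go_signal: Optional[bool],
) -> str:
    """Return "keep" only when all three conditions are explicitly met.

    A missing decision (None) is treated exactly like a "no", so the
    absence of a positive decision resolves to "sunset".
    """
    conditions = (owner_assigned, budget_approved, pilot_go_signal)
    return "keep" if all(c is True for c in conditions) else "sunset"


# The state described above: none of the three decisions has been made.
print(post_pilot_outcome(None, None, None))  # sunset
```

The design choice worth copying is that the function has no "undecided" return value. Two of three conditions met still resolves to sunset, so inaction cannot keep the tool alive.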

What the campaign was actually for

This is the part that reframes the entire series.

The campaign was never “we built our own ATS and it worked.” If that were the headline, we would be writing a different article right now. We would be claiming the satisfaction score, picking three quotes from the testers, and ending on “we are looking forward to rolling this out.”

The campaign was “we will do the honest math, even when the math says we should not have built this.” That sentence is the brand asset. The tool was never going to be the asset. The tool was the forcing function that produced the math.

For the reader, the campaign’s value is the method. The 24-problem audit (Part 5). The ROI model (Part 6). The scoping reconciliation (Part 7). The discovery muscle (Part 8). The one-operator workflow (Part 9). The two pivots (Part 10). And this article, the one where the method produces an answer that the operator did not want.

If the campaign closes with a satisfied conclusion, the method is decoration. If the campaign closes with a verdict the operator would rather not publish, the method is the asset. We chose the second.

The question for the reader

You have been using this framework (the four questions, the ROI model, the scoping process, the discovery method) to evaluate your own SaaS. Maybe you already have a build planned. Maybe you have one in flight.

One question.

If you ran your method to the end, and the math killed your project, would you publish the result?

If the answer is yes, you are doing the honest version of build-vs-buy. The framework is doing what it was built to do. The cost of carrying it is the occasional uncomfortable article like this one.

If the answer is no, the framework is decoration. It will produce whatever answer the budget owner needed before the process started. That is fine. It is just not the same exercise.

The four questions from Part 3 work the same either way. The discipline is in what you do when the answers are inconvenient.

The campaign ends here, for now

The targeted build had a job. It did most of it. The numbers say it was not worth doing. We are publishing that.

The next chapter of this work, if there is one, comes when commercial fixed-price projects produce a verdict on the same questions with client money on the line. That is where unit economics get proven externally. The marketing POC was always a different exercise — internal expertise demonstration, campaign forcing function, honest-math artifact. The verdict on PNS as a commercial product line lives in engagements we are currently setting up, and whatever comes after, not here.

For now, the campaign hub carries the full series. Parts 1 through 11. Including this one.
