Team maps vendor workflows on a wall of sticky notes during a platform consolidation session
AI Workforce · Featured Article

Why Frontier Labs Are Displacing Scale, Surge, Mercor, and Handshake

One lab is paying four vendors to do what should be one product. A capture vendor. A preference-data shop. A recruitment platform. A sourcing marketplace. The math on that arrangement stopped working. Here's what is replacing it — and why.

Written by
AuraOne AI Labs Team
April 15, 2026
12 min
ai-labs · vendor-displacement · workforce · cleo · scale-ai · surge-ai · mercor · handshake


The people behind the model are the product.

Every frontier lab figured that out in the last eighteen months. What they are still figuring out is what to do about it: specifically, what to do about the four vendors whose invoices make up most of the human-data line in the budget.

Scale AI. The capture-and-annotate vendor. Volume at scale. A workhorse for most of the frontier.

Surge AI. The preference-data shop. Ranked pairs at volume. Same customer list, different invoice.

Mercor. The recruitment platform for specialists. Strong at sourcing. Quality after the hire is not the product.

Handshake AI. University-credentialed specialists. Similar sourcing. Same gap past hire.

Four vendors. Four contracts. Four data-processing agreements. Four sets of access controls. Four places where the record of a preference pair and the reviewer who produced it can fall into a seam and never come back out.

Each of these vendors solved a real problem in the era they were built for. The era they were built for is ending. And the frontier labs that led their adoption are the ones leading the displacement now.

What changed

Three things, in eighteen months.

Specialists replaced crowdworkers. The ceiling on cheap labels dropped below what a frontier model already knew. Post-training a frontier model on preference pairs written by generalists became a waste of compute. The preference pairs had to come from credentialed domain experts — radiologists, pharmacologists, licensed attorneys, senior engineers. Vendors built for crowd-scale throughput had to bolt on a credentialed specialist pool. The bolt-on is not the product.

The record became the deliverable. A frontier lab used to need labels. The lab now needs the labels plus the roster of who produced them, plus the calibration session each reviewer passed, plus the agreement tracking that shows whether the reviewer stayed calibrated, plus the regression memory that shows whether past failures are still being caught. A vendor that ships labels but not the record forces the customer to build the record alongside. That is a six-FTE integration team. That is the cost the math used to hide.
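To make "the record" concrete in data terms, here is a minimal sketch of what a single deliverable might bundle. All names and fields are hypothetical, invented for illustration; this is not AuraOne's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewerRecord:
    """Hypothetical provenance for one reviewer; field names are illustrative."""
    reviewer_id: str
    credential: str            # e.g. "board-certified radiologist"
    calibration_session: str   # the session the reviewer passed before labeling
    agreement_rate: float      # rolling agreement with calibrated peers


@dataclass
class PreferencePair:
    """A label that never travels without its provenance."""
    pair_id: str
    chosen: str
    rejected: str
    reviewer: ReviewerRecord
    # Past failure cases replayed against this pair's workflow.
    regressions_checked: list[str] = field(default_factory=list)
```

The point of the sketch is the shape, not the names: the label and its lineage are one object, so the customer never has to rebuild the record alongside the labels.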

Compliance became a procurement checkbox. An AI Act audit, an FDA premarket submission, an OCC review of a decisioning model — each of these now demands lineage that ties a training example back to a credentialed reviewer who produced it. A vendor that cannot answer that question at audit time is a vendor the customer cannot use on the regulated workloads, which is to say on the workloads where the budget actually is.

None of the four vendors were structurally built to solve all three of these at once. One system was. That is the displacement story.

What the one system looks like

Three modules, under one record.

Workforce. The reviewer operating layer. Who is calibrated on what, who is working now, who drifted last week, who is ready for a harder case. One roster. Live. Capable of being the answer to the compliance question "who reviewed this."

Cleo. Specialist sourcing, ranked shortlists, and structured interviews on one record. Outreach runs against a credentialed pool, shortlists are ready in hours, and a calibrated first round happens at volume against a defined rubric, scored live and handed to the hiring lead as a ready packet. Matches happen inside the roster: credential verified, availability known, quality score attached. First-round conversion rates run higher than unstructured phone screens because the structure is the point.

Annotation. Where reviewed work is produced. Where calibration history attaches to every output. Where the output carries the metadata a regulator will ask about.

Together these three make up the part of AuraOne sometimes called the Human Data OS — the layer a lab runs when the people behind the model are the product.

What "displacement" actually means in practice

To be concrete, because none of this is theoretical:

A frontier AI lab in the post-training phase of a new flagship model. Three existing vendor contracts — one on capture, one on preference data, one on specialist sourcing. Six integration engineers maintaining the seams. A compliance team reconstructing lineage by hand every time an audit request lands.

The displacement does not happen in one procurement cycle. It happens in three.

Cycle one. Replace the specialist sourcing vendor with Cleo. The shortlist-to-hire timeline drops from weeks to hours. The candidate records carry calibration baselines, so the first calibration session at the lab is faster. The sourcing line in the budget drops by a factor that makes the pilot self-funding.

Cycle two. Replace the preference-data vendor with Workforce and Annotation. The preference pairs start carrying reviewer metadata, calibration history, and agreement scores. The reward model training data is now traceable end to end. The integration team that used to maintain the preference-data ETL is freed for real work.

Cycle three. Replace the capture-and-annotate vendor. By now the roster has been running under one record for six months. The calibration telemetry is live. The regression replay gates every new reward model. The compliance team pulls lineage with a single query. The fourth invoice gets cancelled.

Three cycles. Eighteen months. One record at the end that would have been impossible to stitch together from four invoices.
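The "single query" in cycle three is the payoff of keeping everything under one record. A minimal sketch of what that lineage pull could look like, as an in-memory lookup; every table, key, and value here is invented for illustration, not a real AuraOne API.

```python
# Hypothetical one-record stores: training examples and the roster, joined by key.
pairs = {
    "pair-001": {"reviewer_id": "rev-42", "created": "2026-01-10"},
}
reviewers = {
    "rev-42": {
        "credential": "licensed attorney",
        "calibration_session": "cal-2025-12",
    },
}


def lineage(pair_id: str) -> dict:
    """Tie a training example back to the credentialed reviewer who produced it."""
    pair = pairs[pair_id]
    reviewer = reviewers[pair["reviewer_id"]]
    return {"pair_id": pair_id, **pair, **reviewer}
```

When the stores live in one system, the audit answer is one join. When they live behind four invoices, the same answer is a compliance team reconstructing it by hand.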

Why the incumbents cannot follow

This is the part people do not want to write down.

Each of the four vendors was architected for a business model that rewards shipping volume, not operating a record. Adding the record to an existing capture platform is not a feature. It is a rewrite. The data model, the access control layer, the reviewer operations layer, the calibration telemetry — all of it is different. A vendor that ships labels and owes the customer the record would have to re-architect the product around a customer asset, and then rebuild the sales motion around a longer sales cycle and a different economic buyer.

Some of the four will try. The ones that try will be dragging a decade of legacy into the transition. The ones that do not try will lose the frontier first and the enterprise second.

This is not a prediction about technology. It is a prediction about what happens to a product category when the customer's requirements move past the structural bounds of the incumbents' architectures. It has happened to other categories before.

What a lab should do this quarter

If you are a post-training team inside a frontier lab, three moves.

One. Audit the human-data spend. Count the vendors. Count the integration engineers. Count the hours your compliance team spends reconstructing lineage. The total is usually a surprise on the first read.

Two. Pilot one module. Not all three. Start with Cleo on an open specialist search you would have sent to a sourcing vendor. Measure the cycle time. Measure the first-calibration quality of the specialists who land.

Three. Run one workflow end-to-end under one record. Pick a preference-data generation workflow. Run it on Workforce and Annotation for a quarter. Pull the lineage report at the end of the quarter. Compare to the baseline.

The displacement does not require a revolution. It requires the first honest count of what the current stack is actually costing you, and the first demo of what one record looks like when you run it.

Most labs we have worked with are surprised by the first number. Nobody we have worked with has been surprised by the second.

The people behind the model are the product. The record that ties them to the model is the deliverable.

One system ships both.

---

Ready to displace one vendor this quarter?

Why AuraOne vs Scale AI · Why AuraOne vs Surge AI · Why AuraOne vs Mercor · Why AuraOne vs Handshake AI · Talk to us

Written by
AuraOne AI Labs Team

Building AI evaluation and hybrid intelligence at AuraOne.

