Team maps vendor workflows on a wall of sticky notes during a platform consolidation session
AI Workforce · Featured Article

Why Frontier Labs Are Displacing Scale, Surge, Mercor, and Handshake

One lab is paying four vendors to do what should be one product. A capture vendor. A preference-data shop. A recruitment platform. A sourcing marketplace. The math on that arrangement stopped working. Here's what is replacing it — and why.

Written by
AuraOne AI Labs Team
April 15, 2026
12 min
ai-labs · vendor-displacement · workforce · cleo · scale-ai · surge-ai · mercor · handshake


The people behind the model are the product.

Every frontier lab figured that out in the last eighteen months. What they are still figuring out is what to do about it: specifically, what to do about the four vendors whose invoices make up most of the human-data line in the budget.

Scale AI. The capture-and-annotate vendor. Volume at scale. A workhorse for most of the frontier.

Surge AI. The preference-data shop. Ranked pairs at volume. Same customer list, different invoice.

Mercor. The recruitment platform for specialists. Strong at sourcing. Quality after the hire is not the product.

Handshake AI. University-credentialed specialists. Similar sourcing. Same gap past hire.

Four vendors. Four contracts. Four data-processing agreements. Four sets of access controls. Four places where the record of a preference pair and the reviewer who produced it can fall into a seam and never come back out.

Each of these vendors solved a real problem in the era they were built for. The era they were built for is ending. And the frontier labs that led their adoption are the ones leading the displacement now.

What changed

Three things, in eighteen months.

Specialists replaced crowdworkers. The ceiling on cheap labels dropped below what a frontier model already knew. Post-training a frontier model on preference pairs written by generalists became a waste of compute. The preference pairs had to come from credentialed domain experts — radiologists, pharmacologists, licensed attorneys, senior engineers. Vendors built for crowd-scale throughput had to bolt on a credentialed specialist pool. The bolt-on is not the product.

The record became the deliverable. A frontier lab used to need labels. The lab now needs the labels plus the roster of who produced them, plus the calibration session each reviewer passed, plus the agreement tracking that shows whether the reviewer stayed calibrated, plus the regression memory that shows whether past failures are still being caught. A vendor that ships labels but not the record forces the customer to build the record alongside. That is a six-FTE integration team. That is the cost the math used to hide.
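To make "the record" concrete in data terms, here is a minimal sketch of what a single deliverable might bundle. All names and fields are hypothetical, invented for illustration; this is not AuraOne's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewerRecord:
    """Hypothetical provenance for one reviewer; field names are illustrative."""
    reviewer_id: str
    credential: str            # e.g. "board-certified radiologist"
    calibration_session: str   # the session the reviewer passed before labeling
    agreement_rate: float      # rolling agreement with calibrated peers


@dataclass
class PreferencePair:
    """A label that never travels without its provenance."""
    pair_id: str
    chosen: str
    rejected: str
    reviewer: ReviewerRecord
    # Past failure cases replayed against this pair's workflow.
    regressions_checked: list[str] = field(default_factory=list)
```

The point of the sketch is the shape, not the names: the label and its lineage are one object, so the customer never has to rebuild the record alongside the labels.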

Compliance became a procurement checkbox. An AI Act audit, an FDA premarket submission, an OCC review of a decisioning model — each of these now demands lineage that ties a training example back to a credentialed reviewer who produced it. A vendor that cannot answer that question at audit time is a vendor the customer cannot use on the regulated workloads, which is to say on the workloads where the budget actually is.

None of the four vendors were structurally built to solve all three of these at once. One system was. That is the displacement story.

What the one system looks like

Three modules, under one record.

Workforce. The reviewer operating layer. Who is calibrated on what, who is working now, who drifted last week, who is ready for a harder case. One roster. Live. Capable of being the answer to the compliance question "who reviewed this."

Cleo. Specialist sourcing, ranked shortlists, and structured interviews on one record. Outreach runs against a credentialed pool, shortlists are ready in hours, and a calibrated first round happens at volume against a defined rubric, scored live and handed to the hiring lead as a ready packet. Matches happen inside the roster: credential verified, availability known, quality score attached. First-round conversion rates run higher than unstructured phone screens because the structure is the point.

Annotation. Where reviewed work is produced. Where calibration history attaches to every output. Where the output carries the metadata a regulator will ask about.

Together these three make up the part of AuraOne sometimes called the Human Data OS — the layer a lab runs when the people behind the model are the product.

What "displacement" actually means in practice

To be concrete, because none of this is theoretical:

A frontier AI lab in the post-training phase of a new flagship model. Three existing vendor contracts — one on capture, one on preference data, one on specialist sourcing. Six integration engineers maintaining the seams. A compliance team reconstructing lineage by hand every time an audit request lands.

The displacement does not happen in one procurement cycle. It happens in three.

Cycle one. Replace the specialist sourcing vendor with Cleo. The shortlist-to-hire timeline drops from weeks to hours. The candidate records carry calibration baselines, so the first calibration session at the lab is faster. The sourcing line in the budget drops by a factor that makes the pilot self-funding.

Cycle two. Replace the preference-data vendor with Workforce and Annotation. The preference pairs start carrying reviewer metadata, calibration history, and agreement scores. The reward model training data is now traceable end to end. The integration team that used to maintain the preference-data ETL is freed for real work.

Cycle three. Replace the capture-and-annotate vendor. By now the roster has been running under one record for six months. The calibration telemetry is live. The regression replay gates every new reward model. The compliance team pulls lineage with a single query. The fourth invoice gets cancelled.

Three cycles. Eighteen months. One record at the end that would have been impossible to stitch together from four invoices.
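The "single query" in cycle three is the payoff of keeping everything under one record. A minimal sketch of what that lineage pull could look like, as an in-memory lookup; every table, key, and value here is invented for illustration, not a real AuraOne API.

```python
# Hypothetical one-record stores: training examples and the roster, joined by key.
pairs = {
    "pair-001": {"reviewer_id": "rev-42", "created": "2026-01-10"},
}
reviewers = {
    "rev-42": {
        "credential": "licensed attorney",
        "calibration_session": "cal-2025-12",
    },
}


def lineage(pair_id: str) -> dict:
    """Tie a training example back to the credentialed reviewer who produced it."""
    pair = pairs[pair_id]
    reviewer = reviewers[pair["reviewer_id"]]
    return {"pair_id": pair_id, **pair, **reviewer}
```

When the stores live in one system, the audit answer is one join. When they live behind four invoices, the same answer is a compliance team reconstructing it by hand.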

Why the incumbents cannot follow

This is the part people do not want to write down.

Each of the four vendors was architected for a business model that rewards shipping volume, not operating a record. Adding the record to an existing capture platform is not a feature. It is a rewrite. The data model, the access control layer, the reviewer operations layer, the calibration telemetry — all of it is different. A vendor that ships labels and owes the customer the record would have to re-architect the product around a customer asset, and then rebuild the sales motion around a longer sales cycle and a different economic buyer.

Some of the four will try. The ones that try will be dragging a decade of legacy into the transition. The ones that do not try will lose the frontier first and the enterprise second.

This is not a prediction about technology. It is a prediction about what happens to a product category when the customer's requirements move past the structural bounds of the incumbents' architectures. It has happened to other categories before.

What a lab should do this quarter

If you are a post-training team inside a frontier lab, three moves.

One. Audit the human-data spend. Count the vendors. Count the integration engineers. Count the hours your compliance team spends reconstructing lineage. The total is usually a surprise on the first read.

Two. Pilot one module. Not all three. Start with Cleo on an open specialist search you would have sent to a sourcing vendor. Measure the cycle time. Measure the first-calibration quality of the specialists who land.

Three. Run one workflow end-to-end under one record. Pick a preference-data generation workflow. Run it on Workforce and Annotation for a quarter. Pull the lineage report at the end of the quarter. Compare to the baseline.

The displacement does not require a revolution. It requires the first honest count of what the current stack is actually costing you, and the first demo of what one record looks like when you run it.

Most labs we have worked with are surprised by the first number. Nobody we have worked with has been surprised by the second.

The people behind the model are the product. The record that ties them to the model is the deliverable.

One system ships both.

---

Ready to displace one vendor this quarter?

Why AuraOne vs Scale AI · Why AuraOne vs Surge AI · Why AuraOne vs Mercor · Why AuraOne vs Handshake AI · Talk to us

Written by
AuraOne AI Labs Team

Building AI evaluation and hybrid intelligence at AuraOne.

