Prompt Robustness Model Evaluation Specialist

Prompt Robustness Model Evaluation Specialist is a remote evaluation track for reviewing prompt robustness evaluation prompts and responses against AuraOne's quality rubric.

TRACK

Evaluation & annotation

Aligned to the AuraOne specialist routing.

TYPE

Contractor

Remote-first specialist work, paid per accepted task.

LOCATION

Remote · Independent specialist contractor

Remote — US-eligible

ABOUT THE ROLE

In this role, you review prompts and model responses from prompt robustness evaluations against AuraOne's quality rubric. Reviewers compare paired outputs, label edge cases, and write structured feedback that the modeling team can use for retraining.

AI data reviewers help turn prompt robustness evaluation outputs into auditable labels, rationales, and regression cases for AuraOne Human Data.

Review advanced model outputs, benchmark failures, rubric decisions, and evaluator calibration across frontier AI workflows.

RESPONSIBILITIES

Evaluate model outputs from prompt robustness evaluations against a versioned rubric and assign severity tags.
Compare paired responses and pick the stronger answer with a written rationale.
Label hallucinations, instruction-following failures, and unsafe content with structured tags.
Capture ambiguous prompts and route them back to the program team for rubric updates.
Maintain reviewer-quality scores by calibrating against gold-standard examples each week.
Document recurring failure modes so the modeling team can target them in the next training run.

REQUIREMENTS

Prior evaluation, annotation, or human-rater experience with prompt robustness evaluation or adjacent content.
Comfort applying multi-page rubrics consistently across long batches.
Clear written reasoning that names the issue and the rubric clause being applied.
Strong attention to detail and the ability to flag when a prompt itself is the problem.
Reliable async availability for at least 10 hours per week.

EXAMPLE TASKS

Compare two model responses to the same prompt in a prompt robustness evaluation and pick the stronger one, with a written rationale.
Tag an unsafe response with the correct policy category and severity.
Audit a 50-row batch for rubric consistency and report drift to the program lead.
Propose a rubric clarification after spotting a recurring failure mode.

NICE TO HAVE

Background in linguistics, content moderation, or trust & safety review.
Experience with inter-rater agreement metrics and calibration cycles.
Domain expertise that lets you spot subject-matter errors automated checks miss.
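
For candidates less familiar with inter-rater agreement metrics, the sketch below computes Cohen's kappa, one common measure of agreement between two raters. It is an illustration only: the posting does not say which metric AuraOne uses, and the function name, severity tags, and sample labels are all hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(labels_a)
    # Observed agreement: share of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each rater's own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical severity tags: one reviewer versus a gold-standard set.
reviewer = ["high", "low", "low", "medium", "high", "low"]
gold = ["high", "low", "medium", "medium", "high", "low"]
kappa = cohens_kappa(reviewer, gold)
print(round(kappa, 3))
```

A kappa near 1 means the two label sets agree far more often than chance would predict, while a value near 0 means agreement is about what chance alone would produce.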

COMPENSATION

Hourly rate confirmed after the interview process.

Expected schedule: contractor, remote specialist work with program-defined task volume and review pacing.

SKILLS USED IN MATCHING

Model output evaluation
Rubric-based annotation
Severity tagging
Inter-rater calibration
Prompt robustness model evaluation
Frontier evaluation
Rubric calibration
Failure analysis
Prompt
Robustness

HOW TO APPLY

AuraOne uses a shared specialist intake to confirm track fit, review readiness, and the best queue for your profile. Applications submitted from partner job boards carry the source, role, and category on the apply URL.

STAY IN THIS TRACK

Other roles in Frontier Model Evaluation

No. 001

Agent Regression Test Evaluation Specialist

Contractor · Remote · Independent specialist contractor

Agent Regression Test Evaluation Specialist is a remote evaluation track for reviewing agent regression test evaluation evaluation...

No. 002

Agent Regression Test Workflow Reviewer

Contractor · Remote · Independent specialist contractor

Agent Regression Test Workflow Reviewer is a remote evaluation track for reviewing agent regression test workflow evaluation prompts and...

No. 003

Agentic Task Completion Failure Analysis Reviewer

Contractor · Remote · Independent specialist contractor

Agentic Task Completion Failure Analysis Reviewer is a remote evaluation track for reviewing agentic task completion failure analysis...

READY · APPLY NOW

One intake. One reviewed record.

Submit your specialist intake once. AuraOne routes you to the track that matches your work and reviews your file.
